Core Classes
pytorch_tabular.TabularModel(config=None, data_config=None, model_config=None, optimizer_config=None, trainer_config=None, experiment_config=None, model_callable=None, model_state_dict_path=None)
The core model which orchestrates everything from initializing the datamodule, the model, trainer, etc.
PARAMETER | DESCRIPTION
---|---
`config` | Single OmegaConf DictConfig object or the path to the yaml file holding all the config parameters. Defaults to None.
`data_config` | DataConfig object or path to the yaml file. Defaults to None.
`model_config` | A subclass of ModelConfig or path to the yaml file. Determines which model to run from the type of config. Defaults to None.
`optimizer_config` | OptimizerConfig object or path to the yaml file. Defaults to None.
`trainer_config` | TrainerConfig object or path to the yaml file. Defaults to None.
`experiment_config` | ExperimentConfig object or path to the yaml file. If provided, configures the experiment tracking. Defaults to None.
`model_callable` | If provided, will override the model callable that will be loaded from the config. Typically used when providing custom models.
`model_state_dict_path` | If provided, will load the state dict after initializing the model from config.
Source code in src/pytorch_tabular/tabular_model.py
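A minimal construction sketch. The column names, dataframes, and the choice of CategoryEmbeddingModelConfig are illustrative assumptions, not part of this reference; the `train_df`/`val_df`/`test_df` dataframes referenced in later sketches are assumed to hold these columns.

```python
from pytorch_tabular import TabularModel
from pytorch_tabular.config import DataConfig, OptimizerConfig, TrainerConfig
from pytorch_tabular.models import CategoryEmbeddingModelConfig

# Assumed example columns: "cat1", "cat2" (categorical), "num1" (continuous), "target"
data_config = DataConfig(
    target=["target"],
    continuous_cols=["num1"],
    categorical_cols=["cat1", "cat2"],
)
trainer_config = TrainerConfig(batch_size=1024, max_epochs=20)
optimizer_config = OptimizerConfig()
model_config = CategoryEmbeddingModelConfig(task="classification", layers="64-32")

tabular_model = TabularModel(
    data_config=data_config,
    model_config=model_config,
    optimizer_config=optimizer_config,
    trainer_config=trainer_config,
)
```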
create_finetune_model(task, head, head_config, target=None, optimizer_config=None, trainer_config=None, experiment_config=None, loss=None, metrics=None, metrics_prob_input=None, metrics_params=None, optimizer=None, optimizer_params={}, learning_rate=None, target_range=None)
Creates a new TabularModel using the pretrained weights and the new task and head.
PARAMETER | DESCRIPTION
---|---
`task` | The task to be performed. One of "regression" or "classification".
`head` | The head to be used for the model. Should be one of the heads available in the library.
`head_config` | The config as a dict which defines the head. If left empty, will be initialized as a default linear head.
`target` | The target column name if not provided in the initial pretraining stage. Defaults to None.
`optimizer_config` | If provided, will redefine the optimizer for the fine-tuning stage. Defaults to None.
`trainer_config` | If provided, will redefine the trainer for the fine-tuning stage. Defaults to None.
`experiment_config` | If provided, will redefine the experiment for the fine-tuning stage. Defaults to None.
`loss` | If provided, will be used as the loss function for fine-tuning. By default it is MSELoss for regression and CrossEntropyLoss for classification.
`metrics` | List of metrics (either callables or str) to be used for the fine-tuning stage. If str, it should be one of the functional metrics implemented in `torchmetrics.functional`.
`metrics_prob_input` | A mandatory parameter for classification metrics. Defines whether the input to the metric function is the probability or the class. Length should be the same as the number of metrics. Defaults to None.
`metrics_params` | The parameters for the metrics, in the same order as metrics. For example, f1_score for multi-class needs the `average` parameter.
`optimizer` | Custom optimizers which are drop-in replacements for standard PyTorch optimizers. If provided, the OptimizerConfig is ignored in favor of this. Defaults to None.
`optimizer_params` | The parameters for the optimizer. Defaults to {}.
`learning_rate` | The learning rate to be used. Defaults to 1e-3.
`target_range` | The target range for the regression task. Ignored for classification. Defaults to None.

RETURNS | DESCRIPTION
---|---
`TabularModel` | The new TabularModel model for fine-tuning.
Source code in src/pytorch_tabular/tabular_model.py
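A hedged sketch of creating a fine-tuning model. Here `ssl_model` stands for a TabularModel that has already been pretrained (for example with a self-supervised model config), and the head name and head_config values are illustrative assumptions.

```python
# `ssl_model` is assumed to be a pretrained TabularModel
finetune_model = ssl_model.create_finetune_model(
    task="classification",
    head="LinearHead",                 # assumed head name; use any head supported by the library
    head_config={"layers": "64-32"},   # assumed head configuration
    trainer_config=trainer_config,     # optionally redefine trainer/optimizer for fine-tuning
    optimizer_config=optimizer_config,
)
```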
evaluate(test=None, test_loader=None, ckpt_path=None, verbose=True)
Evaluates the dataframe using the loss and metrics already set in config.
PARAMETER | DESCRIPTION
---|---
`test` | The dataframe to be evaluated. If not provided, will try to use the test data provided during fit. If that was also not provided, will return an empty dictionary.
`test_loader` | The dataloader to be used for evaluation. If provided, will use the dataloader instead of the test dataframe or the test data provided during fit. DEPRECATION: providing test data during fit is deprecated and will be removed in a future release. Defaults to None.
`ckpt_path` | The path to the checkpoint to be loaded. If not provided, will try to use the best checkpoint from training.
`verbose` | If True, will print the results. Defaults to True.

RETURNS | DESCRIPTION
---|---
`Union[dict, list]` | The final test result dictionary.
Source code in src/pytorch_tabular/tabular_model.py
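For example, assuming a fitted `tabular_model` and a held-out `test_df` dataframe:

```python
# Evaluate on a held-out dataframe using the loss/metrics from the config
result = tabular_model.evaluate(test=test_df, verbose=True)
```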
feature_importance()
Returns the feature importance of the model as a pandas DataFrame.
find_learning_rate(model, datamodule, min_lr=1e-08, max_lr=1, num_training=100, mode='exponential', early_stop_threshold=4.0, plot=True, callbacks=None)
Enables the user to do a range test of good initial learning rates, to reduce the amount of guesswork in picking a good starting learning rate.
PARAMETER | DESCRIPTION
---|---
`model` | The PyTorch Lightning model to be trained.
`datamodule` | The datamodule.
`min_lr` | Minimum learning rate to investigate.
`max_lr` | Maximum learning rate to investigate.
`num_training` | Number of learning rates to test.
`mode` | Search strategy, either 'linear' or 'exponential'. If set to 'linear', the learning rate will be searched by linearly increasing after each batch. If set to 'exponential', the learning rate will increase exponentially.
`early_stop_threshold` | Threshold for stopping the search. If the loss at any point is larger than early_stop_threshold*best_loss, the search is stopped. To disable, set to None.
`plot` | If True, will plot using matplotlib.
`callbacks` | If provided, will be added to the callbacks for Trainer.

RETURNS | DESCRIPTION
---|---
`Tuple[float, pd.DataFrame]` | The suggested learning rate and the learning rate finder results.
Source code in src/pytorch_tabular/tabular_model.py
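A sketch of the range test using the low-level API (the `train_df`/`val_df` dataframes are assumptions; `prepare_dataloader` and `prepare_model` are documented further below):

```python
datamodule = tabular_model.prepare_dataloader(train=train_df, validation=val_df, seed=42)
model = tabular_model.prepare_model(datamodule)
suggested_lr, lr_results = tabular_model.find_learning_rate(
    model, datamodule, min_lr=1e-6, max_lr=1, num_training=100, plot=False
)
```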
finetune(train, validation=None, train_sampler=None, target_transform=None, max_epochs=None, min_epochs=None, seed=42, callbacks=None, datamodule=None, freeze_backbone=False)
Finetunes the model on the provided data.
PARAMETER | DESCRIPTION
---|---
`train` | The training data with labels.
`validation` | The validation data with labels. Defaults to None.
`train_sampler` | If provided, will be used as a batch sampler for training. Defaults to None.
`target_transform` | If provided, will be used to transform the target before training and inverse-transform the predictions.
`max_epochs` | The maximum number of epochs to train for. Defaults to None.
`min_epochs` | The minimum number of epochs to train for. Defaults to None.
`seed` | The seed to be used for training. Defaults to 42.
`callbacks` | If provided, will be added to the callbacks for Trainer. Defaults to None.
`datamodule` | If provided, will be used as the datamodule for training. Defaults to None.
`freeze_backbone` | If True, will freeze the backbone by turning off gradients. Defaults to False, which means the pretrained weights are also further tuned during fine-tuning.

RETURNS | DESCRIPTION
---|---
`pl.Trainer` | The trainer object.
Source code in src/pytorch_tabular/tabular_model.py
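Continuing the create_finetune_model sketch above, the fine-tuning step itself might look like this (the labeled `train_df`/`val_df` dataframes are assumptions):

```python
# Fine-tune on labeled data; freeze_backbone=True keeps the pretrained backbone weights fixed
finetune_model.finetune(train=train_df, validation=val_df, freeze_backbone=True)
```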
fit(train, validation=None, test=None, loss=None, metrics=None, metrics_prob_inputs=None, optimizer=None, optimizer_params={}, train_sampler=None, target_transform=None, max_epochs=None, min_epochs=None, seed=42, callbacks=None, datamodule=None)
The fit method which takes in the data and triggers the training.
PARAMETER | DESCRIPTION
---|---
`train` | Training Dataframe.
`validation` | If provided, will use this dataframe as the validation while training. Used in Early Stopping and Logging. If left empty, will use 20% of Train data as validation. Defaults to None.
`test` | If provided, will be used as the hold-out data, on which you can check performance after the model is trained. Defaults to None. DEPRECATED: will be removed in the next version.
`loss` | Custom Loss functions which are not in the standard PyTorch library.
`metrics` | Custom metric functions (Callable) which have the signature metric_fn(y_hat, y) and work on torch tensor inputs. y_hat is expected to be of shape (batch_size, num_classes) for classification and (batch_size, 1) for regression, and y is expected to be of shape (batch_size, 1).
`metrics_prob_inputs` | A mandatory parameter for classification metrics. If the metric function requires probabilities as inputs, set this to True. The length of the list should be equal to the number of metrics. Defaults to None.
`optimizer` | Custom optimizers which are drop-in replacements for standard PyTorch optimizers. This should be the Class and not the initialized object.
`optimizer_params` | The parameters to initialize the custom optimizer.
`train_sampler` | Custom PyTorch batch samplers which will be passed to the DataLoaders. Useful for dealing with imbalanced data and other custom batching strategies.
`target_transform` | If provided, applies the transform to the target before modelling and inverts the transform during prediction. The parameter can either be a sklearn Transformer which has an inverse_transform method, or a tuple of callables (transform_func, inverse_transform_func).
`max_epochs` | Overwrite maximum number of epochs to be run. Defaults to None.
`min_epochs` | Overwrite minimum number of epochs to be run. Defaults to None.
`seed` | Random seed for reproducibility. Defaults to 42.
`callbacks` | List of callbacks to be used during training. Defaults to None.
`datamodule` | The datamodule. If provided, will ignore the rest of the parameters like train, test, etc. and use the datamodule. Defaults to None.

RETURNS | DESCRIPTION
---|---
`pl.Trainer` | The PyTorch Lightning Trainer instance.
Source code in src/pytorch_tabular/tabular_model.py
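Typical calls, assuming the `tabular_model` and dataframes from the constructor sketch above:

```python
import torch

# Basic usage
tabular_model.fit(train=train_df, validation=val_df)

# With a custom optimizer class (passed as the class, not an instance)
tabular_model.fit(
    train=train_df,
    validation=val_df,
    optimizer=torch.optim.AdamW,
    optimizer_params={"weight_decay": 1e-5},
    seed=42,
)
```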
load_best_model()
Loads the best model after training is done.
Source code in src/pytorch_tabular/tabular_model.py
load_from_checkpoint(dir, map_location=None, strict=True)
classmethod
(Deprecated: Use load_model instead) Loads a saved model from the directory.
PARAMETER | DESCRIPTION
---|---
`dir` | The directory where the model was saved, along with the checkpoints.
`map_location` | If your checkpoint saved a GPU model and you now load on CPUs or a different number of GPUs, use this to map to the new setup. The behaviour is the same as in torch.load().
`strict` | Whether to strictly enforce that the keys in checkpoint_path match the keys returned by this module's state dict. Default: True.

RETURNS | DESCRIPTION
---|---
`TabularModel` | The saved TabularModel.
Source code in src/pytorch_tabular/tabular_model.py
load_model(dir, map_location=None, strict=True)
classmethod
Loads a saved model from the directory.
PARAMETER | DESCRIPTION
---|---
`dir` | The directory where the model was saved, along with the checkpoints.
`map_location` | If your checkpoint saved a GPU model and you now load on CPUs or a different number of GPUs, use this to map to the new setup. The behaviour is the same as in torch.load().
`strict` | Whether to strictly enforce that the keys in checkpoint_path match the keys returned by this module's state dict. Default: True.

RETURNS | DESCRIPTION
---|---
`TabularModel` | The saved TabularModel.
Source code in src/pytorch_tabular/tabular_model.py
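A save/load round trip might look like this (the directory path is an arbitrary example):

```python
tabular_model.save_model("saved_models/my_model")            # saves model, checkpoints, and config
loaded_model = TabularModel.load_model("saved_models/my_model")
```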
load_weights(path)
Loads the model weights in the specified directory.
PARAMETER | DESCRIPTION
---|---
`path` | The path to the file to load the model from.
predict(test, quantiles=[0.25, 0.5, 0.75], n_samples=100, ret_logits=False, include_input_features=True, device=None)
Uses the trained model to predict on new data and return as a dataframe.
PARAMETER | DESCRIPTION
---|---
`test` | The new dataframe with the features defined during training.
`quantiles` | For probabilistic models like Mixture Density Networks, this specifies the different quantiles to be extracted apart from the central tendency and added to the output dataframe. Ignored for non-probabilistic models. Defaults to [0.25, 0.5, 0.75].
`n_samples` | Number of samples to draw from the posterior to estimate the quantiles. Ignored for non-probabilistic models. Defaults to 100.
`ret_logits` | Flag to return raw model outputs/logits except the backbone features along with the dataframe. Defaults to False.
`include_input_features` | Flag to include the input features in the returned dataframe. Defaults to True.

RETURNS | DESCRIPTION
---|---
`pd.DataFrame` | Returns a dataframe with predictions and features (if `include_input_features` is True).
Source code in src/pytorch_tabular/tabular_model.py
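For example, assuming a fitted `tabular_model` and a `test_df` with the same feature columns used during training:

```python
pred_df = tabular_model.predict(test_df)
# The returned dataframe contains prediction columns (and, for classification,
# class probabilities) alongside the input features when include_input_features=True.
```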
prepare_dataloader(train, validation=None, test=None, train_sampler=None, target_transform=None, seed=42)
Prepares the dataloaders for training and validation.
PARAMETER | DESCRIPTION
---|---
`train` | Training Dataframe.
`validation` | If provided, will use this dataframe as the validation while training. Used in Early Stopping and Logging. If left empty, will use 20% of Train data as validation. Defaults to None.
`test` | If provided, will be used as the hold-out data, on which you can check performance after the model is trained. Defaults to None.
`train_sampler` | Custom PyTorch batch samplers which will be passed to the DataLoaders. Useful for dealing with imbalanced data and other custom batching strategies.
`target_transform` | If provided, applies the transform to the target before modelling and inverts the transform during prediction. The parameter can either be a sklearn Transformer which has an inverse_transform method, or a tuple of callables (transform_func, inverse_transform_func).
`seed` | Random seed for reproducibility. Defaults to 42.

RETURNS | DESCRIPTION
---|---
`TabularDatamodule` | The prepared datamodule.
Source code in src/pytorch_tabular/tabular_model.py
prepare_model(datamodule, loss=None, metrics=None, metrics_prob_inputs=None, optimizer=None, optimizer_params={})
Prepares the model for training.
PARAMETER | DESCRIPTION
---|---
`datamodule` | The datamodule.
`loss` | Custom Loss functions which are not in the standard PyTorch library.
`metrics` | Custom metric functions (Callable) which have the signature metric_fn(y_hat, y) and work on torch tensor inputs.
`metrics_prob_inputs` | A mandatory parameter for classification metrics. If the metric function requires probabilities as inputs, set this to True. The length of the list should be equal to the number of metrics. Defaults to None.
`optimizer` | Custom optimizers which are drop-in replacements for standard PyTorch optimizers. This should be the Class and not the initialized object.
`optimizer_params` | The parameters to initialize the custom optimizer.

RETURNS | DESCRIPTION
---|---
`BaseModel` | The prepared model.
Source code in src/pytorch_tabular/tabular_model.py
pretrain(train, validation=None, optimizer=None, optimizer_params={}, max_epochs=None, min_epochs=None, seed=42, callbacks=None, datamodule=None)
The pretrain method which takes in the data and triggers the training.
PARAMETER | DESCRIPTION
---|---
`train` | Training Dataframe.
`validation` | If provided, will use this dataframe as the validation while training. Used in Early Stopping and Logging. If left empty, will use 20% of Train data as validation. Defaults to None.
`optimizer` | Custom optimizers which are drop-in replacements for standard PyTorch optimizers. This should be the Class and not the initialized object.
`optimizer_params` | The parameters to initialize the custom optimizer.
`max_epochs` | Overwrite maximum number of epochs to be run. Defaults to None.
`min_epochs` | Overwrite minimum number of epochs to be run. Defaults to None.
`seed` | Random seed for reproducibility. Defaults to 42.
`callbacks` | List of callbacks to be used during training. Defaults to None.
`datamodule` | The datamodule. If provided, will ignore the rest of the parameters like train, test, etc. and use the datamodule. Defaults to None.

RETURNS | DESCRIPTION
---|---
`pl.Trainer` | The PyTorch Lightning Trainer instance.
Source code in src/pytorch_tabular/tabular_model.py
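A sketch of the pretraining step; `ssl_model` is assumed to be a TabularModel built with a self-supervised model config, and `unlabeled_df` is an unlabeled dataframe:

```python
ssl_model.pretrain(train=unlabeled_df, validation=val_df, max_epochs=50)
# After pretraining, create_finetune_model() and finetune() adapt the model to the supervised task
```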
save_config(dir)
Saves the config in the specified directory.
save_datamodule(dir)
Saves the datamodule in the specified directory.
PARAMETER | DESCRIPTION
---|---
`dir` | The path to the directory to save the datamodule.
Source code in src/pytorch_tabular/tabular_model.py
save_model(dir)
Saves the model and checkpoints in the specified directory.
PARAMETER | DESCRIPTION
---|---
`dir` | The path to the directory to save the model.
Source code in src/pytorch_tabular/tabular_model.py
save_model_for_inference(path, kind='pytorch', onnx_export_params={'opset_version': 12})
Saves the model for inference.
PARAMETER | DESCRIPTION
---|---
`path` | Path to save the model.
`kind` | "pytorch" or "onnx" (Experimental).
`onnx_export_params` | Parameters for onnx export to be passed to torch.onnx.export.

RETURNS | DESCRIPTION
---|---
`bool` | True if the model was saved successfully.
Source code in src/pytorch_tabular/tabular_model.py
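For example (the file paths are arbitrary; ONNX export is experimental):

```python
# Plain PyTorch model for inference
tabular_model.save_model_for_inference("inference_model.pt", kind="pytorch")

# ONNX export (experimental); extra parameters are forwarded to torch.onnx.export
tabular_model.save_model_for_inference(
    "inference_model.onnx", kind="onnx", onnx_export_params={"opset_version": 12}
)
```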
save_weights(path)
Saves the model weights in the specified directory.
PARAMETER | DESCRIPTION
---|---
`path` | The path to the file to save the model.
summary(max_depth=-1)
Prints a summary of the model.
PARAMETER | DESCRIPTION
---|---
`max_depth` | The maximum depth to traverse the modules and display in the summary. Defaults to -1, which means all modules are displayed.
Source code in src/pytorch_tabular/tabular_model.py
train(model, datamodule, callbacks=None, max_epochs=None, min_epochs=None)
Trains the model.
PARAMETER | DESCRIPTION
---|---
`model` | The PyTorch Lightning model to be trained.
`datamodule` | The datamodule.
`callbacks` | List of callbacks to be used during training. Defaults to None.
`max_epochs` | Overwrite maximum number of epochs to be run. Defaults to None.
`min_epochs` | Overwrite minimum number of epochs to be run. Defaults to None.

RETURNS | DESCRIPTION
---|---
`pl.Trainer` | The PyTorch Lightning Trainer instance.
Source code in src/pytorch_tabular/tabular_model.py
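Together, `prepare_dataloader`, `prepare_model`, and `train` decompose what `fit` does in a single call; a sketch with the assumed dataframes from earlier:

```python
# Equivalent to fit(), but with the intermediate objects exposed
datamodule = tabular_model.prepare_dataloader(train=train_df, validation=val_df, seed=42)
model = tabular_model.prepare_model(datamodule)
tabular_model.train(model, datamodule, max_epochs=20)
```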
pytorch_tabular.TabularDatamodule(train, config, validation=None, test=None, target_transform=None, train_sampler=None, seed=42)
Bases: pl.LightningDataModule
The Pytorch Lightning Datamodule for Tabular Data.
PARAMETER | DESCRIPTION
---|---
`train` | The Training Dataframe.
`config` | Merged configuration object from ModelConfig, DataConfig, TrainerConfig, OptimizerConfig & ExperimentConfig.
`validation` | Validation Dataframe. If left empty, we use the validation split from DataConfig to split a random sample as validation. Defaults to None.
`test` | Holdout DataFrame to check final performance on. Defaults to None.
`target_transform` | If provided, applies the transform to the target before modelling and inverts the transform during prediction. The parameter can either be a sklearn Transformer which has an inverse_transform method, or a tuple of callables (transform_func, inverse_transform_func).
Source code in src/pytorch_tabular/tabular_datamodule.py
add_datepart(df, field_name, frequency, prefix=None, drop=True)
classmethod
Helper function that adds columns relevant to a date in the column `field_name` of `df`.
PARAMETER | DESCRIPTION
---|---
`df` | Dataframe.
`field_name` | Date field name.
`frequency` | Frequency string of the form `[multiple][granularity]`, e.g. "12H", "5min", "1D".
`prefix` | Prefix to add to the new columns. Defaults to None.
`drop` | Drop the original column. Defaults to True.

RETURNS | DESCRIPTION
---|---
`Tuple[pd.DataFrame, List[str]]` | Dataframe with added columns and list of added columns.
Source code in src/pytorch_tabular/tabular_datamodule.py
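A minimal sketch; the column names are illustrative, and the exact set of derived columns depends on the frequency:

```python
import pandas as pd
from pytorch_tabular import TabularDatamodule

df = pd.DataFrame({
    "date": pd.date_range("2021-01-01", periods=10, freq="D"),
    "y": range(10),
})
df, added_cols = TabularDatamodule.add_datepart(df, "date", frequency="D", prefix="date_")
print(added_cols)  # names of the newly added date-derived columns
```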
do_leave_one_out_encoder()
Checks the special condition for NODE where we use a LeaveOneOutEncoder to encode categorical columns. DEPRECATED: Automatically encoding categorical columns using LeaveOneOutEncoder is deprecated.
RETURNS | DESCRIPTION
---|---
`bool` | True if LeaveOneOutEncoder is used.
Source code in src/pytorch_tabular/tabular_datamodule.py
load_datamodule(path)
classmethod
Loads a datamodule from a path.
PARAMETER | DESCRIPTION
---|---
`path` | Path to the datamodule.

RETURNS | DESCRIPTION
---|---
`TabularDatamodule` | The datamodule loaded from the path.
Source code in src/pytorch_tabular/tabular_datamodule.py
make_date(df, date_field)
classmethod
Make sure `df[date_field]` is of the right date type.
PARAMETER | DESCRIPTION
---|---
`df` | Dataframe.
`date_field` | Date field name.

RETURNS | DESCRIPTION
---|---
`pd.DataFrame` | Dataframe with date field converted to datetime.
Source code in src/pytorch_tabular/tabular_datamodule.py
prepare_inference_dataloader(df, batch_size=None)
Function that prepares and loads the new data.
PARAMETER | DESCRIPTION
---|---
`df` | Dataframe with the features and target.
`batch_size` | Batch size. Defaults to the batch size defined in the config.

RETURNS | DESCRIPTION
---|---
`DataLoader` | The dataloader for the passed-in dataframe.
Source code in src/pytorch_tabular/tabular_datamodule.py
preprocess_data(data, stage='inference')
The preprocessing, like Categorical Encoding, Normalization, etc., which any dataframe should undergo before being fed into the dataloader.
PARAMETER | DESCRIPTION
---|---
`data` | A dataframe with the features and target.
`stage` | Internal parameter. Used to distinguish between fit and inference. Defaults to "inference".

RETURNS | DESCRIPTION
---|---
`Tuple[pd.DataFrame, list]` | Returns the processed dataframe and the added features (list) as a tuple.
Source code in src/pytorch_tabular/tabular_datamodule.py
save_dataloader(path)
Saves the dataloader to a path.
PARAMETER | DESCRIPTION
---|---
`path` | Path to save the dataloader.
Source code in src/pytorch_tabular/tabular_datamodule.py
setup(stage=None)
Data Operations you want to perform on all GPUs, like train-test split, transformations, etc. This is called before accessing the dataloaders.
PARAMETER | DESCRIPTION
---|---
`stage` | Internal parameter to distinguish between fit and inference. Defaults to None.
Source code in src/pytorch_tabular/tabular_datamodule.py
test_dataloader(batch_size=None)
Function that loads the test set.
PARAMETER | DESCRIPTION
---|---
`batch_size` | Batch size. Defaults to the batch size defined in the config.

RETURNS | DESCRIPTION
---|---
`DataLoader` | Test dataloader.
Source code in src/pytorch_tabular/tabular_datamodule.py
time_features_from_frequency_str(freq_str)
classmethod
Returns a list of time features that will be appropriate for the given frequency string.
PARAMETER | DESCRIPTION
---|---
`freq_str` | Frequency string of the form `[multiple][granularity]`, e.g. "12H", "5min", "1D".

RETURNS | DESCRIPTION
---|---
`List[str]` | List of added features.
Source code in src/pytorch_tabular/tabular_datamodule.py
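For example (the exact feature list depends on the library's frequency mapping):

```python
from pytorch_tabular import TabularDatamodule

# Features considered informative for hourly data
print(TabularDatamodule.time_features_from_frequency_str("H"))
```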
train_dataloader(batch_size=None)
Function that loads the train set.
PARAMETER | DESCRIPTION
---|---
`batch_size` | Batch size. Defaults to the batch size defined in the config.

RETURNS | DESCRIPTION
---|---
`DataLoader` | Train dataloader.
Source code in src/pytorch_tabular/tabular_datamodule.py
update_config(config)
Calculates and updates a few key pieces of information in the config object.
PARAMETER | DESCRIPTION
---|---
`config` | The config object.

RETURNS | DESCRIPTION
---|---
`InferredConfig` | The updated config object.
Source code in src/pytorch_tabular/tabular_datamodule.py
val_dataloader(batch_size=None)
Function that loads the validation set.
PARAMETER | DESCRIPTION
---|---
`batch_size` | Batch size. Defaults to the batch size defined in the config.

RETURNS | DESCRIPTION
---|---
`DataLoader` | Validation dataloader.