Exploring Advanced Features with PyTorch Tabular

Pre-requisites: Intermediate knowledge of Deep Learning and basic knowledge of Tabular Problems like Regression and Classification. Also go through the Approaching Any Tabular Problem with PyTorch Tabular tutorial.
Level: Intermediate

In the Approaching Any Tabular Problem with PyTorch Tabular tutorial, we saw how to start using PyTorch Tabular with its intelligent defaults. In this tutorial, we will see how to leverage slightly more advanced features of PyTorch Tabular to gain more flexibility and, typically, better results. We assume you already know how to use the basic features of PyTorch Tabular. If you are not familiar with them, please go through the Approaching Any Tabular Problem with PyTorch Tabular tutorial first.

from rich.pretty import pprint
import numpy as np

Data¶

First of all, let's create a synthetic dataset which is a mix of numerical and categorical features and has multiple targets for regression. This means that there are multiple columns which we need to predict with the same set of features. Most classical machine learning models (like the ones in scikit-learn) only handle single-target problems, so we would have to train a separate model for each target. While this is perfectly fine, it is not the most efficient way to do it. Firstly, training multiple models takes more time. Secondly, if the targets are related to each other, separate models cannot leverage that relationship. For example, if we are predicting both the price and the area of a house, we know that the two targets are related, but two independently trained models do not share this information.

PyTorch Tabular can handle multi-target problems out of the box (only for Regression currently). We just need to pass the list of target columns to the target parameter of the DataConfig class.

from sklearn.model_selection import train_test_split
from pytorch_tabular.utils import make_mixed_dataset, print_metrics
data, cat_col_names, num_col_names = make_mixed_dataset(
    task="regression", n_samples=100000, n_features=20, n_categories=4, n_targets=2, random_state=42
)
target_cols = ["target_0", "target_1"]
train, test = train_test_split(data, random_state=42)
train, val = train_test_split(train, random_state=42)

Let's import the required classes from PyTorch Tabular.

from pytorch_tabular import TabularModel
from pytorch_tabular.models import CategoryEmbeddingModelConfig, GANDALFConfig
from pytorch_tabular.config import DataConfig, OptimizerConfig, TrainerConfig
from pytorch_tabular.models.common.heads import LinearHeadConfig

Defining the Model¶

We already know the basic steps to define a model in PyTorch Tabular. We need to define a few configs and initialize the TabularModel. Let's do that.

But this time, let's look at some of the more advanced features of PyTorch Tabular.

1. `DataConfig`¶

We know we need to define the target, continuous and categorical columns in the DataConfig. But there are a few more parameters which we can use to customize the data processing pipeline. Let's look at a few of them.

normalize_continuous_features: For better optimization, DL models prefer normalized continuous features. By default, PyTorch Tabular normalizes the continuous features. But if you want to use a custom normalization, you can do the normalization outside of PyTorch Tabular and pass normalize_continuous_features=False to the DataConfig. PyTorch Tabular will then use the values as is.
continuous_feature_transform: Sometimes, we want to transform the continuous features before feeding them to the model. For example, we might want to take the log or the square root of a feature. PyTorch Tabular has a few such transformations built in. You can pass the name of the transformation to the continuous_feature_transform parameter of the DataConfig. The allowable inputs are: ['quantile_normal', 'yeo-johnson', 'quantile_uniform', 'box-cox']. Internally, this uses scikit-learn transformers to do the transformation. You can read about these in the scikit-learn preprocessing documentation.
num_workers and pin_memory: These two parameters are used to speed up the data loading process. You can set num_workers to a number greater than 0 (only on Linux) to use multiple CPU cores to load the data in parallel. pin_memory speeds up the data transfer from CPU to GPU; if you are using a GPU, you can set pin_memory=True. You can read more about these in the PyTorch DataLoader documentation.
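To build intuition for what a transform such as quantile_normal does, here is a minimal pure-Python sketch. It is not the scikit-learn implementation that PyTorch Tabular relies on; the function name quantile_normal_sketch is hypothetical, and ties, interpolation and unseen data are all ignored.

```python
from statistics import NormalDist


def quantile_normal_sketch(values):
    """Map values to approximately standard-normal scores via their ranks.

    Illustrative only: ties are broken by position, and a real quantile
    transformer also has to handle interpolation and unseen data.
    """
    n = len(values)
    # Indices of the values in ascending order.
    order = sorted(range(n), key=lambda i: values[i])
    std_normal = NormalDist()
    transformed = [0.0] * n
    for rank, idx in enumerate(order):
        # Mid-rank quantile in (0, 1), so inv_cdf never receives 0 or 1.
        q = (rank + 0.5) / n
        transformed[idx] = std_normal.inv_cdf(q)
    return transformed


skewed = [1.0, 2.0, 3.0, 10.0, 1000.0]
scores = quantile_normal_sketch(skewed)
print([round(s, 3) for s in scores])
```

The heavily skewed input ends up symmetric around zero, which is the property that tends to make such features easier for a neural network to work with.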

For the entire list of parameters, please refer to the API Reference in the docs.

data_config = DataConfig(
    target=target_cols,
    continuous_cols=num_col_names,
    categorical_cols=cat_col_names,
    num_workers=10,
    normalize_continuous_features=True,
    continuous_feature_transform="quantile_normal",
)

2. `TrainerConfig`¶

Training a Deep Learning model can get arbitrarily complex. PyTorch Tabular, by building on PyTorch Lightning, offloads the whole workload onto the underlying PyTorch Lightning framework. In the basic tutorial, we just scratched the surface of what PyTorch Lightning can do. In this tutorial, we will see how to leverage some of the more advanced features of PyTorch Lightning as well as a few convenience features of PyTorch Tabular.

We already know that we can pass the max_epochs, batch_size to TrainerConfig. Let's look at a few more parameters.

accelerator

PyTorch Lightning supports training on multiple GPUs and TPUs. You can pass the accelerator type to the accelerator parameter of the TrainerConfig. The allowable inputs are: ['cpu','gpu','tpu','ipu','auto']. cpu, gpu, tpu and ipu let you train the model on CPUs, GPUs, TPUs and IPUs respectively, while auto lets PyTorch Lightning choose the best available accelerator for you. You can read more about these in the PyTorch Lightning documentation.

devices and devices_list

devices lets you choose the number of devices (CPU cores, GPUs, etc.) to train the model on. -1 means training on all available devices. devices_list lets you choose the specific devices to train the model on. For example, if you want to train the model on GPUs 0 and 1, you can pass devices_list=[0,1].

min_epochs and max_time

Apart from max_epochs, these parameters also help you control the training. min_epochs lets you specify the minimum number of epochs to train the model for (typically useful when you are using early stopping). max_time lets you specify the maximum time to train the model for.

Early Stopping Parameters

PyTorch Lightning supports early stopping out of the box. Early stopping is a technique to stop the training process if the model is not improving by monitoring a loss/metric on the validation set. You can pass the following parameters to the TrainerConfig to use early stopping:

early_stopping: The loss/metric to monitor for early stopping. If set to None, early stopping will not be used.
early_stopping_min_delta: The minimum change in the loss/metric that qualifies as an improvement for early stopping.
early_stopping_mode: The direction in which the loss/metric should be optimized. Choices are: ['max', 'min'].
early_stopping_patience: The number of epochs to wait until there is no further improvement in the loss/metric.
early_stopping_kwargs: Additional keyword arguments for the early stopping callback. Refer to the PyTorch Lightning EarlyStopping callback documentation for more details.
load_best: If True, loads the best model weights at the end of training. Defaults to True.
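How min_delta, mode and patience interact can be sketched in a few lines of plain Python. This is a simplified model of patience-based early stopping, not the PyTorch Lightning callback itself; the function name early_stopping_epoch and the loss values are made up for illustration.

```python
def early_stopping_epoch(values, min_delta=0.001, patience=3, mode="min"):
    """Return the index of the epoch at which training would stop, or None.

    Simplified sketch: an epoch counts as an improvement only if it beats
    the best value seen so far by more than min_delta.
    """
    best = None
    wait = 0
    for epoch, value in enumerate(values):
        if best is None:
            improved = True
        elif mode == "min":
            improved = value < best - min_delta
        else:
            improved = value > best + min_delta
        if improved:
            best = value
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


valid_loss = [1.00, 0.80, 0.70, 0.6995, 0.6993, 0.6999, 0.65]
print(early_stopping_epoch(valid_loss))  # → 5
```

Epochs 3 to 5 improve on 0.70 by less than min_delta, so the patience of 3 runs out at epoch 5 and the later drop to 0.65 is never reached. This is the trade-off you control with early_stopping_patience.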

Checkpoint Saving Parameters

PyTorch Lightning supports saving the model checkpoints automatically. Checkpoint saving is a technique to save the model weights at regular intervals during training. This is useful in case the training process is interrupted for some reason, or if we want to go back and use the weights from a previous epoch. It is typically useful when using early stopping, so that we can roll back and use the best model weights. You can pass the following parameters to the TrainerConfig to save the model checkpoints:

checkpoints: str: The loss/metric that needs to be monitored for checkpoints. If None, there will be no checkpoints
checkpoints_path: str: The path where the saved models will be. Defaults to saved_models
checkpoints_mode: str: The direction in which the loss/metric should be optimized. Choices are max and min. Defaults to min
checkpoints_save_top_k: int: The number of best models to save. If you want to save more than one best model, you can set this parameter to >1. Defaults to 1

Note: Make sure the name of the metric/loss you want to track exactly matches the ones in the logs. Recommended way is to run a model and check the results by evaluating the model. From the resulting dictionary, you can pick up a key to track during training.
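The effect of checkpoints_mode and checkpoints_save_top_k can be pictured with a small bookkeeping sketch. This only tracks which epochs would be kept; it does not save any weights, and update_top_k is a hypothetical helper rather than part of PyTorch Tabular or PyTorch Lightning.

```python
def update_top_k(saved, epoch, value, k=1, mode="min"):
    """Return the k best (value, epoch) pairs after seeing a new epoch."""
    candidates = saved + [(value, epoch)]
    # Best first: ascending for "min", descending for "max".
    candidates.sort(reverse=(mode == "max"))
    return candidates[:k]


saved = []
for epoch, loss in enumerate([0.9, 0.7, 0.75, 0.6, 0.65]):
    saved = update_top_k(saved, epoch, loss, k=2, mode="min")

print(saved)  # → [(0.6, 3), (0.65, 4)]
```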

Learning Rate Finder

The learning rate finder, first proposed in the paper Cyclical Learning Rates for Training Neural Networks and subsequently popularized by fast.ai, is a technique to reach the neighbourhood of the optimum learning rate without a costly search. PyTorch Tabular lets you find a suitable learning rate (using the method proposed in the paper) and automatically use it for training the network. All this can be turned on with a simple flag, auto_lr_find.
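The idea behind the learning rate range test can be sketched on a toy problem. The code below is an illustrative simplification, not what PyTorch Lightning runs when auto_lr_find is enabled: it grows the learning rate exponentially over a short run on a one-parameter quadratic loss and then suggests the rate whose step gave the largest drop in loss. The names lr_range_test, suggest_lr and quadratic are hypothetical, and real implementations typically smooth the loss curve before picking a value.

```python
def lr_range_test(loss_and_grad, w0, min_lr=1e-4, max_lr=10.0, num_steps=50):
    """Run one short training pass while growing the learning rate exponentially."""
    lrs, losses = [], []
    w = w0
    for step in range(num_steps):
        lr = min_lr * (max_lr / min_lr) ** (step / (num_steps - 1))
        loss, grad = loss_and_grad(w)
        lrs.append(lr)
        losses.append(loss)
        if loss > 4 * losses[0]:
            break  # the loss has diverged, stop early
        w -= lr * grad
    return lrs, losses


def suggest_lr(lrs, losses):
    """Pick the learning rate whose step produced the largest drop in loss."""
    drops = [losses[i] - losses[i + 1] for i in range(len(losses) - 1)]
    best_step = max(range(len(drops)), key=drops.__getitem__)
    return lrs[best_step]


def quadratic(w):
    # Toy loss w**2 with gradient 2*w.
    return w * w, 2 * w


lrs, losses = lr_range_test(quadratic, w0=5.0)
suggested = suggest_lr(lrs, losses)
print(f"suggested lr: {suggested:.4f}")
```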

Controlling the Gradients/Optimization

While training, there can be situations where you need tighter control over the gradient optimization process. For example, if the gradients are exploding, you might want to clip gradient values before each update. gradient_clip_val lets you do that.

Sometimes, you might want to accumulate gradients across multiple batches before performing an optimizer step (maybe because a larger batch size does not fit in your GPU). PyTorch Tabular lets you do this with accumulate_grad_batches.
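Both ideas can be checked by hand on a one-parameter linear model. The sketch below is plain Python with made-up data and hypothetical helper names (clip_by_norm, mse_grad); it assumes equal-sized micro-batches and clipping by L2 norm, and it only illustrates the arithmetic rather than the Lightning implementation.

```python
import math


def clip_by_norm(grads, max_norm):
    """Rescale the gradient vector so its L2 norm is at most max_norm."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm <= max_norm:
        return list(grads)
    scale = max_norm / total_norm
    return [g * scale for g in grads]


def mse_grad(w, batch):
    """Gradient of the mean squared error for the model y = w * x."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


data = [(1.0, 2.0), (2.0, 4.5), (3.0, 5.5), (4.0, 8.5)]
w = 0.0

# Gradient computed on the full batch of 4 samples.
full_grad = mse_grad(w, data)

# The same gradient, obtained by averaging over 2 micro-batches of 2 samples.
micro_batches = [data[:2], data[2:]]
accumulated = sum(mse_grad(w, mb) for mb in micro_batches) / len(micro_batches)

print(full_grad, accumulated)
print(clip_by_norm([3.0, 4.0], max_norm=1.0))
```

The accumulated gradient matches the full-batch one, which is why accumulation can stand in for a batch size that does not fit in memory.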

Debugging Analysis

Many times, you will need to debug a model and see why it is not performing as it is supposed to. You will also need to debug a lot while developing new models. PyTorch Lightning has a few features for this use case, which PyTorch Tabular has adopted.

To find performance bottlenecks, we can use:

profiler: Optional[str]: To profile individual steps during training and assist in identifying bottlenecks. Choices are: None, simple, advanced. Defaults to None

To check if the whole setup runs without errors, we can use:

fast_dev_run: bool: Runs a quick debug pass through the training and validation loops to check that everything works. Defaults to False

If the model is not learning properly:

overfit_batches: float: Uses this much data of the training set. If nonzero, will use the same training set for validation and testing. If the training dataloaders have shuffle=True, Lightning will automatically disable it. Useful for quickly debugging or trying to overfit on purpose. Defaults to 0
track_grad_norm: int: This is only used if experiment tracking is set up. Tracks and logs gradient norms in the logger. -1 means no tracking, 1 tracks the L1 norm, 2 the L2 norm, etc. Defaults to -1. If the gradient norm falls to zero quickly, then we have a problem.

For the entire list of parameters, please refer to the API Reference in the docs.

YAML Config

PyTorch Tabular lets you define any config either as a config class or as a YAML file. YAML files are a great way to store configs: they are human-readable and easy to edit. To use one, pass the path of the YAML file to the respective config parameter of the TabularModel.

We have defined a YAML file for TrainerConfig with the below contents:

batch_size: 1024
fast_dev_run: false
max_epochs: 20
min_epochs: 1
accelerator: 'auto'
devices: -1
accumulate_grad_batches: 1
auto_lr_find: true
check_val_every_n_epoch: 1
gradient_clip_val: 0.0
overfit_batches: 0.0
profiler: null
early_stopping: null
early_stopping_min_delta: 0.001
early_stopping_mode: min
early_stopping_patience: 3
checkpoints: valid_loss
checkpoints_path: saved_models
checkpoints_mode: min
checkpoints_save_top_k: 1
load_best: true
track_grad_norm: -1

Let's use that instead of defining the TrainerConfig as a class.
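If you are following along and do not have the file yet, one way to create it is to write the contents shown above to disk. This sketch assumes the file is called trainer_config.yml and lives in the current working directory, which is the path passed to TabularModel later in this tutorial.

```python
from pathlib import Path

# Same keys and values as the YAML listing above.
trainer_yaml = """\
batch_size: 1024
fast_dev_run: false
max_epochs: 20
min_epochs: 1
accelerator: 'auto'
devices: -1
accumulate_grad_batches: 1
auto_lr_find: true
check_val_every_n_epoch: 1
gradient_clip_val: 0.0
overfit_batches: 0.0
profiler: null
early_stopping: null
early_stopping_min_delta: 0.001
early_stopping_mode: min
early_stopping_patience: 3
checkpoints: valid_loss
checkpoints_path: saved_models
checkpoints_mode: min
checkpoints_save_top_k: 1
load_best: true
track_grad_norm: -1
"""

config_path = Path("trainer_config.yml")
config_path.write_text(trainer_yaml)
print(f"Wrote {len(trainer_yaml.splitlines())} settings to {config_path}")
```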

3. `OptimizerConfig`¶

The Optimizer is at the heart of the gradient descent process and is a key component that we need to train a good model. OptimizerConfig lets you customize the optimizer and learning rate scheduler to your needs. PyTorch Tabular uses the Adam optimizer with a learning rate of 1e-3 by default, a rule-of-thumb choice that provides a good starting point.

Let's look at a few parameters which we can use to customize the optimizer.

optimizer

PyTorch Tabular lets you choose any optimizer from the torch.optim package by passing in the name of the optimizer as a string to the optimizer parameter of the OptimizerConfig. This includes optimizers like Adam, SGD, RMSProp, AdamW, etc. You can read more about these in the torch.optim documentation. In addition to these, PyTorch Tabular also supports any valid PyTorch optimizer. If it is an optimizer which can be accessed from a namespace (like a library you installed), we can pass the fully qualified name of the optimizer to the optimizer parameter. For example, if you have installed the torch_optimizer library and want to use QHAdam from there, you can pass torch_optimizer.QHAdam to the optimizer parameter. If it is an optimizer which is not accessible from a namespace, you cannot pass it in the OptimizerConfig, but you will be able to use it during the fit, which we will see later.

optimizer_params

PyTorch Tabular lets you pass any valid optimizer parameters (except the learning rate) to the optimizer_params parameter of the OptimizerConfig. For example, if you want to use a weight decay of 1e-2, you can pass optimizer_params={'weight_decay':1e-2} to the OptimizerConfig. You need to refer to the documentation of the optimizer you are using to find out the valid parameters.

lr_scheduler

Learning rate schedulers are a way to control the learning rate during training. Sometimes, it is beneficial to start off with a slightly higher learning rate and reduce it as we progress in training. Sometimes, it helps if we reduce the learning rate when we hit a plateau while learning. PyTorch Tabular lets you choose any learning rate scheduler from the torch.optim.lr_scheduler package by passing in the name of the scheduler as a string to the lr_scheduler parameter of the OptimizerConfig. This includes schedulers like StepLR, ReduceLROnPlateau, CosineAnnealingLR, etc. You can read more about these in the torch.optim.lr_scheduler documentation.

lr_scheduler_params

PyTorch Tabular lets you pass any valid learning rate scheduler parameters to the lr_scheduler_params parameter of the OptimizerConfig. For example, if you want to use a step size of 10 for StepLR, you can pass lr_scheduler_params={'step_size':10} to the OptimizerConfig. You need to refer to the documentation of the scheduler you are using to find out the valid parameters.

lr_scheduler_monitor_metric

This is a parameter which is used only if you are using ReduceLROnPlateau as the learning rate scheduler. This is the metric which will be monitored to reduce the learning rate. This should be a valid loss or metric defined in the model.

Here, let's use CosineAnnealingWarmRestarts, which restarts the cosine schedule every 10 epochs, as the learning rate scheduler, and an optimizer from a third-party library, torch_optimizer (you will need to install that library).

optimizer_config = OptimizerConfig(
    optimizer="torch_optimizer.QHAdam",
    optimizer_params={"nus": (0.7, 1.0), "betas": (0.95, 0.998)},
    lr_scheduler="CosineAnnealingWarmRestarts",
    lr_scheduler_params={"T_0": 10, "T_mult": 1, "eta_min": 1e-5},
)
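To see what the scheduler parameters above imply, the cosine-annealing-with-warm-restarts formula can be evaluated by hand. The sketch below covers only the T_mult=1 case used here (every cycle has the same length T_0) and assumes a base learning rate of 1e-3 for illustration; cosine_warm_restart_lr is a hypothetical helper, not a PyTorch function.

```python
import math


def cosine_warm_restart_lr(epoch, base_lr, T_0=10, eta_min=1e-5):
    """Learning rate at a given epoch for T_mult=1 (fixed cycle length)."""
    t_cur = epoch % T_0  # position within the current cycle
    cosine = math.cos(math.pi * t_cur / T_0)
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + cosine)


base_lr = 1e-3
schedule = [cosine_warm_restart_lr(e, base_lr) for e in range(21)]
for epoch in (0, 5, 9, 10, 15, 20):
    print(epoch, f"{schedule[epoch]:.6f}")
```

The learning rate decays from the base value towards eta_min over 10 epochs and then jumps back to the base value at epochs 10 and 20.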

4. `ModelConfig`¶

ModelConfig is how you decide the kind of model and the model parameters to be used in the model. PyTorch Tabular has implemented a few SOTA models for tabular data. Internally in PyTorch Tabular, a model has three components:

Embedding Layer - This is the part of the model which processes the categorical and continuous features into a single tensor.
Backbone - This is the real architecture of the model. It is the part of the model which takes the output of the embedding layer and does representation learning on it. The output is again a single tensor, which is the learned features from representation learning.
Head - This is the part of the model which takes the output of the backbone and does the final classification/regression. The output of the head is the final prediction.

from pytorch_tabular import available_models
pprint(available_models())

[
│   'AutoIntConfig',
│   'CategoryEmbeddingModelConfig',
│   'DANetConfig',
│   'FTTransformerConfig',
│   'GANDALFConfig',
│   'GatedAdditiveTreeEnsembleConfig',
│   'MDNConfig',
│   'NodeConfig',
│   'TabNetModelConfig',
│   'TabTransformerConfig'
]

You can choose any of these models by importing the corresponding class from pytorch_tabular.models, set the parameters and pass it to the model_config parameter of the TabularModel. All these config classes have been inherited from a common ModelConfig with a few standard parameters and any model specific parameters are added to the respective config class. And because of the inheritance, we have access to all the parameters of the ModelConfig in all the model config classes.

Let's first look at some common parameters of the ModelConfig:

task: str: This defines whether we are running the model for a regression or classification task, or as a backbone model. The backbone task is used in self-supervised models and in Mixture Density Networks.

Head Configuration

head: Optional[str]: The head to be used for the model. Should be one of the heads defined in pytorch_tabular.models.common.heads. Defaults to LinearHead. The cell below shows the list of available heads.

import pytorch_tabular as pt
pprint([h for h in dir(pt.models.common.heads) if (not h.startswith("_") and "Head" in h and "Config" not in h)])

['LinearHead', 'MixtureDensityHead']

head_config: Optional[Dict]: The config as a dict which defines the head. If left empty, it will be initialized as the default linear head. Although the input is a dictionary, it is recommended to use the <specific>HeadConfig class for the respective head to make sure you are only using allowable parameters. For example, if you are using LinearHead, you can use LinearHeadConfig to define the head config. The cell below shows the list of available head configs.

pprint([h for h in dir(pt.models.common.heads) if (not h.startswith("_") and "Head" in h and "Config" in h)])

['LinearHeadConfig', 'MixtureDensityHeadConfig']

Embedding Configuration

embedding_dims: Optional[List]: The dimensions of the embedding for each categorical column as a list of tuples (cardinality, embedding_dim). If left empty, will infer using the cardinality of the categorical column using the rule min(50, (x + 1) // 2)
embedding_dropout: float: Dropout to be applied to the Categorical Embedding. Defaults to 0.0
batch_norm_continuous_input: bool: If True, we will normalize the continuous layer by passing it through a BatchNorm layer before combining with the categorical embeddings. Defaults to True
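The default rule for embedding_dims quoted above is easy to evaluate by hand. The helper below, infer_embedding_dims, is a hypothetical name used only to illustrate the rule min(50, (x + 1) // 2) for a few example cardinalities.

```python
def infer_embedding_dims(cardinalities):
    """Apply the rule min(50, (x + 1) // 2) to each column cardinality."""
    return [(c, min(50, (c + 1) // 2)) for c in cardinalities]


print(infer_embedding_dims([4, 10, 1000]))
# → [(4, 2), (10, 5), (1000, 50)]
```

Low-cardinality columns get an embedding roughly half their cardinality, while the cap of 50 keeps very high-cardinality columns from dominating the parameter count.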

Other Configuration

learning_rate: float: The learning rate of the model. Defaults to 1e-3.
loss: Optional[str]: The loss function to be applied. By default, it is MSELoss for regression and CrossEntropyLoss for classification. For most cases, these work well. But if you want to use any other loss function from PyTorch, you can pass it here. For example, if you want to use BCEWithLogitsLoss, you can pass loss='BCEWithLogitsLoss'. You can read more about the losses in the PyTorch documentation. We can also use custom loss functions. We will see how to do that later.

Note: Choosing the Loss Function should not be treated like a hyperparameter which you blindly apply, but a well thought out decision.

metrics: Optional[List[str]]: The list of metrics you need to track during training. The metrics should be one of the functional metrics implemented in torchmetrics. You can find the entire list in the torchmetrics documentation. By default, it is accuracy for classification and mean_squared_error for regression. We can also use custom metrics. We will see how to do that later.
metrics_prob_input: Optional[List[bool]]: Is a mandatory parameter for classification metrics defined in the config. This defines whether the input to the metric function is the probability or the class. Length should be same as the number of metrics. Defaults to None.
metrics_params: Optional[List]: The parameters to be passed to the metrics function. Some functions like the f1_score need additional parameters like task to be properly defined. This also lets you choose how to average the metric in multi-class classification.
target_range: Optional[List]: For classification problems, the targets are always 0 or 1, once we one-hot encode the class labels. But for regression, the target is a real value that can, theoretically, lie anywhere in (-inf, inf). More practically, it usually lies between known bounds. Sometimes, it is an extra burden on the model to learn these bounds, and target_range is a way to take that burden off the model. This technique was popularized by Jeremy Howard in fast.ai and is quite effective in practice. If we know that the output value of a regression should be between a min and max value, we can provide those values as a tuple to target_range. But a caveat is that there is an assumption that the distribution of the target is normal. If the distribution is not normal, it might not work as expected. In case of multiple targets, we set target_range to be a list of tuples, where each entry in the list corresponds to the respective entry in the target parameter. For classification problems, this parameter is ignored.

target_range = [(train[target].min() * 0.8, train[target].max() * 1.2)]
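One common way to enforce such a range, and the one popularized by fast.ai, is to squash the raw network output with a sigmoid and rescale it to the interval. The sketch below illustrates that idea in plain Python under that assumption; scale_to_range is a hypothetical helper and the bounds are made up.

```python
import math


def scale_to_range(raw_output, low, high):
    """Squash an unbounded value into (low, high) with a scaled sigmoid."""
    sigmoid = 1.0 / (1.0 + math.exp(-raw_output))
    return low + (high - low) * sigmoid


for raw in (-20.0, -1.0, 0.0, 1.0, 20.0):
    print(raw, round(scale_to_range(raw, low=10.0, high=20.0), 4))
```

Whatever the raw output is, the prediction stays within the bounds, so the network only has to learn where inside the range the target lies. Padding the observed minimum and maximum slightly keeps the extreme targets reachable, since the sigmoid only approaches its limits asymptotically.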

virtual_batch_size: Optional[int]: BatchNorm is a very useful technique (a necessary evil) to normalize the activations of the network. It typically leads to faster convergence, and stable training regimes. But when training with large batch sizes, BatchNorm can lead to "overfitting" (not in the traditional sense). One way to overcome this is to use GhostBatchNorm, where we split the batch into virtual batches and apply BatchNorm on each virtual batch. By setting virtual_batch_size to a number greater than 1, PyTorch Tabular will automatically convert all BatchNorms to GhostBatchNorms with the specified virtual batch size.
seed: int: The seed for reproducibility. Defaults to 42

Now, each model we choose will have its own set of parameters. The API Reference in the docs has the list of all the models and their respective parameters. Here, let's use a simple MLP with categorical embeddings. This is called CategoryEmbeddingModelConfig in PyTorch Tabular.

The key parameters we are going to use are:

layers: str: Hyphen-separated number of layers and units in the classification head. Defaults to "128-64-32"
activation: str: The activation type in the classification head. The default activations in PyTorch like ReLU, TanH, LeakyReLU, etc. Defaults to ReLU
initialization: str: Initialization scheme for the linear layers. Choices are: kaiming, xavier, random. Defaults to kaiming
use_batch_norm: bool: Flag to include a BatchNorm layer after each Linear Layer+DropOut. Defaults to False
dropout: float: The probability of the element to be zeroed. This applies to all the linear layers. Defaults to 0.0
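The hyphen-separated layers string is simply a compact way of listing hidden layer sizes. As a rough illustration of how such a string can be read (parse_layers is a hypothetical helper, not the library's own parser):

```python
def parse_layers(layers):
    """Turn a string such as '64-32-16' into a list of layer widths."""
    return [int(units) for units in layers.split("-") if units]


print(parse_layers("64-32-16"))  # → [64, 32, 16]
print(parse_layers(""))  # → []
```

An empty string yields no hidden layers, which is how a head can be reduced to a single mapping layer to the output dimension.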

head_config = LinearHeadConfig(
    layers="",  # No additional layer in head, just a mapping layer to output_dim
    dropout=0.1,
    initialization="kaiming",
).__dict__  # Convert to dict to pass to the model config (OmegaConf doesn't accept objects)

model_config = CategoryEmbeddingModelConfig(
    task="regression",
    layers="64-32-16",
    activation="LeakyReLU",
    dropout=0.1,
    initialization="kaiming",
    head="LinearHead",  # Linear Head
    head_config=head_config,  # Linear Head Config
    learning_rate=1e-3,
    target_range=[(float(train[col].min()),float(train[col].max())) for col in target_cols]
)

5. `TabularModel`¶

After defining all the configs, we need to put them all together, and this is where TabularModel comes in. TabularModel is the core workhorse, which orchestrates and sets everything up.

TabularModel parses the configs and:

initializes the model
sets up the experiment tracking framework (if defined)
initializes and sets up the TabularDatamodule which handles all the data transformations and preparation of the DataLoaders
sets up the callbacks and the PyTorch Lightning Trainer
enables you to train, save, load, predict, among other things

Now that we have defined all the configs, let's initialize the TabularModel with the configs. We can pass the configs either as a class or as a YAML file. Let's use the YAML file for TrainerConfig and the class for the rest of the configs.

Apart from the configs, we can also pass the following parameters to the TabularModel:

verbose: bool: If True, will print different messages during training indicating the progress. Defaults to True
suppress_lightning_logger: PyTorch Lightning prints out a lot of logs while training and this parameter lets you suppress those logs. To be more specific, it sets the logging level of the PyTorch Lightning logger to ERROR. Defaults to False, as PyTorch Lightning logs are very useful for debugging. Only turn them off if you are sure you don't need them.

from pytorch_tabular import TabularModel
tabular_model = TabularModel(
    data_config=data_config,
    model_config=model_config,
    optimizer_config=optimizer_config,
    trainer_config="trainer_config.yml",
    verbose=True,
    suppress_lightning_logger=False
)

2024-01-11 14:43:34,218 - {pytorch_tabular.tabular_model:140} - INFO - Experiment Tracking is turned off

Training the Model¶

In PyTorch Tabular, there are two ways of training the model - A High Level API and a Low Level API. The High Level API is a wrapper around the Low Level API and is the recommended way to train the model. But the Low Level API gives you more control over the training process and is useful if you want to do some custom training. Let's look at both of them.

1. High Level API¶

The High Level API is a single line of code which does everything for you. You just need to call the fit method of the TabularModel and it will take care of everything else. But we have already seen how to fit the model in the basic tutorial. So, let's look at a few more parameters of the fit method.

loss: This is where you can use a custom loss function. You can pass any valid PyTorch loss function to the loss parameter of the fit method.
metrics: This is where you can use a custom metric function. The parameter accepts a list of Callables with the signature: metric_fn(y_hat, y), where y_hat and y are tensors. y_hat is of shape (batch_size, num_classes) for classification and (batch_size, 1) for regression. y is of shape (batch_size, 1) for classification and (batch_size, num_targets) for regression.
metrics_prob_inputs: This is a mandatory parameter if you are using the metrics parameter. This is a list of boolean values which defines whether the input to the metric function is the probability or the class. Length should be same as the number of metrics. Defaults to None.
optimizer and optimizer_params: This is where you can use a custom optimizer. You can pass any valid PyTorch optimizer to the optimizer parameter of the fit method. You can also pass any valid optimizer parameters to the optimizer_params parameter of the fit method.
train_samplers: Sometimes, we want to enforce some custom behaviour on the batch sampling. This parameter accepts any subclass of torch.utils.data.Sampler. For example, if you want to use WeightedRandomSampler, you can pass train_samplers=WeightedRandomSampler(...). You can read more about the samplers in the torch.utils.data documentation.
target_transform: This parameter is a tuple (of size 2) of callables and lets you apply any custom transformation to the target before it is passed to the loss function. For example, if you want to take the log of the target, you can pass target_transform=[np.log, np.exp]. The first function is applied to the target before passing it to the loss function, and the second function, its inverse, is applied to the output of the model.
callbacks: PyTorch Lightning supports a lot of callbacks out of the box. You can read more about them in the PyTorch Lightning documentation. PyTorch Lightning also supports custom callbacks. These callbacks are directly added to the Lightning Trainer.
cache_data: By default, PyTorch Tabular keeps the data in memory in the TabularDatamodule. This is useful if you want to train the model multiple times without having to load the data again and again. But if you are running out of memory, you can choose to save the files to a path and load them from there. This parameter accepts a string which is the path where the data will be saved. The default value is memory.

There are a few more parameters; the full list of options is available in the API reference.
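For target_transform, the two callables need to be inverses of each other so that predictions end up back on the original scale. A quick round-trip check, sketched here with the standard library's math.log1p and math.expm1 on made-up scalar targets, is a cheap way to verify a pair before training. In practice the callables receive array-like data, so NumPy equivalents such as np.log1p and np.expm1 would be the natural choice.

```python
import math

# Forward transform and its inverse (illustrative choice).
forward, inverse = math.log1p, math.expm1

targets = [0.0, 9.0, 99.0, 999.0]
transformed = [forward(t) for t in targets]
recovered = [inverse(z) for z in transformed]

for original, z, back in zip(targets, transformed, recovered):
    print(f"{original:8.1f} -> {z:.4f} -> {back:.4f}")
```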

Let's leverage a few of these custom options to train the model. We are using a:

Dummy Target Transformation
Custom Loss Function, which is nothing but the MSE loss. This overrides the loss function defined in the ModelConfig
Custom Optimizer, which is just Lamb from torch_optimizer (just to demonstrate how to use a custom optimizer). This overrides the optimizer defined in the OptimizerConfig
Custom Metric Function, which is quite meaningless, but just to show how to use it. This overrides the metric defined in the ModelConfig
Callback, DeviceStatsMonitor from PyTorch Lightning, which monitors and logs device statistics during training

Note: PyTorch Tabular passes the raw output from the models to the loss function. In classification problems, the raw output is the logits. It is the responsibility of the loss function to apply the right activations. If you don't understand what that means, leave the loss functions at their default values or use pre-implemented loss functions.

import torch
import torch.nn as nn
from pytorch_lightning.callbacks import DeviceStatsMonitor
from torch_optimizer import Lamb


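class CustomLoss(nn.Module):
    """Plain mean squared error, written out to show the custom-loss interface."""

    def __init__(self):
        super().__init__()

    def forward(self, inputs, targets):
        # inputs: raw model output, targets: ground truth
        return torch.mean((inputs - targets) ** 2)


def custom_metric(y_hat, y):
    # Mean cubed error: not a meaningful metric, it only demonstrates the
    # metric_fn(y_hat, y) signature expected by fit.
    return torch.mean(torch.pow(y_hat - y, 3))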

CustomOptimizer = Lamb
## Sample of how a Custom Optimizer would look like
# class CustomOptimizer(Optimizer):
#     def __init__(
#         self,
#         params,
#         lr: float = 1e-3,
#         betas=(0.9, 0.999),
#         eps: float = 1e-6,
#         weight_decay: float = 0,
#         clamp_value: float = 10,
#         adam: bool = False,
#         debias: bool = False,
#     ):
#         ## some code here
#         defaults = dict(lr=lr, betas=betas, eps=eps, weight_decay=weight_decay)
#         super().__init__(params, defaults)

#     def step(self, closure=None):
#         ## Some code here
#         return loss


tabular_model.fit(
    train=train,
    validation=val,
    loss=CustomLoss(),
    metrics=[custom_metric],
    metrics_prob_inputs=[False],
    target_transform=[lambda x: x + 100, lambda x: x - 100],
    optimizer=CustomOptimizer,
    optimizer_params={"weight_decay": 1e-6},
    callbacks=[DeviceStatsMonitor()],
)

Seed set to 42

2024-01-11 14:43:34,615 - {pytorch_tabular.tabular_model:524} - INFO - Preparing the DataLoaders

2024-01-11 14:43:34,631 - {pytorch_tabular.tabular_datamodule:499} - INFO - Setting up the datamodule for regression task

2024-01-11 14:43:34,893 - {pytorch_tabular.tabular_model:574} - INFO - Preparing the Model: CategoryEmbeddingModel

2024-01-11 14:43:34,924 - {pytorch_tabular.tabular_model:340} - INFO - Preparing the Trainer

GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs

2024-01-11 14:43:36,420 - {pytorch_tabular.tabular_model:630} - INFO - Auto LR Find Started

You are using a CUDA device ('NVIDIA GeForce RTX 3060 Laptop GPU') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision
/home/manujosephv/miniconda3/envs/lightning_upgrade/lib/python3.11/site-packages/pytorch_lightning/callbacks/model_checkpoint.py:639: Checkpoint directory saved_models exists and is not empty.
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

Finding best initial lr:   0%|          | 0/100 [00:00<?, ?it/s]

`Trainer.fit` stopped: `max_steps=100` reached.
Learning rate set to 0.04365158322401657
Restoring states from the checkpoint path at /home/manujosephv/pytorch_tabular/docs/tutorials/.lr_find_4764cd76-0064-458b-881a-e2772b684d4d.ckpt
Restored all states from the checkpoint at /home/manujosephv/pytorch_tabular/docs/tutorials/.lr_find_4764cd76-0064-458b-881a-e2772b684d4d.ckpt

2024-01-11 14:43:40,092 - {pytorch_tabular.tabular_model:643} - INFO - Suggested LR: 0.04365158322401657. For plot and detailed analysis, use `find_learning_rate` method.

2024-01-11 14:43:40,095 - {pytorch_tabular.tabular_model:652} - INFO - Training Started

LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

┏━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┓
┃   ┃ Name             ┃ Type                      ┃ Params ┃
┡━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━┩
│ 0 │ custom_loss      │ CustomLoss                │      0 │
│ 1 │ _backbone        │ CategoryEmbeddingBackbone │  4.5 K │
│ 2 │ _embedding_layer │ Embedding1dLayer          │     92 │
│ 3 │ head             │ LinearHead                │     34 │
└───┴──────────────────┴───────────────────────────┴────────┘

Trainable params: 4.6 K                                                                                            
Non-trainable params: 0                                                                                            
Total params: 4.6 K                                                                                                
Total estimated model params size (MB): 0

Output()

`Trainer.fit` stopped: `max_epochs=20` reached.

2024-01-11 14:44:11,064 - {pytorch_tabular.tabular_model:663} - INFO - Training the model completed

2024-01-11 14:44:11,065 - {pytorch_tabular.tabular_model:1487} - INFO - Loading the best model

<pytorch_lightning.trainer.trainer.Trainer at 0x7f693b5f0b90>

We can see the logs of the training process (because we have set verbose=True) and the progress bar, which shows the training and validation loss and metrics. In addition, the model summary and some logs about the availability and usage of hardware accelerators like GPUs are printed. This is because we have not set suppress_lightning_logger=True. If we set that, we will not see these logs.

You can further reduce the warnings from PyTorch Lightning by using Python's warnings module, but this is not recommended because you might miss some important warnings.

import warnings
warnings.filterwarnings("ignore")
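A narrower alternative to silencing everything is to ignore only the specific message you have already reviewed. The sketch below uses only the standard library; `noisy_step` is a hypothetical stand-in for a training call, and the message pattern is just an example modelled on the checkpoint warning shown in the logs above.

```python
import warnings


def noisy_step():
    """Hypothetical stand-in for a call that emits several warnings."""
    warnings.warn("Checkpoint directory saved_models exists and is not empty.", UserWarning)
    warnings.warn("this API is deprecated", DeprecationWarning)
    return "done"


# catch_warnings restores the previous filters when the block exits,
# so the narrower filter does not leak into the rest of the session.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # Ignore only the known, harmless message; everything else still surfaces.
    warnings.filterwarnings(
        "ignore", message=".*Checkpoint directory.*", category=UserWarning
    )
    result = noisy_step()

remaining = [str(w.message) for w in caught]
print(remaining)
```

Here the checkpoint message is suppressed while the deprecation warning is still recorded, so unexpected warnings remain visible.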

2. Low-Level API¶

The low-level API is more flexible and allows you to write more complicated logic like cross validation, ensembling, etc. It is more verbose and requires you to write more code, but in return it gives you more control.

The fit method is split into three sub-methods:

prepare_dataloader
prepare_model
train

The parameters that we discussed in the High Level API are passed to the respective sub-methods. Before getting into the details of each of these methods, let's re-initialize the TabularModel and turn off logs.

tabular_model = TabularModel(
    data_config=data_config,
    model_config=model_config,
    optimizer_config=optimizer_config,
    trainer_config="trainer_config.yml",
    verbose=False, # Turn off the verbose to avoid printing logs from different stages
    suppress_lightning_logger=True, # Change Lightning Log Level to WARNING
)

1. `prepare_dataloader`¶

This method is responsible for setting up the TabularDataModule and returns the object. You can save this object using save_dataloader and load it later using load_datamodule to skip the data preparation step. This is useful when you are doing cross validation or ensembling.

So, parameters like train, validation, train_sampler, target_transform, cache_data etc. are passed to this method.

datamodule = tabular_model.prepare_dataloader(
    train=train,
    validation=val,
    seed=42,
    target_transform=[lambda x: x + 100, lambda x: x - 100],
)
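The target_transform argument used here is a pair of callables: a forward transform and its inverse. The sketch below, in plain Python, checks that such a pair round-trips correctly before it is handed to the library. It assumes the first callable is applied to the targets before training and the second to the predictions; the function names and sample values are illustrative only.

```python
def forward_transform(y):
    """Shift the target before training (same as lambda x: x + 100)."""
    return y + 100


def inverse_transform(y):
    """Undo the shift on the predictions (same as lambda x: x - 100)."""
    return y - 100


# Same shape as the argument used in this tutorial: [forward, inverse]
target_transform = [forward_transform, inverse_transform]

original = [143.5, -124.75, 33.25]  # made-up target values
transformed = [target_transform[0](y) for y in original]
restored = [target_transform[1](y) for y in transformed]

print(transformed)
print(restored)
```

If `restored` does not match `original`, the two functions are not true inverses and the predictions would come back on the wrong scale.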

2. `prepare_model`¶

This method is responsible for setting up and initializing the model and takes in the prepared datamodule as an input. It returns the model instance.

Along with the datamodule, it accepts parameters like loss, metrics, metrics_prob_inputs, optimizer, and optimizer_params.

from torch_optimizer import Lamb
model = tabular_model.prepare_model(
    datamodule,
    loss=CustomLoss(),
    metrics=[custom_metric],
    metrics_prob_inputs=[False],
    optimizer=Lamb,
    optimizer_params={"weight_decay": 1e-6},
)

3. `train`¶

This method is responsible for training the model and takes in the prepared model and datamodule as inputs. As the cell output shows, it returns the PyTorch Lightning Trainer object.

Along with the model and datamodule, train accepts parameters like callbacks, max_epochs, min_epochs, and so on.

tabular_model.train(
    model,
    datamodule,
    callbacks=[DeviceStatsMonitor()],
)

/home/manujosephv/miniconda3/envs/lightning_upgrade/lib/python3.11/site-packages/pytorch_lightning/callbacks/model_checkpoint.py:639: Checkpoint directory saved_models exists and is not empty.

Finding best initial lr:   0%|          | 0/100 [00:00<?, ?it/s]

┏━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┓
┃   ┃ Name             ┃ Type                      ┃ Params ┃
┡━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━┩
│ 0 │ custom_loss      │ CustomLoss                │      0 │
│ 1 │ _backbone        │ CategoryEmbeddingBackbone │  4.5 K │
│ 2 │ _embedding_layer │ Embedding1dLayer          │     92 │
│ 3 │ head             │ LinearHead                │     34 │
└───┴──────────────────┴───────────────────────────┴────────┘

Trainable params: 4.6 K                                                                                            
Non-trainable params: 0                                                                                            
Total params: 4.6 K                                                                                                
Total estimated model params size (MB): 0

Output()

<pytorch_lightning.trainer.trainer.Trainer at 0x7f6900d7da90>
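Because the three steps are separate, they can be placed inside a loop, which is what makes logic like cross validation possible. The following is a library-agnostic sketch of that loop in plain Python. `k_fold_indices`, `cross_validate`, and the `train_and_score` callback are hypothetical names introduced for illustration; in real use the callback would wrap calls to prepare_dataloader, prepare_model, train, and evaluate on the rows selected by each fold.

```python
import random


def k_fold_indices(n_rows, n_splits, seed=42):
    """Yield (train_idx, val_idx) pairs that partition range(n_rows)."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::n_splits] for i in range(n_splits)]
    for i in range(n_splits):
        val_idx = folds[i]
        train_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train_idx, val_idx


def cross_validate(n_rows, n_splits, train_and_score):
    """Run the callback once per fold and return (mean score, per-fold scores)."""
    scores = [
        train_and_score(train_idx, val_idx)
        for train_idx, val_idx in k_fold_indices(n_rows, n_splits)
    ]
    return sum(scores) / len(scores), scores


def dummy_train_and_score(train_idx, val_idx):
    """Stand-in callback: returns the validation share instead of a real metric."""
    return len(val_idx) / (len(train_idx) + len(val_idx))


mean_score, fold_scores = cross_validate(
    n_rows=10, n_splits=5, train_and_score=dummy_train_and_score
)
print(mean_score, fold_scores)
```

Keeping the fold logic separate from the training callback means the same loop can be reused for ensembling, for example by collecting predictions from each fold's model instead of a score.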

Predicting and Evaluating on New Data¶

As we saw in the basic tutorial, we can use the predict method of the TabularModel to predict on new data. But there are a few more parameters which we can use to customize the prediction process.

progress_bar - This parameter lets you turn off the progress bar or choose which kind to use. If rich, it uses colorful rich progress bars. If tqdm, it uses tqdm to show the progress bar. "None" or None turns off the progress bar. Defaults to rich.
ret_logits - A boolean flag. If turned on, predict returns the raw model output (logits) instead of the probabilities. Typically useful in classification problems. Defaults to False.

prediction = tabular_model.predict(test, progress_bar=None)
prediction.head()

	target_0_prediction	target_1_prediction
75721	143.383545	128.671326
80184	57.778259	35.192749
19864	60.018860	82.361267
76699	-124.672058	-23.280823
92991	33.951477	102.290710

We also saw that we can evaluate the model on new data with existing metrics using evaluate. But there are a few more parameters which we can use to customize the evaluation process.

verbose - A flag. If True, evaluate prints the results as well as returning them. Defaults to True.
ckpt_path - If provided, the model is loaded from the checkpoint path and evaluated on the data. If not provided, the current model is evaluated. If model checkpointing was enabled, we can also pass "best" to automatically load the best model. Defaults to None.

# Current Model
result = tabular_model.evaluate(test, verbose=False)

Output()

# Loading from a stored checkpoint path
best_ckpt_path = tabular_model.trainer.checkpoint_callback.best_model_path
result = tabular_model.evaluate(test, verbose=True, ckpt_path=best_ckpt_path)

Output()

┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃        Test metric        ┃       DataLoader 0        ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│    test_custom_metric     │     54.31338882446289     │
│   test_custom_metric_0    │    -47.28584289550781     │
│   test_custom_metric_1    │    101.59922790527344     │
│         test_loss         │      53.098876953125      │
│        test_loss_0        │     24.37938117980957     │
│        test_loss_1        │    28.719507217407227     │
└───────────────────────────┴───────────────────────────┘

# Using best checkpoint from training
result = tabular_model.evaluate(test, verbose=True, ckpt_path="best")

Output()

┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃        Test metric        ┃       DataLoader 0        ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│    test_custom_metric     │     54.31338882446289     │
│   test_custom_metric_0    │    -47.28584289550781     │
│   test_custom_metric_1    │    101.59922790527344     │
│         test_loss         │      53.098876953125      │
│        test_loss_0        │     24.37938117980957     │
│        test_loss_1        │    28.719507217407227     │
└───────────────────────────┴───────────────────────────┘
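One way to read the tables above: in this run, the aggregate rows appear to be the sum of the per-target rows. The quick check below uses the numbers copied from the table. This is an observation about this particular output rather than a documented guarantee, and the small tolerance allows for floating point rounding in the reported values.

```python
import math

# Values copied from the evaluation table above
reported = {
    "test_custom_metric": 54.31338882446289,
    "test_custom_metric_0": -47.28584289550781,
    "test_custom_metric_1": 101.59922790527344,
    "test_loss": 53.098876953125,
    "test_loss_0": 24.37938117980957,
    "test_loss_1": 28.719507217407227,
}

loss_sum = reported["test_loss_0"] + reported["test_loss_1"]
metric_sum = reported["test_custom_metric_0"] + reported["test_custom_metric_1"]

loss_matches = math.isclose(loss_sum, reported["test_loss"], rel_tol=1e-5)
metric_matches = math.isclose(metric_sum, reported["test_custom_metric"], rel_tol=1e-5)
print(loss_matches, metric_matches)
```

Note that the custom metric for the first target is negative. This is expected, since the metric cubes the difference and therefore keeps its sign.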

Congrats! You have learned how to use most of the advanced features of PyTorch Tabular.
Now try to use these features in your own projects and Kaggle competitions. If you have any questions, please feel free to ask them in the GitHub Discussions.
