Using Model Sweep as an Initial Model Selection Tool
Level: Intermediate
In this tutorial, we will look at an easy way to assess the performance of different deep learning models in PyTorch Tabular on a dataset, similar to a PyCaret-style sweep of models. In PyTorch Tabular, we call this Model Sweep.
Data
We will use the Covertype dataset from the UCI ML Repository and split it into train and test sets. We could carve out a validation set as well, but even if we don't, PyTorch Tabular will automatically do it for us out of the train set.
from pytorch_tabular.utils import load_covertype_dataset
from sklearn.model_selection import train_test_split
data, cat_col_names, num_col_names, target_col = load_covertype_dataset()
train, test = train_test_split(data, random_state=42, test_size=0.2)
print(f"Train Shape: {train.shape} | Test Shape: {test.shape}")
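Covertype is a multi-class classification dataset, so if its classes were heavily imbalanced you might prefer a stratified split that preserves class proportions in both halves. A minimal sketch of that variant, on a toy frame standing in for the real data (the column name and values here are made up, not the actual Covertype schema):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy frame standing in for the Covertype data (hypothetical values)
df = pd.DataFrame({"feat": range(100), "Cover_Type": [i % 4 for i in range(100)]})

# stratify= keeps the same class proportions in train and test
train_s, test_s = train_test_split(
    df, random_state=42, test_size=0.2, stratify=df["Cover_Type"]
)
print(train_s["Cover_Type"].value_counts(normalize=True))
```

With four equally sized classes, both splits come out with exactly 25% of each class.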
Defining the Config
As you saw in the basic tutorial, we need to define a set of configs. Even for a model sweep, we need to define all configs except the `ModelConfig`. We will keep most of the defaults, but set a few configs to control the training process:
- Automatic Learning Rate Finding
- Batch Size
- Max Epochs
- Turning off the progress bar and model summary so that they don't clutter the output
from pytorch_tabular.config import (
    DataConfig,
    OptimizerConfig,
    TrainerConfig,
)
from pytorch_tabular.models.common.heads import LinearHeadConfig

data_config = DataConfig(
    target=[target_col],
    continuous_cols=num_col_names,
    categorical_cols=cat_col_names,
)
trainer_config = TrainerConfig(
    batch_size=1024,
    max_epochs=25,
    auto_lr_find=True,
    early_stopping=None,  # Disable early stopping; set to "valid_loss" to monitor it
    # early_stopping_mode="min",  # Set the mode as min because for valid_loss, lower is better
    # early_stopping_patience=5,  # No. of epochs of degradation training will wait before terminating
    checkpoints="valid_loss",  # Save best checkpoint monitoring valid_loss
    load_best=True,  # After training, load the best checkpoint
    progress_bar="none",  # Turn off the progress bar
    trainer_kwargs=dict(enable_model_summary=False),  # Turn off the model summary
    accelerator="cpu",
)
optimizer_config = OptimizerConfig()
head_config = LinearHeadConfig(
    layers="",  # No additional layer in head, just a mapping layer to output_dim
    dropout=0.1,
    initialization="kaiming",
).__dict__  # Convert to dict to pass to the model config (OmegaConf doesn't accept objects)
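The `.__dict__` trick works because `LinearHeadConfig` is a dataclass: accessing `__dict__` yields a plain dict of its fields, which OmegaConf can consume. A toy illustration with a stand-in dataclass (`ToyHeadConfig` is hypothetical, not part of PyTorch Tabular):

```python
from dataclasses import dataclass, asdict

@dataclass
class ToyHeadConfig:  # stand-in for LinearHeadConfig
    layers: str = ""
    dropout: float = 0.1
    initialization: str = "kaiming"

cfg = ToyHeadConfig()
# For a flat dataclass, __dict__ and dataclasses.asdict() give the same mapping
print(cfg.__dict__)  # {'layers': '', 'dropout': 0.1, 'initialization': 'kaiming'}
assert cfg.__dict__ == asdict(cfg)
```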
Model Sweep
Model sweep enables you to quickly sweep through different models and configurations. It takes a list of model configs or one of the presets defined in `pytorch_tabular.MODEL_SWEEP_PRESETS` and trains them on the data. It then ranks the models based on the metric provided and returns the best model.
These are the major arguments to the `model_sweep` function:
- `task`: The type of prediction task. Either `'classification'` or `'regression'`.
- `train`: The training data.
- `test`: The test data on which performance is evaluated.
- Configs: All the config objects can be passed either as objects or as paths to yaml files.
- `model_list`: The list of models to compare. This can be one of the presets defined in `pytorch_tabular.MODEL_SWEEP_PRESETS` or a list of `ModelConfig` objects.
There are four presets defined in `pytorch_tabular.MODEL_SWEEP_PRESETS`:
- `lite`: A set of models that are fast to train. This is the default value for `model_list`. The models and their hyperparameters are carefully chosen so that they have a comparable number of parameters, train relatively fast, and give good results.
- `standard`: A set of models with fewer than (or around) a hundred thousand learnable parameters each, so the memory requirement is still modest. All the models from the `lite` preset are also included. The models and their hyperparameters are carefully chosen so that they have comparable numbers of parameters and give good results.
- `full`: A full sweep of the models implemented in PyTorch Tabular, with default hyperparameters, except for Mixture Density Networks (a specialized model for probabilistic regression) and NODE (a model that requires high compute and memory).
- `high_memory`: A full sweep of the models implemented in PyTorch Tabular, with default hyperparameters, except for Mixture Density Networks. This option is only recommended if you have ample memory to hold the model and data in your CPU/GPU.
- `metrics`, `metrics_params`, `metrics_prob_input`: The metrics to use for evaluation. These parameters hold the same meaning as in the `ModelConfig`.
- `rank_metric`: The metric to use for ranking the models. This is a tuple whose first element is the metric name and whose second element is the direction (`lower_is_better` or `higher_is_better`). Defaults to `('loss', "lower_is_better")`.
- `return_best_model`: If True, the best model will be returned. Defaults to True.
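The ranking itself boils down to a sort on the chosen metric in the chosen direction. A minimal stand-alone sketch of that logic (the result rows and numbers below are made up, not actual sweep output):

```python
# Toy sweep results (hypothetical numbers)
results = [
    {"model": "CategoryEmbeddingModel", "test_accuracy": 0.91},
    {"model": "GANDALFModel", "test_accuracy": 0.94},
    {"model": "TabNetModel", "test_accuracy": 0.89},
]

rank_metric = ("accuracy", "higher_is_better")
metric_col = f"test_{rank_metric[0]}"
reverse = rank_metric[1] == "higher_is_better"

# Sort so the best model according to rank_metric comes first
ranked = sorted(results, key=lambda r: r[metric_col], reverse=reverse)
best = ranked[0]
print(best["model"])  # GANDALFModel
```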
Now let's try and run the sweep on the Covertype dataset, using the `lite` preset.
%%time
from pytorch_tabular import model_sweep
import warnings

# Filtering out the warnings
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    sweep_df, best_model = model_sweep(
        task="classification",  # One of "classification", "regression"
        train=train,
        test=test,
        data_config=data_config,
        optimizer_config=optimizer_config,
        trainer_config=trainer_config,
        model_list="lite",
        common_model_args=dict(head="LinearHead", head_config=head_config),
        metrics=["accuracy", "f1_score"],
        metrics_params=[{}, {"average": "macro"}],
        metrics_prob_input=[False, True],
        rank_metric=("accuracy", "higher_is_better"),
        progress_bar=True,
        verbose=False,
        suppress_lightning_logger=True,
    )
The output, `sweep_df`, is a pandas dataframe with the following columns:
- `model`: The name of the model
- `# Params`: The number of trainable parameters in the model
- `test_loss`: The loss on the test set
- `test_<metric>`: The metric value on the test set
- `time_taken`: The time taken to train the model
- `epochs`: The number of epochs trained
- `time_taken_per_epoch`: The time taken per epoch
- `params`: The config used to train the model
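Since the output is a regular dataframe, the usual pandas operations apply for inspecting it. A short sketch on a toy frame that mimics those columns (all values below are made up for illustration):

```python
import pandas as pd

# Toy frame mimicking sweep_df's columns (hypothetical values)
sweep_df = pd.DataFrame({
    "model": ["CategoryEmbeddingModel", "GANDALFModel"],
    "# Params": [120_000, 80_000],
    "test_accuracy": [0.91, 0.94],
    "time_taken": [300.0, 320.0],
    "epochs": [25, 25],
})
# time_taken_per_epoch is just total training time divided by epochs trained
sweep_df["time_taken_per_epoch"] = sweep_df["time_taken"] / sweep_df["epochs"]

# Sort so the model with the highest test accuracy comes first
sweep_df = sweep_df.sort_values("test_accuracy", ascending=False)
print(sweep_df.iloc[0]["model"])  # GANDALFModel
```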
Let's check which model performed the best.
We have trained three fast models on the dataset in ~15 minutes on a CPU, which is pretty fast. We can see that the GANDALF model performed the best in terms of accuracy, loss, and f1 score, and that its training time is comparable to a regular MLP. A natural next step would be to tune the model a bit more and find the best hyperparameters.
Or, if you have more time, access to a decent-sized GPU, and want to try out more models, you can use the `standard` preset. Even on a CPU, it should only run for a couple of hours, and it will give you a good idea of the relative performance of different models.
Let's try and run the `standard` preset.
%%time
# Filtering out the warnings
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    sweep_df, best_model = model_sweep(
        task="classification",  # One of "classification", "regression"
        train=train,
        test=test,
        data_config=data_config,
        optimizer_config=optimizer_config,
        trainer_config=trainer_config,
        model_list="standard",
        common_model_args=dict(head="LinearHead", head_config=head_config),
        metrics=["accuracy", "f1_score"],
        metrics_params=[{}, {"average": "macro"}],
        metrics_prob_input=[False, True],
        rank_metric=("accuracy", "higher_is_better"),
        progress_bar=True,
        verbose=False,
        suppress_lightning_logger=True,
    )
The larger GANDALF model performed the best in terms of accuracy, loss, and f1 score. Although its training time is slightly higher than that of the comparable MLP, it is still pretty fast.
Now, apart from using the presets, you can also pass a list of `ModelConfig` objects. Let's try and run a sweep with a list of `ModelConfig` objects.
from pytorch_tabular.models import CategoryEmbeddingModelConfig, GANDALFConfig

common_params = {
    "task": "classification",
    "head": "LinearHead",
    "head_config": head_config,
}
model_list = [
    CategoryEmbeddingModelConfig(layers="1024-512-256", **common_params),
    GANDALFConfig(gflu_stages=2, **common_params),
    GANDALFConfig(gflu_stages=6, learnable_sparsity=False, **common_params),
]

# Filtering out the warnings
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    sweep_df, best_model = model_sweep(
        task="classification",  # One of "classification", "regression"
        train=train,
        test=test,
        data_config=data_config,
        optimizer_config=optimizer_config,
        trainer_config=trainer_config,
        model_list=model_list,
        metrics=["accuracy", "f1_score"],
        metrics_params=[{}, {"average": "macro"}],
        metrics_prob_input=[False, True],
        rank_metric=("accuracy", "higher_is_better"),
        progress_bar=True,
        verbose=False,
        suppress_lightning_logger=True,
    )
Although we chose some arbitrary hyperparameters, we can see that the GANDALF model performed very close to the MLP, at a fraction of the parameters and with a lower training time.
Now try this on your own dataset. You can also try the `full` preset and see how it performs.