Searching for the Best Architecture and Hyperparameters¶
Sometimes (or often) we do not know exactly which architecture is the best for our data. In artificial intelligence, it is common for an architecture to be the best for one dataset and not so good for another. To help find the best solution, this Notebook will use two main functions in PyTorch Tabular. The first is Sweep, which runs every architecture available in PyTorch Tabular with default hyperparameters to search for the best candidate architecture for our data. Afterwards, we will use the Tuner to search for the best hyperparameters of the best architectures that we found with Sweep.
Data¶
First of all, let's create a synthetic dataset which is a mix of numerical and categorical features and has multiple targets for classification. This means there are multiple columns that we need to predict from the same set of features.
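The data-generation cell is not reproduced here, so below is a minimal sketch of one way to build such a dataset with scikit-learn. For simplicity it creates a single target column named "target", plus the num_col_names, cat_col_names, train, valid, and test objects that the configs below expect; all sizes and column names are illustrative assumptions, not the notebook's exact code.
# Illustrative data generation only; not the original notebook's exact code.
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(
    n_samples=10_000, n_features=12, n_informative=8, n_classes=4, random_state=42
)

num_col_names = [f"num_{i}" for i in range(10)]
cat_col_names = ["cat_0", "cat_1"]

data = pd.DataFrame(X[:, :10], columns=num_col_names)
# Turn the last two continuous features into categorical columns by binning them
for i, col in enumerate(cat_col_names):
    data[col] = pd.qcut(X[:, 10 + i], q=4, labels=[f"{col}_lvl{j}" for j in range(4)]).astype(str)
data["target"] = y

train, test = train_test_split(data, test_size=0.2, random_state=42)
train, valid = train_test_split(train, test_size=0.2, random_state=42)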
Common Configs¶
# PyTorch Tabular configuration classes
from pytorch_tabular.config import DataConfig, OptimizerConfig, TrainerConfig

data_config = DataConfig(
    target=["target"],
    continuous_cols=num_col_names,
    categorical_cols=cat_col_names,
)
trainer_config = TrainerConfig(
    batch_size=32,
    max_epochs=50,
    early_stopping="valid_accuracy",
    early_stopping_mode="max",
    early_stopping_patience=3,
    checkpoints="valid_accuracy",
    load_best=True,
    progress_bar="none",
)
optimizer_config = OptimizerConfig()
Model Sweep¶
https://pytorch-tabular.readthedocs.io/en/latest/apidocs_coreclasses/#pytorch_tabular.model_sweep
Let's train all available models ("high_memory"). If some of them return "OOM", it means that you do not have enough memory to run them with the current batch_size. You can ignore those models or reduce the batch_size in TrainerConfig.
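The sweep call itself is not shown above; here is a hedged sketch of what it could look like with the model_sweep function linked above. The split used for ranking, the rank metric, and the returned (dataframe, best model) pair are assumptions for this example, so double-check them against the API docs.
# A hedged sketch of the sweep call; exact arguments are assumptions for this example.
import warnings

from pytorch_tabular import model_sweep

with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    sweep_df, best_model = model_sweep(
        task="classification",                        # same task as the configs above
        train=train,
        test=test,                                    # split used to score and rank the models
        data_config=data_config,
        optimizer_config=optimizer_config,
        trainer_config=trainer_config,
        model_list="high_memory",                     # sweep every available model
        rank_metric=("accuracy", "higher_is_better"), # assumption: rank models by accuracy
        progress_bar=False,
        verbose=False,
    )
sweep_df  # one row per model with its scores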
In the resulting table, we can see the best models (with default hyperparameters) for our dataset. But we are not satisfied yet, so in this case we will take the top two models and use the Tuner to search for better hyperparameters and improve the result.
PS: Each time you run the Notebook the results may change a little, so you might see different top models from the ones we use in the next section.
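As an illustration, the top models could be pulled out of the sweep results like this; the "model" and "test_accuracy" column names are assumptions that depend on the metrics you configured.
# Hypothetical column names; adjust to whatever your sweep dataframe actually contains.
top_models = (
    sweep_df.sort_values("test_accuracy", ascending=False)
    .head(2)["model"]
    .tolist()
)
print(top_models)  # e.g. ["CategoryEmbeddingModel", "FTTransformerModel"]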
Model Tuner¶
https://pytorch-tabular.readthedocs.io/en/latest/apidocs_coreclasses/#pytorch_tabular.TabularModelTuner
Perfect!! Now that we know the best models, let's take the top two and play with their hyperparameters to try to find better results.
We can use two main strategies:
- grid_search: tries every combination of the hyperparameters that were defined, but remember that each new field you add will considerably increase the total training time. If you configure 4 optimizers, 4 layer settings, 2 activations and 2 dropouts, that means 64 (4 * 4 * 2 * 2) trainings.
- random_search: randomly samples "n_trials" hyperparameter settings from each search space that has been defined. It is useful for faster training, but remember that it will not test all combinations.
For all hyperparameter options: https://pytorch-tabular.readthedocs.io/en/latest/apidocs_model/
More information about how the hyperparameter spaces work: https://pytorch-tabular.readthedocs.io/en/latest/tutorials/10-Hyperparameter%20Tuning/#define-the-hyperparameter-space
Let's define some hyperparameter search spaces.
PS: This Notebook is meant to exemplify the functions; these are not necessarily the best hyperparameters to try.
from pytorch_tabular.models import CategoryEmbeddingModelConfig, FTTransformerConfig

search_space_category_embedding = {
    "optimizer_config__optimizer": ["Adam", "SGD"],
    "model_config__layers": ["128-64-32", "1024-512-256", "32-64-128", "256-512-1024"],
    "model_config__activation": ["ReLU", "LeakyReLU"],
    "model_config__embedding_dropout": [0.0, 0.2],
}
model_config_category_embedding = CategoryEmbeddingModelConfig(task="classification")
search_space_ft_transformer = {
    "optimizer_config__optimizer": ["Adam", "SGD"],
    "model_config__input_embed_dim": [32, 64],
    "model_config__num_attn_blocks": [3, 6, 8],
    "model_config__ff_hidden_multiplier": [4, 8],
    "model_config__transformer_activation": ["GEGLU", "LeakyReLU"],
    "model_config__embedding_dropout": [0.0, 0.2],
}
model_config_ft_transformer = FTTransformerConfig(task="classification")
Let's put all the search spaces and model configs into lists.
Important: they must be in the same order and have the same length, so that each search space is paired with its model config, as in the sketch below.
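For example, matching the variable names used in the tuner call below:
# Index i of search_spaces must correspond to index i of model_configs.
model_configs = [model_config_category_embedding, model_config_ft_transformer]
search_spaces = [search_space_category_embedding, search_space_ft_transformer]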
from pytorch_tabular import TabularModelTuner

tuner = TabularModelTuner(
    data_config=data_config,
    model_config=model_configs,
    optimizer_config=optimizer_config,
    trainer_config=trainer_config,
)
import warnings

with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    tuner_df = tuner.tune(
        train=train,
        validation=valid,
        search_space=search_spaces,
        strategy="grid_search",  # or "random_search"
        # n_trials=5,  # only used with random_search
        metric="accuracy",
        mode="max",
        progress_bar=True,
        verbose=False,  # make True if you want to log metrics and params for each trial
    )
Nice!!! We now know the best architecture and possible hyperparameters for our dataset. Maybe the result is still not good enough, but at least it will reduce the options. With these results, we will have a better idea of which hyperparameters are worth exploring further and which do not make sense to keep using.
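Assuming the tuner output also exposes the per-trial results and the best parameters (the attribute names below are assumptions; check the TabularModelTuner docs linked above), they can be inspected like this:
# Assumed attributes on the tuner output; verify against the TabularModelTuner docs.
print(tuner_df.best_score)   # best value of the chosen metric
print(tuner_df.best_params)  # hyperparameter combination that achieved it
tuner_df.trials_df.head()    # one row per trial with its params and metrics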
It is also a good idea to explore the architecture's paper, which may guide you further towards the best hyperparameters.
After tuning, the best model is stored in the output variable as "best_model". So if you like the result and wish to use the model in the future, you can save it by calling "save_model".
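A short sketch of that last step; the save directory is just an example.
# `best_model` on the tuner output is the fitted TabularModel described above;
# the save directory name is arbitrary.
tuner_df.best_model.save_model("saved_models/best_tuned_model")

# Later, it can be restored with TabularModel.load_model
from pytorch_tabular import TabularModel
loaded = TabularModel.load_model("saved_models/best_tuned_model")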