Supervised Models

Configuration Classes

Bases: ModelConfig

AutomaticFeatureInteraction configuration.

Parameters:

Name	Type	Description	Default
`attn_embed_dim`	`int`	The number of hidden units in the Multi-Headed Attention layers. Defaults to 32	`32`
`num_heads`	`int`	The number of heads in the Multi-Headed Attention layer. Defaults to 2	`2`
`num_attn_blocks`	`int`	The number of layers of stacked Multi-Headed Attention layers. Defaults to 3	`3`
`attn_dropouts`	`float`	Dropout between layers of Multi-Headed Attention Layers. Defaults to 0.0	`0.0`
`has_residuals`	`bool`	Flag to have a residual connection from embedded output to attention layer output. Defaults to True	`True`
`embedding_dim`	`int`	The dimensions of the embedding for continuous and categorical columns. Defaults to 16	`16`
`embedding_initialization`	`Optional[str]`	Initialization scheme for the embedding layers. Defaults to `kaiming_uniform`. Choices are: [`kaiming_uniform`,`kaiming_normal`].	`'kaiming_uniform'`
`embedding_bias`	`bool`	Flag to turn on Embedding Bias. Defaults to True	`True`
`share_embedding`	`bool`	The flag turns on shared embeddings in the input embedding process. The key idea here is to have an embedding for the feature as a whole along with embeddings of each unique value of that column. For more details refer to Appendix A of the TabTransformer paper. Defaults to False	`False`
`share_embedding_strategy`	`Optional[str]`	There are two strategies in adding shared embeddings. 1. `add` - A separate embedding for the feature is added to the embedding of the unique values of the feature. 2. `fraction` - A fraction of the input embedding is reserved for the shared embedding of the feature. Defaults to fraction. Choices are: [`add`,`fraction`].	`'fraction'`
`shared_embedding_fraction`	`float`	Fraction of the input_embed_dim to be reserved by the shared embedding. Should be less than one. Defaults to 0.25	`0.25`
`deep_layers`	`bool`	Flag to enable a deep MLP layer before the Multi-Headed Attention layer. Defaults to False	`False`
`layers`	`str`	Hyphen-separated number of layers and units in the deep MLP. Defaults to 128-64-32	`'128-64-32'`
`activation`	`str`	The activation type in the deep MLP. Any of the default activations in PyTorch, such as ReLU, Tanh, or LeakyReLU. See https://pytorch.org/docs/stable/nn.html#non-linear-activations-weighted-sum-nonlinearity. Defaults to ReLU	`'ReLU'`
`use_batch_norm`	`bool`	Flag to include a BatchNorm layer after each Linear Layer+DropOut in the deep MLP. Defaults to False	`False`
`initialization`	`str`	Initialization scheme for the linear layers in the deep MLP. Defaults to `kaiming`. Choices are: [`kaiming`,`xavier`,`random`].	`'kaiming'`
`dropout`	`float`	Probability of an element to be zeroed in the deep MLP. Defaults to 0.0	`0.0`
`attention_pooling`	`bool`	If True, will combine the attention outputs of each block for final prediction. Defaults to False	`False`
`task`	`str`	Specify whether the problem is regression or classification. `backbone` is a task which considers the model as a backbone to generate features. Mostly used internally for SSL and related tasks. Choices are: [`regression`,`classification`,`backbone`].	required
`head`	`Optional[str]`	The head to be used for the model. Should be one of the heads defined in `pytorch_tabular.models.common.heads`. Defaults to LinearHead. Choices are: [`None`,`LinearHead`,`MixtureDensityHead`].	`'LinearHead'`
`head_config`	`Optional[Dict]`	The config as a dict which defines the head. If left empty, will be initialized as default linear head.	`lambda: {'layers': ''}()`
`embedding_dims`	`Optional[List]`	The dimensions of the embedding for each categorical column as a list of tuples (cardinality, embedding_dim). If left empty, will infer using the cardinality of the categorical column using the rule min(50, (x + 1) // 2)	`None`
`embedding_dropout`	`float`	Dropout to be applied to the Categorical Embedding. Defaults to 0.0	`0.0`
`batch_norm_continuous_input`	`bool`	If True, we will normalize the continuous layer by passing it through a BatchNorm layer.	`True`
`learning_rate`	`float`	The learning rate of the model. Defaults to 1e-3.	`0.001`
`loss`	`Optional[str]`	The loss function to be applied. By Default, it is MSELoss for regression and CrossEntropyLoss for classification. Unless you are sure what you are doing, leave it at MSELoss or L1Loss for regression and CrossEntropyLoss for classification	`None`
`metrics`	`Optional[List[str]]`	the list of metrics you need to track during training. The metrics should be one of the functional metrics implemented in `torchmetrics`. By default, it is accuracy if classification and mean_squared_error for regression	`None`
`metrics_params`	`Optional[List]`	The parameters to be passed to the metrics function	`None`
`metrics_prob_input`	`Optional[List]`	Is a mandatory parameter for classification metrics defined in the config. This defines whether the input to the metric function is the probability or the class. Length should be same as the number of metrics. Defaults to None.	`None`
`target_range`	`Optional[List]`	The range in which we should limit the output variable. Currently ignored for multi-target regression. Typically used for Regression problems. If left empty, will not apply any restrictions	`None`
`seed`	`int`	The seed for reproducibility. Defaults to 42	`42`
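
Two of the parameters above encode small rules that are easy to check by hand: `embedding_dims` falls back to min(50, (x + 1) // 2) per categorical column, and `layers` is a hyphen-separated string of layer widths. The following is a minimal sketch of both rules in plain Python. The helper names `infer_embedding_dim` and `parse_layers` are hypothetical and are not part of the library; how the library parses these values internally is an assumption.

```python
from typing import List, Tuple


def infer_embedding_dim(cardinality: int) -> int:
    """Apply the documented rule min(50, (x + 1) // 2) to one column's cardinality."""
    return min(50, (cardinality + 1) // 2)


def parse_layers(spec: str) -> List[int]:
    """Split a hyphen-separated spec such as '128-64-32' into layer widths.

    An empty string yields an empty list (no hidden layers).
    """
    return [int(units) for units in spec.split("-") if units]


# Hypothetical cardinalities for three categorical columns.
cardinalities = [3, 10, 1000]
embedding_dims: List[Tuple[int, int]] = [(c, infer_embedding_dim(c)) for c in cardinalities]

print(embedding_dims)             # [(3, 2), (10, 5), (1000, 50)]
print(parse_layers("128-64-32"))  # [128, 64, 32]
```

The list of `(cardinality, embedding_dim)` tuples has the same shape that `embedding_dims` expects when it is set explicitly.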

Source code in src/pytorch_tabular/models/autoint/config.py

@dataclass
class AutoIntConfig(ModelConfig):
    """AutomaticFeatureInteraction configuration.

    Args:
        attn_embed_dim (int): The number of hidden units in the Multi-Headed Attention layers. Defaults to
                32

        num_heads (int): The number of heads in the Multi-Headed Attention layer. Defaults to 2

        num_attn_blocks (int): The number of layers of stacked Multi-Headed Attention layers. Defaults to 3

        attn_dropouts (float): Dropout between layers of Multi-Headed Attention Layers. Defaults to 0.0

        has_residuals (bool): Flag to have a residual connection from embedded output to attention layer
                output. Defaults to True

        embedding_dim (int): The dimensions of the embedding for continuous and categorical columns.
                Defaults to 16

        embedding_initialization (Optional[str]): Initialization scheme for the embedding layers. Defaults
                to `kaiming_uniform`. Choices are: [`kaiming_uniform`,`kaiming_normal`].

        embedding_bias (bool): Flag to turn on Embedding Bias. Defaults to True

        share_embedding (bool): The flag turns on shared embeddings in the input embedding process. The key
                idea here is to have an embedding for the feature as a whole along with embeddings of each unique
                value of that column. For more details refer to Appendix A of the TabTransformer paper. Defaults
                to False

        share_embedding_strategy (Optional[str]): There are two strategies in adding shared embeddings. 1.
                `add` - A separate embedding for the feature is added to the embedding of the unique values of the
                feature. 2. `fraction` - A fraction of the input embedding is reserved for the shared embedding of
                the feature. Defaults to fraction. Choices are: [`add`,`fraction`].

        shared_embedding_fraction (float): Fraction of the input_embed_dim to be reserved by the shared
                embedding. Should be less than one. Defaults to 0.25

        deep_layers (bool): Flag to enable a deep MLP layer before the Multi-Headed Attention layer.
                Defaults to False

        layers (str): Hyphen-separated number of layers and units in the deep MLP. Defaults to 128-64-32

        activation (str): The activation type in the deep MLP. Any of the default activations in PyTorch, such
                as ReLU, Tanh, or LeakyReLU. See
                https://pytorch.org/docs/stable/nn.html#non-linear-activations-weighted-sum-nonlinearity.
                Defaults to ReLU

        use_batch_norm (bool): Flag to include a BatchNorm layer after each Linear Layer+DropOut in the
                deep MLP. Defaults to False

        initialization (str): Initialization scheme for the linear layers in the deep MLP. Defaults to
                `kaiming`. Choices are: [`kaiming`,`xavier`,`random`].

        dropout (float): Probability of an element to be zeroed in the deep MLP. Defaults to 0.0

        attention_pooling (bool): If True, will combine the attention outputs of each block for final
                prediction. Defaults to False

        task (str): Specify whether the problem is regression or classification. `backbone` is a task which
                considers the model as a backbone to generate features. Mostly used internally for SSL and related
                tasks. Choices are: [`regression`,`classification`,`backbone`].

        head (Optional[str]): The head to be used for the model. Should be one of the heads defined in
                `pytorch_tabular.models.common.heads`. Defaults to  LinearHead. Choices are:
                [`None`,`LinearHead`,`MixtureDensityHead`].

        head_config (Optional[Dict]): The config as a dict which defines the head. If left empty, will be
                initialized as default linear head.

        embedding_dims (Optional[List]): The dimensions of the embedding for each categorical column as a
                list of tuples (cardinality, embedding_dim). If left empty, will infer using the cardinality of
                the categorical column using the rule min(50, (x + 1) // 2)

        embedding_dropout (float): Dropout to be applied to the Categorical Embedding. Defaults to 0.0

        batch_norm_continuous_input (bool): If True, we will normalize the continuous layer by passing it
                through a BatchNorm layer.

        learning_rate (float): The learning rate of the model. Defaults to 1e-3.

        loss (Optional[str]): The loss function to be applied. By Default, it is MSELoss for regression and
                CrossEntropyLoss for classification. Unless you are sure what you are doing, leave it at MSELoss
                or L1Loss for regression and CrossEntropyLoss for classification

        metrics (Optional[List[str]]): the list of metrics you need to track during training. The metrics
                should be one of the functional metrics implemented in ``torchmetrics``. By default, it is
                accuracy if classification and mean_squared_error for regression

        metrics_params (Optional[List]): The parameters to be passed to the metrics function

        metrics_prob_input (Optional[List]): Is a mandatory parameter for classification metrics defined in the config.
            This defines whether the input to the metric function is the probability or the class. Length should be
            same as the number of metrics. Defaults to None.

        target_range (Optional[List]): The range in which we should limit the output variable. Currently
                ignored for multi-target regression. Typically used for Regression problems. If left empty, will
                not apply any restrictions

        seed (int): The seed for reproducibility. Defaults to 42

    """

    attn_embed_dim: int = field(
        default=32,
        metadata={"help": "The number of hidden units in the Multi-Headed Attention layers. Defaults to 32"},
    )
    num_heads: int = field(
        default=2,
        metadata={"help": "The number of heads in the Multi-Headed Attention layer. Defaults to 2"},
    )
    num_attn_blocks: int = field(
        default=3,
        metadata={"help": "The number of layers of stacked Multi-Headed Attention layers. Defaults to 3"},
    )
    attn_dropouts: float = field(
        default=0.0,
        metadata={"help": "Dropout between layers of Multi-Headed Attention Layers. Defaults to 0.0"},
    )
    has_residuals: bool = field(
        default=True,
        metadata={
            "help": "Flag to have a residual connection from embedded output to attention layer output. Defaults to True"
        },
    )
    embedding_dim: int = field(
        default=16,
        metadata={"help": "The dimensions of the embedding for continuous and categorical columns. Defaults to 16"},
    )
    embedding_initialization: Optional[str] = field(
        default="kaiming_uniform",
        metadata={
            "help": "Initialization scheme for the embedding layers. Defaults to `kaiming_uniform`",
            "choices": ["kaiming_uniform", "kaiming_normal"],
        },
    )
    embedding_bias: bool = field(
        default=True,
        metadata={"help": "Flag to turn on Embedding Bias. Defaults to True"},
    )
    share_embedding: bool = field(
        default=False,
        metadata={
            "help": "The flag turns on shared embeddings in the input embedding process."
            " The key idea here is to have an embedding for the feature as a whole along with embeddings"
            " of each unique value of that column."
            " For more details refer to Appendix A of the TabTransformer paper. Defaults to False"
        },
    )
    share_embedding_strategy: Optional[str] = field(
        default="fraction",
        metadata={
            "help": "There are two strategies in adding shared embeddings."
            " 1. `add` - A separate embedding for the feature is added to the embedding"
            " of the unique values of the feature."
            " 2. `fraction` - A fraction of the input embedding is reserved"
            " for the shared embedding of the feature. Defaults to fraction.",
            "choices": ["add", "fraction"],
        },
    )
    shared_embedding_fraction: float = field(
        default=0.25,
        metadata={
            "help": "Fraction of the input_embed_dim to be reserved by the shared embedding."
            " Should be less than one. Defaults to 0.25"
        },
    )
    deep_layers: bool = field(
        default=False,
        metadata={"help": "Flag to enable a deep MLP layer before the Multi-Headed Attention layer. Defaults to False"},
    )
    layers: str = field(
        default="128-64-32",
        metadata={"help": "Hyphen-separated number of layers and units in the deep MLP. Defaults to 128-64-32"},
    )
    activation: str = field(
        default="ReLU",
        metadata={
            "help": "The activation type in the deep MLP. Any of the default activations in PyTorch,"
            " such as ReLU, Tanh, or LeakyReLU. See"
            " https://pytorch.org/docs/stable/nn.html#non-linear-activations-weighted-sum-nonlinearity."
            " Defaults to ReLU"
        },
    )
    use_batch_norm: bool = field(
        default=False,
        metadata={
            "help": "Flag to include a BatchNorm layer after each Linear Layer+DropOut in the deep MLP."
            " Defaults to False"
        },
    )
    initialization: str = field(
        default="kaiming",
        metadata={
            "help": "Initialization scheme for the linear layers in the deep MLP. Defaults to `kaiming`",
            "choices": ["kaiming", "xavier", "random"],
        },
    )
    dropout: float = field(
        default=0.0,
        metadata={"help": "Probability of an element to be zeroed in the deep MLP. Defaults to 0.0"},
    )
    attention_pooling: bool = field(
        default=False,
        metadata={
            "help": "If True, will combine the attention outputs of each block for final prediction. Defaults to False"
        },
    )
    _module_src: str = field(default="models.autoint")
    _model_name: str = field(default="AutoIntModel")
    _backbone_name: str = field(default="AutoIntBackbone")
    _config_name: str = field(default="AutoIntConfig")
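
As a rough illustration of how `shared_embedding_fraction`, `attn_embed_dim`, and `num_heads` interact with the embedding sizes, the sketch below computes the widths they imply. It assumes that the reserved width is obtained by truncating `embedding_dim * fraction` to an integer, that the fraction must be strictly between 0 and 1, and that the attention width must divide evenly across heads (as `torch.nn.MultiheadAttention` requires). Both helper names are hypothetical and the library's internal arithmetic may differ.

```python
from typing import Tuple


def split_shared_embedding(embedding_dim: int, fraction: float) -> Tuple[int, int]:
    """Return (shared_width, per_value_width) for the `fraction` strategy.

    Assumption: the shared part is int(embedding_dim * fraction).
    """
    if not 0.0 < fraction < 1.0:
        raise ValueError("shared_embedding_fraction should be between 0 and 1")
    shared_dim = int(embedding_dim * fraction)
    return shared_dim, embedding_dim - shared_dim


def head_dim(attn_embed_dim: int, num_heads: int) -> int:
    """Width handled by each attention head; fails if the split is uneven."""
    if attn_embed_dim % num_heads != 0:
        raise ValueError("attn_embed_dim must be divisible by num_heads")
    return attn_embed_dim // num_heads


# Values taken from the documented defaults.
shared, per_value = split_shared_embedding(embedding_dim=16, fraction=0.25)
print(shared, per_value)                          # 4 12
print(head_dim(attn_embed_dim=32, num_heads=2))   # 16
```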

Bases: ModelConfig

CategoryEmbeddingModel configuration.

Parameters:

Name	Type	Description	Default
`layers`	`str`	DEPRECATED: Hyphen-separated number of layers and units in the classification head. E.g. 32-64-32. Defaults to 128-64-32	`'128-64-32'`
`activation`	`str`	DEPRECATED: The activation type in the classification head. Any of the default activations in PyTorch, such as ReLU, Tanh, or LeakyReLU. See https://pytorch.org/docs/stable/nn.html#non-linear-activations-weighted-sum-nonlinearity. Defaults to ReLU	`'ReLU'`
`use_batch_norm`	`bool`	DEPRECATED: Flag to include a BatchNorm layer after each Linear Layer+DropOut. Defaults to False	`False`
`initialization`	`str`	DEPRECATED: Initialization scheme for the linear layers. Defaults to `kaiming`. Choices are: [`kaiming`,`xavier`,`random`].	`'kaiming'`
`dropout`	`float`	DEPRECATED: probability of a classification element to be zeroed. This is added to each linear layer. Defaults to 0.0	`0.0`
`task`	`str`	Specify whether the problem is regression or classification. `backbone` is a task which considers the model as a backbone to generate features. Mostly used internally for SSL and related tasks. Choices are: [`regression`,`classification`,`backbone`].	required
`head`	`Optional[str]`	The head to be used for the model. Should be one of the heads defined in `pytorch_tabular.models.common.heads`. Defaults to LinearHead. Choices are: [`None`,`LinearHead`,`MixtureDensityHead`].	`'LinearHead'`
`head_config`	`Optional[Dict]`	The config as a dict which defines the head. If left empty, will be initialized as default linear head.	`lambda: {'layers': ''}()`
`embedding_dims`	`Optional[List]`	The dimensions of the embedding for each categorical column as a list of tuples (cardinality, embedding_dim). If left empty, will infer using the cardinality of the categorical column using the rule min(50, (x + 1) // 2)	`None`
`embedding_dropout`	`float`	Dropout to be applied to the Categorical Embedding. Defaults to 0.0	`0.0`
`batch_norm_continuous_input`	`bool`	If True, we will normalize the continuous layer by passing it through a BatchNorm layer.	`True`
`learning_rate`	`float`	The learning rate of the model. Defaults to 1e-3.	`0.001`
`loss`	`Optional[str]`	The loss function to be applied. By Default, it is MSELoss for regression and CrossEntropyLoss for classification. Unless you are sure what you are doing, leave it at MSELoss or L1Loss for regression and CrossEntropyLoss for classification	`None`
`metrics`	`Optional[List[str]]`	the list of metrics you need to track during training. The metrics should be one of the functional metrics implemented in `torchmetrics`. By default, it is accuracy if classification and mean_squared_error for regression	`None`
`metrics_params`	`Optional[List]`	The parameters to be passed to the metrics function. `task` is forced to be `multiclass` because the multiclass version can handle binary as well and for simplicity we are only using `multiclass`.	`None`
`metrics_prob_input`	`Optional[List]`	Is a mandatory parameter for classification metrics defined in the config. This defines whether the input to the metric function is the probability or the class. Length should be same as the number of metrics. Defaults to None.	`None`
`target_range`	`Optional[List]`	The range in which we should limit the output variable. Currently ignored for multi-target regression. Typically used for Regression problems. If left empty, will not apply any restrictions	`None`
`seed`	`int`	The seed for reproducibility. Defaults to 42	`42`
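
The description of `metrics_prob_input` implies a simple consistency rule: for classification, it must be supplied whenever `metrics` is set, with one entry per metric. A minimal sketch of that check follows. The function `check_metrics_prob_input` is hypothetical, written only to illustrate the rule, and the exact validation the library performs is an assumption.

```python
from typing import List, Optional


def check_metrics_prob_input(
    task: str,
    metrics: Optional[List[str]],
    metrics_prob_input: Optional[List[bool]],
) -> None:
    """Raise ValueError if classification metrics lack a matching prob-input flag."""
    if task != "classification" or not metrics:
        return  # the rule only applies to classification metrics
    if metrics_prob_input is None:
        raise ValueError("metrics_prob_input is required when classification metrics are set")
    if len(metrics_prob_input) != len(metrics):
        raise ValueError("metrics_prob_input must have one entry per metric")


# One flag per metric: accuracy takes class labels, auroc takes probabilities.
check_metrics_prob_input("classification", ["accuracy", "auroc"], [False, True])

# Regression metrics do not need the flag.
check_metrics_prob_input("regression", ["mean_squared_error"], None)
```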

Source code in src/pytorch_tabular/models/category_embedding/config.py

@dataclass
class CategoryEmbeddingModelConfig(ModelConfig):
    """CategoryEmbeddingModel configuration.

    Args:
        layers (str): DEPRECATED: Hyphen-separated number of layers and units in the classification head. E.g. 32-64-32.
                Defaults to 128-64-32

        activation (str): DEPRECATED: The activation type in the classification head. Any of the default activations
                in PyTorch, such as ReLU, Tanh, or LeakyReLU. See
                https://pytorch.org/docs/stable/nn.html#non-linear-activations-weighted-sum-nonlinearity.
                Defaults to ReLU

        use_batch_norm (bool): DEPRECATED: Flag to include a BatchNorm layer after each Linear Layer+DropOut. Defaults
                to False

        initialization (str): DEPRECATED: Initialization scheme for the linear layers. Defaults to `kaiming`. Choices
                are: [`kaiming`,`xavier`,`random`].

        dropout (float): DEPRECATED: probability of a classification element to be zeroed. This is added to each
                linear layer. Defaults to 0.0


        task (str): Specify whether the problem is regression or classification. `backbone` is a task which
                considers the model as a backbone to generate features. Mostly used internally for SSL and related
                tasks. Choices are: [`regression`,`classification`,`backbone`].

        head (Optional[str]): The head to be used for the model. Should be one of the heads defined in
                `pytorch_tabular.models.common.heads`. Defaults to  LinearHead. Choices are:
                [`None`,`LinearHead`,`MixtureDensityHead`].

        head_config (Optional[Dict]): The config as a dict which defines the head. If left empty, will be
                initialized as default linear head.

        embedding_dims (Optional[List]): The dimensions of the embedding for each categorical column as a
                list of tuples (cardinality, embedding_dim). If left empty, will infer using the cardinality of
                the categorical column using the rule min(50, (x + 1) // 2)

        embedding_dropout (float): Dropout to be applied to the Categorical Embedding. Defaults to 0.0

        batch_norm_continuous_input (bool): If True, we will normalize the continuous layer by passing it
                through a BatchNorm layer.

        learning_rate (float): The learning rate of the model. Defaults to 1e-3.

        loss (Optional[str]): The loss function to be applied. By Default, it is MSELoss for regression and
                CrossEntropyLoss for classification. Unless you are sure what you are doing, leave it at MSELoss
                or L1Loss for regression and CrossEntropyLoss for classification

        metrics (Optional[List[str]]): the list of metrics you need to track during training. The metrics
                should be one of the functional metrics implemented in ``torchmetrics``. By default, it is
                accuracy if classification and mean_squared_error for regression

        metrics_params (Optional[List]): The parameters to be passed to the metrics function. `task` is forced to
                be `multiclass` because the multiclass version can handle binary as well and for simplicity we are
                only using `multiclass`.

        metrics_prob_input (Optional[List]): Is a mandatory parameter for classification metrics defined in the config.
            This defines whether the input to the metric function is the probability or the class. Length should be
            same as the number of metrics. Defaults to None.

        target_range (Optional[List]): The range in which we should limit the output variable. Currently
                ignored for multi-target regression. Typically used for Regression problems. If left empty, will
                not apply any restrictions

        seed (int): The seed for reproducibility. Defaults to 42

    """

    layers: str = field(
        default="128-64-32",
        metadata={
            "help": (
                "Hyphen-separated number of layers and units in the classification"
                " head. E.g. 32-64-32. Defaults to 128-64-32"
            )
        },
    )
    activation: str = field(
        default="ReLU",
        metadata={
            "help": (
                "The activation type in the classification head. Any of the default"
                " activations in PyTorch, such as ReLU, Tanh, or LeakyReLU. See"
                " https://pytorch.org/docs/stable/nn.html#non-linear-activations-weighted-sum-nonlinearity."
                " Defaults to ReLU"
            )
        },
    )
    use_batch_norm: bool = field(
        default=False,
        metadata={"help": "Flag to include a BatchNorm layer after each Linear Layer+DropOut. Defaults to False"},
    )
    initialization: str = field(
        default="kaiming",
        metadata={
            "help": ("Initialization scheme for the linear layers. Defaults to `kaiming`"),
            "choices": ["kaiming", "xavier", "random"],
        },
    )
    dropout: float = field(
        default=0.0,
        metadata={
            "help": (
                "probability of a classification element to be zeroed."
                " This is added to each linear layer. Defaults to 0.0"
            )
        },
    )

    # def __post_init__(self):
    #     deprecated_args = [
    #         "layers",
    #         "activation",
    #         "use_batch_norm",
    #         "initialization",
    #         "dropout",
    #     ]
    #     # for arg in deprecated_args:
    #     if any([getattr(self, arg) is not None for arg in deprecated_args]):
    #         warnings.warn(
    #             f"{deprecated_args} are deprecated and will be removed in the next version. "
    #             "Please use 'head' and `head_config` and set deprecated args "
    #             "to `None` to turn off warning. CategoricalEmbedding model is just a "
    #             "linear head with embedding layers."
    #         )
    #     return super().__post_init__()

    _module_src: str = field(default="models.category_embedding")
    _model_name: str = field(default="CategoryEmbeddingModel")
    _backbone_name: str = field(default="CategoryEmbeddingBackbone")
    _config_name: str = field(default="CategoryEmbeddingModelConfig")
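
The commented-out warning above suggests moving the deprecated arguments into `head` and `head_config`. The sketch below shows one way such a migration could look as a plain dictionary. The helper `to_head_config` is hypothetical, and the assumption that the linear head accepts keys with the same names as the deprecated arguments should be verified against `pytorch_tabular.models.common.heads` before use.

```python
from typing import Any, Dict


def to_head_config(
    layers: str = "128-64-32",
    activation: str = "ReLU",
    use_batch_norm: bool = False,
    initialization: str = "kaiming",
    dropout: float = 0.0,
) -> Dict[str, Any]:
    """Bundle the deprecated arguments into a dict suitable for `head_config`.

    Assumption: the head config uses the same key names as the deprecated fields.
    """
    return {
        "layers": layers,
        "activation": activation,
        "use_batch_norm": use_batch_norm,
        "initialization": initialization,
        "dropout": dropout,
    }


# Keep the default head type and only override two settings.
head = "LinearHead"
head_config = to_head_config(layers="64-32", dropout=0.1)
print(head_config["layers"], head_config["dropout"])  # 64-32 0.1
```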

Bases: ModelConfig

DANet configuration.

Parameters:

Name	Type	Description	Default
`n_layers`	`int`	Number of Blocks in the DANet. 8, 20, 32 are configurations the paper evaluated. Defaults to 8	`8`
`abstlay_dim_1`	`int`	The dimension for the intermediate output in the first ABSTLAY layer in a Block. Defaults to 32	`32`
`abstlay_dim_2`	`Optional[int]`	The dimension for the intermediate output in the second ABSTLAY layer in a Block. If None, it will be twice `abstlay_dim_1`. Defaults to None	`None`
`k`	`int`	The number of feature groups in the ABSTLAY layer. Defaults to 5	`5`
`dropout_rate`	`float`	Dropout to be applied in the Block. Defaults to 0.1	`0.1`
`task`	`str`	Specify whether the problem is regression or classification. `backbone` is a task which considers the model as a backbone to generate features. Mostly used internally for SSL and related tasks. Choices are: [`regression`,`classification`,`backbone`].	required
`head`	`Optional[str]`	The head to be used for the model. Should be one of the heads defined in `pytorch_tabular.models.common.heads`. Defaults to LinearHead. Choices are: [`None`,`LinearHead`,`MixtureDensityHead`].	`'LinearHead'`
`head_config`	`Optional[Dict]`	The config as a dict which defines the head. If left empty, will be initialized as default linear head.	`lambda: {'layers': ''}()`
`embedding_dims`	`Optional[List]`	The dimensions of the embedding for each categorical column as a list of tuples (cardinality, embedding_dim). If left empty, will infer using the cardinality of the categorical column using the rule min(50, (x + 1) // 2)	`None`
`embedding_dropout`	`float`	Dropout to be applied to the Categorical Embedding. Defaults to 0.0	`0.0`
`batch_norm_continuous_input`	`bool`	If True, we will normalize the continuous layer by passing it through a BatchNorm layer.	`True`
`learning_rate`	`float`	The learning rate of the model. Defaults to 1e-3.	`0.001`
`loss`	`Optional[str]`	The loss function to be applied. By Default, it is MSELoss for regression and CrossEntropyLoss for classification. Unless you are sure what you are doing, leave it at MSELoss or L1Loss for regression and CrossEntropyLoss for classification	`None`
`metrics`	`Optional[List[str]]`	the list of metrics you need to track during training. The metrics should be one of the functional metrics implemented in `torchmetrics`. By default, it is accuracy if classification and mean_squared_error for regression	`None`
`metrics_params`	`Optional[List]`	The parameters to be passed to the metrics function. `task` is forced to be `multiclass` because the multiclass version can handle binary as well and for simplicity we are only using `multiclass`.	`None`
`metrics_prob_input`	`Optional[List]`	Is a mandatory parameter for classification metrics defined in the config. This defines whether the input to the metric function is the probability or the class. Length should be same as the number of metrics. Defaults to None.	`None`
`target_range`	`Optional[List]`	The range in which we should limit the output variable. Currently ignored for multi-target regression. Typically used for Regression problems. If left empty, will not apply any restrictions	`None`
`seed`	`int`	The seed for reproducibility. Defaults to 42	`42`
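
According to the field's help text in the source, leaving `abstlay_dim_2` as None makes it resolve to twice `abstlay_dim_1`. The following minimal sketch illustrates that fallback. The function `resolve_abstlay_dim_2` is hypothetical, and where in the library this resolution actually happens is an assumption.

```python
from typing import Optional


def resolve_abstlay_dim_2(abstlay_dim_1: int, abstlay_dim_2: Optional[int] = None) -> int:
    """Return the explicit value if given, otherwise twice abstlay_dim_1."""
    return abstlay_dim_1 * 2 if abstlay_dim_2 is None else abstlay_dim_2


print(resolve_abstlay_dim_2(32))       # 64
print(resolve_abstlay_dim_2(32, 48))   # 48
```

With the default `abstlay_dim_1` of 32, the fallback gives 64.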

Source code in src/pytorch_tabular/models/danet/config.py

@dataclass
class DANetConfig(ModelConfig):
    """DANet configuration.

    Args:
        n_layers (int): Number of Blocks in the DANet. 8, 20, 32 are configurations
            the paper evaluated. Defaults to 8

        abstlay_dim_1 (int): The dimension for the intermediate output in the
                first ABSTLAY layer in a Block. Defaults to 32

        abstlay_dim_2 (Optional[int]): The dimension for the intermediate output in the
                second ABSTLAY layer in a Block. If None, it will be twice abstlay_dim_1. Defaults to None

        k (int): The number of feature groups in the ABSTLAY layer. Defaults to 5

        dropout_rate (float): Dropout to be applied in the Block. Defaults to 0.1

        task (str): Specify whether the problem is regression or classification. `backbone` is a task which
                considers the model as a backbone to generate features. Mostly used internally for SSL and related
                tasks. Choices are: [`regression`,`classification`,`backbone`].

        head (Optional[str]): The head to be used for the model. Should be one of the heads defined in
                `pytorch_tabular.models.common.heads`. Defaults to  LinearHead. Choices are:
                [`None`,`LinearHead`,`MixtureDensityHead`].

        head_config (Optional[Dict]): The config as a dict which defines the head. If left empty, will be
                initialized as default linear head.

        embedding_dims (Optional[List]): The dimensions of the embedding for each categorical column as a
                list of tuples (cardinality, embedding_dim). If left empty, will infer using the cardinality of
                the categorical column using the rule min(50, (x + 1) // 2)

        embedding_dropout (float): Dropout to be applied to the Categorical Embedding. Defaults to 0.0

        batch_norm_continuous_input (bool): If True, we will normalize the continuous layer by passing it
                through a BatchNorm layer.

        learning_rate (float): The learning rate of the model. Defaults to 1e-3.

        loss (Optional[str]): The loss function to be applied. By Default, it is MSELoss for regression and
                CrossEntropyLoss for classification. Unless you are sure what you are doing, leave it at MSELoss
                or L1Loss for regression and CrossEntropyLoss for classification

        metrics (Optional[List[str]]): the list of metrics you need to track during training. The metrics
                should be one of the functional metrics implemented in ``torchmetrics``. By default, it is
                accuracy if classification and mean_squared_error for regression

        metrics_params (Optional[List]): The parameters to be passed to the metrics function. `task` is forced to
                be `multiclass` because the multiclass version can handle binary as well and for simplicity we are
                only using `multiclass`.

        metrics_prob_input (Optional[List]): Is a mandatory parameter for classification metrics defined in the config.
            This defines whether the input to the metric function is the probability or the class. Length should be
            same as the number of metrics. Defaults to None.

        target_range (Optional[List]): The range in which we should limit the output variable. Currently
                ignored for multi-target regression. Typically used for Regression problems. If left empty, will
                not apply any restrictions

        seed (int): The seed for reproducibility. Defaults to 42

    """

    n_layers: int = field(
        default=8,
        metadata={"help": "Number of Blocks in the DANet. Each Block has 2 ABSTLAY layers. Defaults to 8"},
    )

    abstlay_dim_1: int = field(
        default=32,
        metadata={
            "help": "The dimension for the intermediate output in the first ABSTLAY layer in a Block. Defaults to 32"
        },
    )

    abstlay_dim_2: Optional[int] = field(
        default=None,
        metadata={
            "help": "The dimension for the intermediate output in the second ABSTLAY layer in a Block."
            " If None, it will be twice abstlay_dim_1. Defaults to None"
        },
    )
    k: int = field(
        default=5,
        metadata={"help": "The number of feature groups in the ABSTLAY layer. Defaults to 5"},
    )
    dropout_rate: float = field(
        default=0.1,
        metadata={"help": "Dropout to be applied in the Block. Defaults to 0.1"},
    )
    block_activation: str = field(
        default="LeakyReLU",
        metadata={
            "help": "The activation type in the Block. Any of the default activations in PyTorch,"
            " like ReLU, TanH, LeakyReLU, etc."
            " https://pytorch.org/docs/stable/nn.html#non-linear-activations-weighted-sum-nonlinearity."
            " Defaults to LeakyReLU"
        },
    )
    virtual_batch_size: Optional[int] = field(
        default=256,
        metadata={
            "help": "If not None, all BatchNorms will be converted to GhostBatchNorms"
            " with this virtual batch size. Defaults to 256"
        },
    )

    _module_src: str = field(default="models.danet")
    _model_name: str = field(default="DANetModel")
    _backbone_name: str = field(default="DANetBackbone")
    _config_name: str = field(default="DANetConfig")

    def __post_init__(self):
        if self.abstlay_dim_2 is None:
            self.abstlay_dim_2 = self.abstlay_dim_1 * 2
        return super().__post_init__()
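
The `__post_init__` hook above fills in `abstlay_dim_2` when it is left as `None`. The following is a minimal, self-contained sketch of that defaulting rule only. `AbstlayDims` is a hypothetical stand-in class, not part of pytorch_tabular, and it leaves out everything `ModelConfig` does.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AbstlayDims:
    """Hypothetical stand-in that mirrors only the abstlay_dim_2 defaulting rule."""

    abstlay_dim_1: int = 32
    abstlay_dim_2: Optional[int] = None

    def __post_init__(self):
        # When the second dimension is not given, derive it from the first.
        if self.abstlay_dim_2 is None:
            self.abstlay_dim_2 = self.abstlay_dim_1 * 2


default_dims = AbstlayDims()
explicit_dims = AbstlayDims(abstlay_dim_1=16, abstlay_dim_2=24)
print(default_dims.abstlay_dim_2, explicit_dims.abstlay_dim_2)
```

An explicitly passed `abstlay_dim_2` is kept as-is; only `None` triggers the doubling.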

Bases: ModelConfig

FT Transformer configuration.

Parameters:

Name	Type	Description	Default
`input_embed_dim`	`int`	The embedding dimension for the input categorical features. Defaults to 32	`32`
`embedding_initialization`	`Optional[str]`	Initialization scheme for the embedding layers. Defaults to `kaiming_uniform`. Choices are: [`kaiming_uniform`,`kaiming_normal`].	`'kaiming_uniform'`
`embedding_bias`	`bool`	Flag to turn on Embedding Bias. Defaults to True	`True`
`share_embedding`	`bool`	The flag turns on shared embeddings in the input embedding process. The key idea here is to have an embedding for the feature as a whole along with embeddings of each unique values of that column. For more details refer to Appendix A of the TabTransformer paper. Defaults to False	`False`
`share_embedding_strategy`	`Optional[str]`	There are two strategies in adding shared embeddings. 1. `add` - A separate embedding for the feature is added to the embedding of the unique values of the feature. 2. `fraction` - A fraction of the input embedding is reserved for the shared embedding of the feature. Defaults to fraction. Choices are: [`add`,`fraction`].	`'fraction'`
`shared_embedding_fraction`	`float`	Fraction of the input_embed_dim to be reserved by the shared embedding. Should be less than one. Defaults to 0.25	`0.25`
`attn_feature_importance`	`bool`	If you are facing memory issues, you can turn off feature importance which will not save the attention weights. Defaults to True	`True`
`num_heads`	`int`	The number of heads in the Multi-Headed Attention layer. Defaults to 8	`8`
`num_attn_blocks`	`int`	The number of layers of stacked Multi-Headed Attention layers. Defaults to 6	`6`
`transformer_head_dim`	`Optional[int]`	The number of hidden units in the Multi-Headed Attention layers. Defaults to None and will be same as input_embed_dim.	`None`
`attn_dropout`	`float`	Dropout to be applied after Multi headed Attention. Defaults to 0.1	`0.1`
`add_norm_dropout`	`float`	Dropout to be applied in the AddNorm Layer. Defaults to 0.1	`0.1`
`ff_dropout`	`float`	Dropout to be applied in the Positionwise FeedForward Network. Defaults to 0.1	`0.1`
`ff_hidden_multiplier`	`int`	Multiple by which the Positionwise FF layer scales the input. Defaults to 4	`4`
`transformer_activation`	`str`	The activation type in the transformer feed forward layers. In addition to the default activation in PyTorch like ReLU, TanH, LeakyReLU, etc. https://pytorch.org/docs/stable/nn.html#non-linear-activations-weighted-sum-nonlinearity, GEGLU, ReGLU and SwiGLU are also implemented (https://arxiv.org/pdf/2002.05202.pdf). Defaults to GEGLU	`'GEGLU'`
`task`	`str`	Specify whether the problem is regression or classification. `backbone` is a task which considers the model as a backbone to generate features. Mostly used internally for SSL and related tasks. Choices are: [`regression`,`classification`,`backbone`].	required
`head`	`Optional[str]`	The head to be used for the model. Should be one of the heads defined in `pytorch_tabular.models.common.heads`. Defaults to LinearHead. Choices are: [`None`,`LinearHead`,`MixtureDensityHead`].	`'LinearHead'`
`head_config`	`Optional[Dict]`	The config as a dict which defines the head. If left empty, will be initialized as default linear head.	`lambda: {'layers': ''}()`
`embedding_dims`	`Optional[List]`	The dimensions of the embedding for each categorical column as a list of tuples (cardinality, embedding_dim). If left empty, will infer using the cardinality of the categorical column using the rule min(50, (x + 1) // 2)	`None`
`embedding_dropout`	`float`	Dropout to be applied to the Categorical Embedding. Defaults to 0.0	`0.0`
`batch_norm_continuous_input`	`bool`	If True, we will normalize the continuous layer by passing it through a BatchNorm layer.	`True`
`learning_rate`	`float`	The learning rate of the model. Defaults to 1e-3.	`0.001`
`loss`	`Optional[str]`	The loss function to be applied. By Default, it is MSELoss for regression and CrossEntropyLoss for classification. Unless you are sure what you are doing, leave it at MSELoss or L1Loss for regression and CrossEntropyLoss for classification	`None`
`metrics`	`Optional[List[str]]`	the list of metrics you need to track during training. The metrics should be one of the functional metrics implemented in `torchmetrics`. By default, it is accuracy if classification and mean_squared_error for regression	`None`
`metrics_params`	`Optional[List]`	The parameters to be passed to the metrics function. `task` is forced to be `multiclass` because the multiclass version can handle binary as well and for simplicity we are only using `multiclass`.	`None`
`metrics_prob_input`	`Optional[List]`	Is a mandatory parameter for classification metrics defined in the config. This defines whether the input to the metric function is the probability or the class. Length should be same as the number of metrics. Defaults to None.	`None`
`target_range`	`Optional[List]`	The range in which we should limit the output variable. Currently ignored for multi-target regression. Typically used for Regression problems. If left empty, will not apply any restrictions	`None`
`seed`	`int`	The seed for reproducibility. Defaults to 42	`42`
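
The `embedding_dims` row above gives the fallback rule `min(50, (x + 1) // 2)` for a column of cardinality `x`. As a minimal sketch of that arithmetic, the snippet below applies the rule to a few cardinalities. `infer_embedding_dims` is a hypothetical helper written for illustration; how the library itself counts cardinality (for example, whether it reserves a slot for unseen values) is not covered here.

```python
from typing import List, Tuple


def infer_embedding_dims(cardinalities: List[int]) -> List[Tuple[int, int]]:
    """Apply the documented rule min(50, (x + 1) // 2) to each cardinality x."""
    return [(x, min(50, (x + 1) // 2)) for x in cardinalities]


# Each tuple is (cardinality, embedding_dim), the format embedding_dims expects.
inferred = infer_embedding_dims([2, 7, 100, 1000])
print(inferred)
```

The cap of 50 means that any column with 99 or more unique values gets the same embedding width under this rule.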

Source code in src/pytorch_tabular/models/ft_transformer/config.py

@dataclass
class FTTransformerConfig(ModelConfig):
    """FT Transformer configuration.

    Args:
        input_embed_dim (int): The embedding dimension for the input categorical features. Defaults to 32

        embedding_initialization (Optional[str]): Initialization scheme for the embedding layers. Defaults
                to `kaiming_uniform`. Choices are: [`kaiming_uniform`,`kaiming_normal`].

        embedding_bias (bool): Flag to turn on Embedding Bias. Defaults to True

        share_embedding (bool): The flag turns on shared embeddings in the input embedding process. The key
                idea here is to have an embedding for the feature as a whole along with embeddings of each unique
                values of that column. For more details refer to Appendix A of the TabTransformer paper. Defaults
                to False

        share_embedding_strategy (Optional[str]): There are two strategies in adding shared embeddings. 1.
                `add` - A separate embedding for the feature is added to the embedding of the unique values of the
                feature. 2. `fraction` - A fraction of the input embedding is reserved for the shared embedding of
                the feature. Defaults to fraction. Choices are: [`add`,`fraction`].

        shared_embedding_fraction (float): Fraction of the input_embed_dim to be reserved by the shared
                embedding. Should be less than one. Defaults to 0.25

        attn_feature_importance (bool): If you are facing memory issues, you can turn off feature
                importance which will not save the attention weights. Defaults to True

        num_heads (int): The number of heads in the Multi-Headed Attention layer. Defaults to 8

        num_attn_blocks (int): The number of layers of stacked Multi-Headed Attention layers. Defaults to 6

        transformer_head_dim (Optional[int]): The number of hidden units in the Multi-Headed Attention
                layers. Defaults to None and will be same as input_embed_dim.

        attn_dropout (float): Dropout to be applied after Multi headed Attention. Defaults to 0.1

        add_norm_dropout (float): Dropout to be applied in the AddNorm Layer. Defaults to 0.1

        ff_dropout (float): Dropout to be applied in the Positionwise FeedForward Network. Defaults to 0.1

        ff_hidden_multiplier (int): Multiple by which the Positionwise FF layer scales the input. Defaults
                to 4

        transformer_activation (str): The activation type in the transformer feed forward layers. In
                addition to the default activation in PyTorch like ReLU, TanH, LeakyReLU, etc.
                https://pytorch.org/docs/stable/nn.html#non-linear-activations-weighted-sum-nonlinearity, GEGLU,
                ReGLU and SwiGLU are also implemented (https://arxiv.org/pdf/2002.05202.pdf). Defaults to GEGLU

        task (str): Specify whether the problem is regression or classification. `backbone` is a task which
                considers the model as a backbone to generate features. Mostly used internally for SSL and related
                tasks. Choices are: [`regression`,`classification`,`backbone`].

        head (Optional[str]): The head to be used for the model. Should be one of the heads defined in
                `pytorch_tabular.models.common.heads`. Defaults to LinearHead. Choices are:
                [`None`,`LinearHead`,`MixtureDensityHead`].

        head_config (Optional[Dict]): The config as a dict which defines the head. If left empty, will be
                initialized as default linear head.

        embedding_dims (Optional[List]): The dimensions of the embedding for each categorical column as a
                list of tuples (cardinality, embedding_dim). If left empty, will infer using the cardinality of
                the categorical column using the rule min(50, (x + 1) // 2)

        embedding_dropout (float): Dropout to be applied to the Categorical Embedding. Defaults to 0.0

        batch_norm_continuous_input (bool): If True, we will normalize the continuous layer by passing it
                through a BatchNorm layer.

        learning_rate (float): The learning rate of the model. Defaults to 1e-3.

        loss (Optional[str]): The loss function to be applied. By Default, it is MSELoss for regression and
                CrossEntropyLoss for classification. Unless you are sure what you are doing, leave it at MSELoss
                or L1Loss for regression and CrossEntropyLoss for classification

        metrics (Optional[List[str]]): the list of metrics you need to track during training. The metrics
                should be one of the functional metrics implemented in ``torchmetrics``. By default, it is
                accuracy if classification and mean_squared_error for regression

        metrics_params (Optional[List]): The parameters to be passed to the metrics function. `task` is forced to
                be `multiclass` because the multiclass version can handle binary as well and for simplicity we are
                only using `multiclass`.

        metrics_prob_input (Optional[List]): Is a mandatory parameter for classification metrics defined in the config.
            This defines whether the input to the metric function is the probability or the class. Length should be
            same as the number of metrics. Defaults to None.

        target_range (Optional[List]): The range in which we should limit the output variable. Currently
                ignored for multi-target regression. Typically used for Regression problems. If left empty, will
                not apply any restrictions

        seed (int): The seed for reproducibility. Defaults to 42

    """

    input_embed_dim: int = field(
        default=32,
        metadata={"help": "The embedding dimension for the input categorical features. Defaults to 32"},
    )
    embedding_initialization: Optional[str] = field(
        default="kaiming_uniform",
        metadata={
            "help": "Initialization scheme for the embedding layers. Defaults to `kaiming_uniform`",
            "choices": ["kaiming_uniform", "kaiming_normal"],
        },
    )
    embedding_bias: bool = field(
        default=True,
        metadata={"help": "Flag to turn on Embedding Bias. Defaults to True"},
    )
    share_embedding: bool = field(
        default=False,
        metadata={
            "help": "The flag turns on shared embeddings in the input embedding process."
            " The key idea here is to have an embedding for the feature as a whole along with embeddings"
            " of each unique values of that column. For more details refer"
            " to Appendix A of the TabTransformer paper. Defaults to False"
        },
    )
    share_embedding_strategy: Optional[str] = field(
        default="fraction",
        metadata={
            "help": "There are two strategies in adding shared embeddings."
            " 1. `add` - A separate embedding for the feature is added to the embedding"
            " of the unique values of the feature."
            " 2. `fraction` - A fraction of the input embedding is reserved"
            " for the shared embedding of the feature. Defaults to fraction.",
            "choices": ["add", "fraction"],
        },
    )
    shared_embedding_fraction: float = field(
        default=0.25,
        metadata={
            "help": "Fraction of the input_embed_dim to be reserved by the shared embedding."
            " Should be less than one. Defaults to 0.25"
        },
    )
    attn_feature_importance: bool = field(
        default=True,
        metadata={
            "help": "If you are facing memory issues, you can turn off feature importance"
            " which will not save the attention weights. Defaults to True"
        },
    )
    num_heads: int = field(
        default=8,
        metadata={"help": "The number of heads in the Multi-Headed Attention layer. Defaults to 8"},
    )
    num_attn_blocks: int = field(
        default=6,
        metadata={"help": "The number of layers of stacked Multi-Headed Attention layers. Defaults to 6"},
    )
    transformer_head_dim: Optional[int] = field(
        default=None,
        metadata={
            "help": "The number of hidden units in the Multi-Headed Attention layers."
            " Defaults to None and will be same as input_embed_dim."
        },
    )
    attn_dropout: float = field(
        default=0.1,
        metadata={"help": "Dropout to be applied after Multi headed Attention. Defaults to 0.1"},
    )
    add_norm_dropout: float = field(
        default=0.1,
        metadata={"help": "Dropout to be applied in the AddNorm Layer. Defaults to 0.1"},
    )
    ff_dropout: float = field(
        default=0.1,
        metadata={"help": "Dropout to be applied in the Positionwise FeedForward Network. Defaults to 0.1"},
    )
    ff_hidden_multiplier: int = field(
        default=4,
        metadata={"help": "Multiple by which the Positionwise FF layer scales the input. Defaults to 4"},
    )

    transformer_activation: str = field(
        default="GEGLU",
        metadata={
            "help": "The activation type in the transformer feed forward layers."
            " In addition to the default activation in PyTorch like ReLU, TanH, LeakyReLU, etc."
            " https://pytorch.org/docs/stable/nn.html#non-linear-activations-weighted-sum-nonlinearity,"
            " GEGLU, ReGLU and SwiGLU are also implemented (https://arxiv.org/pdf/2002.05202.pdf)."
            " Defaults to GEGLU",
        },
    )

    _module_src: str = field(default="models.ft_transformer")
    _model_name: str = field(default="FTTransformerModel")
    _backbone_name: str = field(default="FTTransformerBackbone")
    _config_name: str = field(default="FTTransformerConfig")
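
With `share_embedding_strategy="fraction"`, part of each `input_embed_dim`-wide embedding is reserved for the shared, per-feature embedding. The sketch below works through that split for the defaults. `split_shared_embedding` is a hypothetical helper, and both the round-down behaviour and the requirement that the fraction be positive are assumptions made for illustration; the source only says the fraction should be less than one.

```python
from typing import Tuple


def split_shared_embedding(input_embed_dim: int, shared_embedding_fraction: float) -> Tuple[int, int]:
    """Return (shared_dims, per_value_dims) for the `fraction` strategy.

    Assumption: the shared part is the fraction of the width, rounded down.
    """
    if not 0.0 < shared_embedding_fraction < 1.0:
        raise ValueError("shared_embedding_fraction should be positive and less than one")
    shared_dims = int(input_embed_dim * shared_embedding_fraction)
    return shared_dims, input_embed_dim - shared_dims


# Defaults from the config above: input_embed_dim=32, shared_embedding_fraction=0.25
shared_dims, per_value_dims = split_shared_embedding(32, 0.25)
print(shared_dims, per_value_dims)
```

Under these assumptions the default configuration reserves 8 of the 32 dimensions for the feature-level embedding and leaves 24 for the value-level embedding.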

Bases: ModelConfig

Gated Adaptive Network for Deep Automated Learning of Features (GANDALF) Config.

Parameters:

Name	Type	Description	Default
`gflu_stages`	`int`	Number of layers in the feature abstraction layer. Defaults to 6	`6`
`gflu_dropout`	`float`	Dropout rate for the feature abstraction layer. Defaults to 0.0	`0.0`
`gflu_feature_init_sparsity`	`float`	Only valid for t-softmax. The percentage of features to be selected in each GFLU stage. This is just initialized and during learning it may change. Defaults to 0.3	`0.3`
`learnable_sparsity`	`bool`	Only valid for t-softmax. If True, the sparsity parameters will be learned. If False, the sparsity parameters will be fixed to the initial values specified in `gflu_feature_init_sparsity` and `tree_feature_init_sparsity`. Defaults to True	`True`
`task`	`str`	Specify whether the problem is regression or classification. `backbone` is a task which considers the model as a backbone to generate features. Mostly used internally for SSL and related tasks. Choices are: [`regression`,`classification`,`backbone`].	required
`head`	`Optional[str]`	The head to be used for the model. Should be one of the heads defined in `pytorch_tabular.models.common.heads`. Defaults to LinearHead. Choices are: [`None`,`LinearHead`,`MixtureDensityHead`].	`'LinearHead'`
`head_config`	`Optional[Dict]`	The config as a dict which defines the head. If left empty, will be initialized as default linear head.	`lambda: {'layers': ''}()`
`embedding_dims`	`Optional[List]`	The dimensions of the embedding for each categorical column as a list of tuples (cardinality, embedding_dim). If left empty, will infer using the cardinality of the categorical column using the rule min(50, (x + 1) // 2)	`None`
`embedding_dropout`	`float`	Dropout to be applied to the Categorical Embedding. Defaults to 0.0	`0.0`
`batch_norm_continuous_input`	`bool`	If True, we will normalize the continuous layer by passing it through a BatchNorm layer.	`True`
`learning_rate`	`float`	The learning rate of the model. Defaults to 1e-3.	`0.001`
`loss`	`Optional[str]`	The loss function to be applied. By Default, it is MSELoss for regression and CrossEntropyLoss for classification. Unless you are sure what you are doing, leave it at MSELoss or L1Loss for regression and CrossEntropyLoss for classification	`None`
`metrics`	`Optional[List[str]]`	the list of metrics you need to track during training. The metrics should be one of the functional metrics implemented in `torchmetrics`. By default, it is accuracy if classification and mean_squared_error for regression	`None`
`metrics_params`	`Optional[List]`	The parameters to be passed to the metrics function. `task` is forced to be `multiclass` because the multiclass version can handle binary as well and for simplicity we are only using `multiclass`.	`None`
`metrics_prob_input`	`Optional[List]`	Is a mandatory parameter for classification metrics defined in the config. This defines whether the input to the metric function is the probability or the class. Length should be same as the number of metrics. Defaults to None.	`None`
`target_range`	`Optional[List]`	The range in which we should limit the output variable. Currently ignored for multi-target regression. Typically used for Regression problems. If left empty, will not apply any restrictions	`None`
`seed`	`int`	The seed for reproducibility. Defaults to 42	`42`
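
The `metrics_prob_input` row states that the list must have one entry per metric. A minimal sketch of that consistency check follows. `check_metrics_prob_input` is a hypothetical helper, not a pytorch_tabular function, and the metric names are only example values.

```python
from typing import List, Optional


def check_metrics_prob_input(metrics: List[str], metrics_prob_input: Optional[List[bool]]) -> None:
    """Raise ValueError unless there is exactly one prob-input flag per metric."""
    if metrics_prob_input is None:
        raise ValueError("metrics_prob_input is required when classification metrics are set")
    if len(metrics) != len(metrics_prob_input):
        raise ValueError(
            f"Got {len(metrics)} metrics but {len(metrics_prob_input)} metrics_prob_input flags"
        )


# One flag per metric: False means the metric receives class labels,
# True means it receives probabilities.
metrics = ["accuracy", "auroc"]
metrics_prob_input = [False, True]
check_metrics_prob_input(metrics, metrics_prob_input)
print("metrics and metrics_prob_input are consistent")
```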

Source code in src/pytorch_tabular/models/gandalf/config.py

@dataclass
class GANDALFConfig(ModelConfig):
    """Gated Adaptive Network for Deep Automated Learning of Features (GANDALF) Config.

    Args:
        gflu_stages (int): Number of layers in the feature abstraction layer. Defaults to 6

        gflu_dropout (float): Dropout rate for the feature abstraction layer. Defaults to 0.0

        gflu_feature_init_sparsity (float): Only valid for t-softmax. The percentage of features
                to be selected in each GFLU stage. This is just initialized and during learning
                it may change. Defaults to 0.3

        learnable_sparsity (bool): Only valid for t-softmax. If True, the sparsity parameters
                will be learned. If False, the sparsity parameters will be fixed to the initial
                values specified in `gflu_feature_init_sparsity` and `tree_feature_init_sparsity`.
                Defaults to True

        task (str): Specify whether the problem is regression or classification. `backbone` is a task which
                considers the model as a backbone to generate features. Mostly used internally for SSL and related
                tasks. Choices are: [`regression`,`classification`,`backbone`].

        head (Optional[str]): The head to be used for the model. Should be one of the heads defined in
                `pytorch_tabular.models.common.heads`. Defaults to LinearHead. Choices are:
                [`None`,`LinearHead`,`MixtureDensityHead`].

        head_config (Optional[Dict]): The config as a dict which defines the head. If left empty, will be
                initialized as default linear head.

        embedding_dims (Optional[List]): The dimensions of the embedding for each categorical column as a
                list of tuples (cardinality, embedding_dim). If left empty, will infer using the cardinality of
                the categorical column using the rule min(50, (x + 1) // 2)

        embedding_dropout (float): Dropout to be applied to the Categorical Embedding. Defaults to 0.0

        batch_norm_continuous_input (bool): If True, we will normalize the continuous layer by passing it
                through a BatchNorm layer.

        learning_rate (float): The learning rate of the model. Defaults to 1e-3.

        loss (Optional[str]): The loss function to be applied. By Default, it is MSELoss for regression and
                CrossEntropyLoss for classification. Unless you are sure what you are doing, leave it at MSELoss
                or L1Loss for regression and CrossEntropyLoss for classification

        metrics (Optional[List[str]]): the list of metrics you need to track during training. The metrics
                should be one of the functional metrics implemented in ``torchmetrics``. By default, it is
                accuracy if classification and mean_squared_error for regression

        metrics_params (Optional[List]): The parameters to be passed to the metrics function. `task` is forced to
                be `multiclass` because the multiclass version can handle binary as well and for simplicity we are
                only using `multiclass`.

        metrics_prob_input (Optional[List]): Is a mandatory parameter for classification metrics defined in the config.
            This defines whether the input to the metric function is the probability or the class. Length should be
            same as the number of metrics. Defaults to None.

        target_range (Optional[List]): The range in which we should limit the output variable. Currently
                ignored for multi-target regression. Typically used for Regression problems. If left empty, will
                not apply any restrictions

        seed (int): The seed for reproducibility. Defaults to 42

    """

    gflu_stages: int = field(
        default=6,
        metadata={"help": "Number of layers in the feature abstraction layer. Defaults to 6"},
    )

    gflu_dropout: float = field(
        default=0.0, metadata={"help": "Dropout rate for the feature abstraction layer. Defaults to 0.0"}
    )

    gflu_feature_init_sparsity: float = field(
        default=0.3,
        metadata={
            "help": "Only valid for t-softmax. The percentage of features to be selected in "
            "each GFLU stage. This is just initialized and during learning it may change. Defaults to 0.3"
        },
    )
    learnable_sparsity: bool = field(
        default=True,
        metadata={
            "help": "Only valid for t-softmax. If True, the sparsity parameters will be learned."
            " If False, the sparsity parameters will be fixed to the initial values specified in "
            "`gflu_feature_init_sparsity` and `tree_feature_init_sparsity`. Defaults to True"
        },
    )
    _module_src: str = field(default="models.gandalf")
    _model_name: str = field(default="GANDALFModel")
    _backbone_name: str = field(default="GANDALFBackbone")
    _config_name: str = field(default="GANDALFConfig")

    def __post_init__(self):
        assert self.gflu_stages > 0, "gflu_stages should be greater than 0"
        return super().__post_init__()
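
The `__post_init__` check above rejects non-positive `gflu_stages` values at construction time. The snippet below is a small sketch of that behaviour using `GfluStages`, a hypothetical stand-in dataclass that reproduces only the assertion and none of the `ModelConfig` logic.

```python
from dataclasses import dataclass


@dataclass
class GfluStages:
    """Hypothetical stand-in that mirrors only the gflu_stages validation."""

    gflu_stages: int = 6

    def __post_init__(self):
        assert self.gflu_stages > 0, "gflu_stages should be greater than 0"


def is_valid(stages: int) -> bool:
    """Return True if a config with this many stages can be constructed."""
    try:
        GfluStages(gflu_stages=stages)
    except AssertionError:
        return False
    return True


checks = {stages: is_valid(stages) for stages in (6, 1, 0, -2)}
print(checks)
```

Because the check is a plain `assert`, running Python with the `-O` flag would skip it.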

Bases: ModelConfig

Gated Additive Tree Ensemble configuration.

Parameters:

Name	Type	Description	Default
`gflu_stages`	`int`	Number of layers in the feature abstraction layer. Defaults to 6	`6`
`gflu_dropout`	`float`	Dropout rate for the feature abstraction layer. Defaults to 0.0	`0.0`
`tree_depth`	`int`	Depth of the tree. Defaults to 4	`4`
`num_trees`	`int`	Number of trees to use in the ensemble. Defaults to 10	`10`
`binning_activation`	`str`	The binning function to use. Defaults to sparsemoid. Choices are: [`entmoid`,`sparsemoid`,`sigmoid`].	`'sparsemoid'`
`feature_mask_function`	`str`	The feature mask function to use. Defaults to t-softmax. Choices are: [`entmax`,`sparsemax`,`softmax`].	`'t-softmax'`
`tree_dropout`	`float`	probability of dropout in tree binning transformation. Defaults to 0.0	`0.0`
`chain_trees`	`bool`	If True, the trees are chained together, which is analogous to boosting. If False, the trees run in parallel, which is analogous to bagging. Defaults to True	`True`
`tree_wise_attention`	`bool`	If True, we will use tree wise attention to combine trees. Defaults to True	`True`
`tree_wise_attention_dropout`	`float`	probability of dropout in the tree wise attention layer. Defaults to 0.0	`0.0`
`share_head_weights`	`bool`	If True, we will share the weights between the heads. Defaults to True	`True`
`task`	`str`	Specify whether the problem is regression or classification. `backbone` is a task which considers the model as a backbone to generate features. Mostly used internally for SSL and related tasks. Choices are: [`regression`,`classification`,`backbone`].	required
`head`	`Optional[str]`	The head to be used for the model. Should be one of the heads defined in `pytorch_tabular.models.common.heads`. Defaults to LinearHead. Choices are: [`None`,`LinearHead`,`MixtureDensityHead`].	`'LinearHead'`
`head_config`	`Optional[Dict]`	The config as a dict which defines the head. If left empty, will be initialized as default linear head.	`lambda: {'layers': ''}()`
`embedding_dims`	`Optional[List]`	The dimensions of the embedding for each categorical column as a list of tuples (cardinality, embedding_dim). If left empty, will infer using the cardinality of the categorical column using the rule min(50, (x + 1) // 2)	`None`
`embedding_dropout`	`float`	Dropout to be applied to the Categorical Embedding. Defaults to 0.0	`0.0`
`batch_norm_continuous_input`	`bool`	If True, we will normalize the continuous layer by passing it through a BatchNorm layer.	`True`
`learning_rate`	`float`	The learning rate of the model. Defaults to 1e-3.	`0.001`
`loss`	`Optional[str]`	The loss function to be applied. By Default, it is MSELoss for regression and CrossEntropyLoss for classification. Unless you are sure what you are doing, leave it at MSELoss or L1Loss for regression and CrossEntropyLoss for classification	`None`
`metrics`	`Optional[List[str]]`	the list of metrics you need to track during training. The metrics should be one of the functional metrics implemented in `torchmetrics`. By default, it is accuracy if classification and mean_squared_error for regression	`None`
`metrics_params`	`Optional[List]`	The parameters to be passed to the metrics function. `task` is forced to be `multiclass` because the multiclass version can handle binary as well and for simplicity we are only using `multiclass`.	`None`
`metrics_prob_input`	`Optional[List]`	Is a mandatory parameter for classification metrics defined in the config. This defines whether the input to the metric function is the probability or the class. Length should be same as the number of metrics. Defaults to None.	`None`
`target_range`	`Optional[List]`	The range in which we should limit the output variable. Currently ignored for multi-target regression. Typically used for Regression problems. If left empty, will not apply any restrictions	`None`
`seed`	`int`	The seed for reproducibility. Defaults to 42	`42`
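
The `target_range` row describes limiting the output variable to a given range. One common way to do this is to squash the raw output with a sigmoid and rescale it into `(low, high)`; the sketch below shows that approach. This is an assumption made for illustration, since the exact transformation the library applies is not shown in this section, and `limit_to_range` is a hypothetical helper.

```python
import math
from typing import Tuple


def stable_sigmoid(z: float) -> float:
    """Sigmoid that avoids overflow for large negative inputs."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def limit_to_range(raw_output: float, target_range: Tuple[float, float]) -> float:
    """Map an unbounded raw output into [low, high] via a scaled sigmoid (assumed scheme)."""
    low, high = target_range
    return low + stable_sigmoid(raw_output) * (high - low)


# A raw output of 0 lands at the midpoint; extreme values approach the bounds.
limited = [limit_to_range(raw, (0.0, 10.0)) for raw in (-1000.0, 0.0, 1000.0)]
print(limited)
```

With this scheme the prediction can never leave the configured range, regardless of how large the raw output becomes.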

Source code in src/pytorch_tabular/models/gate/config.py

@dataclass
class GatedAdditiveTreeEnsembleConfig(ModelConfig):
    """Gated Additive Tree Ensemble configuration.

    Args:
        gflu_stages (int): Number of layers in the feature abstraction layer. Defaults to 6

        gflu_dropout (float): Dropout rate for the feature abstraction layer. Defaults to 0.0

        tree_depth (int): Depth of the tree. Defaults to 4

        num_trees (int): Number of trees to use in the ensemble. Defaults to 10

        binning_activation (str): The binning function to use. Defaults to sparsemoid.
                Choices are: [`entmoid`,`sparsemoid`,`sigmoid`].

        feature_mask_function (str): The feature mask function to use. Defaults to t-softmax. Choices are:
                [`entmax`,`sparsemax`,`softmax`].

        tree_dropout (float): probability of dropout in tree binning transformation. Defaults to 0.0

        chain_trees (bool): If True, the trees are chained together, which is analogous to boosting.
            If False, the trees run in parallel, which is analogous to bagging. Defaults to True

        tree_wise_attention (bool): If True, we will use tree wise attention to combine trees. Defaults to
                True

        tree_wise_attention_dropout (float): probability of dropout in the tree wise attention layer.
                Defaults to 0.0

        share_head_weights (bool): If True, we will share the weights between the heads. Defaults to True


        task (str): Specify whether the problem is regression or classification. `backbone` is a task which
                considers the model as a backbone to generate features. Mostly used internally for SSL and related
                tasks. Choices are: [`regression`,`classification`,`backbone`].

        head (Optional[str]): The head to be used for the model. Should be one of the heads defined in
                `pytorch_tabular.models.common.heads`. Defaults to LinearHead. Choices are:
                [`None`,`LinearHead`,`MixtureDensityHead`].

        head_config (Optional[Dict]): The config as a dict which defines the head. If left empty, will be
                initialized as default linear head.

        embedding_dims (Optional[List]): The dimensions of the embedding for each categorical column as a
                list of tuples (cardinality, embedding_dim). If left empty, will infer using the cardinality of
                the categorical column using the rule min(50, (x + 1) // 2)

        embedding_dropout (float): Dropout to be applied to the Categorical Embedding. Defaults to 0.0

        batch_norm_continuous_input (bool): If True, we will normalize the continuous layer by passing it
                through a BatchNorm layer.

        learning_rate (float): The learning rate of the model. Defaults to 1e-3.

        loss (Optional[str]): The loss function to be applied. By Default, it is MSELoss for regression and
                CrossEntropyLoss for classification. Unless you are sure what you are doing, leave it at MSELoss
                or L1Loss for regression and CrossEntropyLoss for classification

        metrics (Optional[List[str]]): the list of metrics you need to track during training. The metrics
                should be one of the functional metrics implemented in ``torchmetrics``. By default, it is
                accuracy if classification and mean_squared_error for regression

        metrics_params (Optional[List]): The parameters to be passed to the metrics function. `task` is forced to
                be `multiclass` because the multiclass version can handle binary as well and for simplicity we are
                only using `multiclass`.

        metrics_prob_input (Optional[List]): Is a mandatory parameter for classification metrics defined in the config.
            This defines whether the input to the metric function is the probability or the class. Length should be
            same as the number of metrics. Defaults to None.

        target_range (Optional[List]): The range in which we should limit the output variable. Currently
                ignored for multi-target regression. Typically used for Regression problems. If left empty, will
                not apply any restrictions

        seed (int): The seed for reproducibility. Defaults to 42

    """

    gflu_stages: int = field(
        default=6,
        metadata={"help": "Number of layers in the feature abstraction layer. Defaults to 6"},
    )

    gflu_dropout: float = field(
        default=0.0, metadata={"help": "Dropout rate for the feature abstraction layer. Defaults to 0.0"}
    )

    tree_depth: int = field(default=4, metadata={"help": "Depth of the tree. Defaults to 4"})

    num_trees: int = field(
        default=10,
        metadata={"help": "Number of trees to use in the ensemble. Defaults to 10"},
    )

    binning_activation: str = field(
        default="sparsemoid",
        metadata={
            "help": "The binning function to use. Defaults to sparsemoid",
            "choices": ["entmoid", "sparsemoid", "sigmoid"],
        },
    )
    feature_mask_function: str = field(
        default="t-softmax",
        metadata={
            "help": "The feature mask function to use. Defaults to t-softmax",
            "choices": ["entmax", "sparsemax", "softmax", "t-softmax"],
        },
    )
    gflu_feature_init_sparsity: float = field(
        default=0.3,
        metadata={
            "help": "Only valid for t-softmax. The percentage of features to be dropped in "
            "each GFLU stage. This is just initialized and during learning it may change"
        },
    )
    tree_feature_init_sparsity: float = field(
        default=0.8,
        metadata={
            "help": "Only valid for t-softmax. The percentage of features to be dropped in "
            "each split in the tree. This is just initialized and during learning it may change"
        },
    )
    learnable_sparsity: bool = field(
        default=True,
        metadata={
            "help": "Only valid for t-softmax. If True, the sparsity parameters will be learned. "
            "If False, the sparsity parameters will be fixed to the initial values specified in "
            "`gflu_feature_init_sparsity` and `tree_feature_init_sparsity`"
        },
    )

    tree_dropout: float = field(
        default=0.0,
        metadata={"help": "probability of dropout in tree binning transformation. Defaults to 0.0"},
    )
    chain_trees: bool = field(
        default=True,
        metadata={
            "help": "If True, the trees are chained together (analogous to boosting);"
            " if False, the trees run in parallel (analogous to bagging). Defaults to True"
        },
    )
    tree_wise_attention: bool = field(
        default=True,
        metadata={"help": "If True, we will use tree wise attention to combine trees. Defaults to True"},
    )
    tree_wise_attention_dropout: float = field(
        default=0.0,
        metadata={"help": "probability of dropout in the tree wise attention layer. Defaults to 0.0"},
    )
    share_head_weights: bool = field(
        default=True,
        metadata={"help": "If True, we will share the weights between the heads. Defaults to True"},
    )

    _module_src: str = field(default="models.gate")
    _model_name: str = field(default="GatedAdditiveTreeEnsembleModel")
    _backbone_name: str = field(default="GatedAdditiveTreesBackbone")
    _config_name: str = field(default="GatedAdditiveTreeEnsembleConfig")

    def __post_init__(self):
        assert self.tree_depth > 0, "tree_depth should be greater than 0"
        # num_trees must be greater than 0
        assert self.num_trees > 0, (
            "`num_trees` must be greater than 0. " "If you want a lighter model which performs better, use GANDALF."
        )
        super().__post_init__()
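
The validation in `__post_init__` above can be illustrated with a small standalone sketch. `SketchTreeConfig`, `help_for`, and `is_valid` are hypothetical names introduced here for illustration, not part of the library. The sketch only mirrors the two assertions and the `field(..., metadata={"help": ...})` pattern shown in the source.

```python
from dataclasses import dataclass, field, fields


@dataclass
class SketchTreeConfig:
    """Hypothetical stand-in that mirrors the validation pattern above."""

    tree_depth: int = field(default=4, metadata={"help": "Depth of the tree."})
    num_trees: int = field(default=10, metadata={"help": "Number of trees in the ensemble."})

    def __post_init__(self):
        # Same two checks as in the source: both values must be positive.
        assert self.tree_depth > 0, "tree_depth should be greater than 0"
        assert self.num_trees > 0, "`num_trees` must be greater than 0."


def help_for(config_cls, name):
    """Look up the help text stored in a field's metadata."""
    return next(f.metadata["help"] for f in fields(config_cls) if f.name == name)


def is_valid(**kwargs):
    """Return True if the config can be built, False if an assertion fails."""
    try:
        SketchTreeConfig(**kwargs)
        return True
    except AssertionError:
        return False
```

Because the checks are plain `assert` statements, they are skipped when Python runs with the `-O` flag, so they act as a development-time guard rather than strict input validation.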

Bases: ModelConfig

MDN configuration.

Parameters:

Name	Type	Description	Default
`backbone_config_class`	`str`	The config class for defining the Backbone. The config class should be a valid module path from `models`. e.g. `FTTransformerConfig`	`None`
`backbone_config_params`	`Dict`	The dict of config parameters for defining the Backbone.	`None`
`task`	`str`	Specify whether the problem is regression or classification. `backbone` is a task which considers the model as a backbone to generate features. Mostly used internally for SSL and related tasks. Choices are: [`regression`,`classification`,`backbone`].	required
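`head`	`str`	The head used by the model. Fixed to `MixtureDensityHead`; the field is declared with `init=False`, so it cannot be passed to the constructor.	`'MixtureDensityHead'`
`head_config`	`Dict`	The config for defining the Mixture Density Network Head	`None`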
`embedding_dims`	`Optional[List]`	The dimensions of the embedding for each categorical column as a list of tuples (cardinality, embedding_dim). If left empty, will infer using the cardinality of the categorical column using the rule min(50, (x + 1) // 2)	`None`
`embedding_dropout`	`float`	Dropout to be applied to the Categorical Embedding. Defaults to 0.0	`0.0`
`batch_norm_continuous_input`	`bool`	If True, we will normalize the continuous layer by passing it through a BatchNorm layer.	`True`
`learning_rate`	`float`	The learning rate of the model. Defaults to 1e-3.	`0.001`
`loss`	`Optional[str]`	The loss function to be applied. By Default, it is MSELoss for regression and CrossEntropyLoss for classification. Unless you are sure what you are doing, leave it at MSELoss or L1Loss for regression and CrossEntropyLoss for classification	`None`
`metrics`	`Optional[List[str]]`	the list of metrics you need to track during training. The metrics should be one of the functional metrics implemented in `torchmetrics`. By default, it is accuracy if classification and mean_squared_error for regression	`None`
`metrics_params`	`Optional[List]`	The parameters to be passed to the metrics function. `task` is forced to be `multiclass` because the multiclass version can handle binary as well and for simplicity we are only using `multiclass`.	`None`
`metrics_prob_input`	`Optional[List]`	Is a mandatory parameter for classification metrics defined in the config. This defines whether the input to the metric function is the probability or the class. Length should be same as the number of metrics. Defaults to None.	`None`
`target_range`	`Optional[List]`	The range in which we should limit the output variable. Currently ignored for multi-target regression. Typically used for Regression problems. If left empty, will not apply any restrictions	`None`
`seed`	`int`	The seed for reproducibility. Defaults to 42	`42`
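
The default rule quoted for `embedding_dims`, `min(50, (x + 1) // 2)`, can be worked through in a few lines. This is a minimal sketch: `infer_embedding_dims` is a hypothetical helper name, and the three cardinalities are made-up values, not taken from any dataset.

```python
def infer_embedding_dims(cardinalities, cap=50):
    """Apply min(cap, (x + 1) // 2) to each categorical cardinality x."""
    return [(x, min(cap, (x + 1) // 2)) for x in cardinalities]


# Three hypothetical categorical columns with 3, 10 and 200 unique values.
dims = infer_embedding_dims([3, 10, 200])
print(dims)  # [(3, 2), (10, 5), (200, 50)]
```

Each tuple has the `(cardinality, embedding_dim)` shape that the parameter description asks for, so a list like this could be adjusted by hand and passed explicitly instead of relying on the inferred values.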

Source code in src/pytorch_tabular/models/mixture_density/config.py

@dataclass
class MDNConfig(ModelConfig):
    """MDN configuration.

    Args:
        backbone_config_class (str): The config class for defining the Backbone. The config class should be
                a valid module path from `models`. e.g. `FTTransformerConfig`

        backbone_config_params (Dict): The dict of config parameters for defining the Backbone.


        task (str): Specify whether the problem is regression or classification. `backbone` is a task which
                considers the model as a backbone to generate features. Mostly used internally for SSL and related
                tasks. Choices are: [`regression`,`classification`,`backbone`].

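        head (str): The head used by the model. Fixed to `MixtureDensityHead`; the field is declared with
                `init=False`, so it cannot be passed to the constructor.

        head_config (Dict): The config for defining the Mixture Density Network Head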

        embedding_dims (Optional[List]): The dimensions of the embedding for each categorical column as a
                list of tuples (cardinality, embedding_dim). If left empty, will infer using the cardinality of
                the categorical column using the rule min(50, (x + 1) // 2)

        embedding_dropout (float): Dropout to be applied to the Categorical Embedding. Defaults to 0.0

        batch_norm_continuous_input (bool): If True, we will normalize the continuous layer by passing it
                through a BatchNorm layer.

        learning_rate (float): The learning rate of the model. Defaults to 1e-3.

        loss (Optional[str]): The loss function to be applied. By Default, it is MSELoss for regression and
                CrossEntropyLoss for classification. Unless you are sure what you are doing, leave it at MSELoss
                or L1Loss for regression and CrossEntropyLoss for classification

        metrics (Optional[List[str]]): the list of metrics you need to track during training. The metrics
                should be one of the functional metrics implemented in ``torchmetrics``. By default, it is
                accuracy if classification and mean_squared_error for regression

        metrics_params (Optional[List]): The parameters to be passed to the metrics function. `task` is forced to
                be `multiclass` because the multiclass version can handle binary as well and for simplicity we are
                only using `multiclass`.

        metrics_prob_input (Optional[List]): Is a mandatory parameter for classification metrics defined in the config.
            This defines whether the input to the metric function is the probability or the class. Length should be
            same as the number of metrics. Defaults to None.

        target_range (Optional[List]): The range in which we should limit the output variable. Currently
                ignored for multi-target regression. Typically used for Regression problems. If left empty, will
                not apply any restrictions

        seed (int): The seed for reproducibility. Defaults to 42

    """

    backbone_config_class: str = field(
        default=None,
        metadata={
            "help": "The config class for defining the Backbone."
            " The config class should be a valid module path from `models`. e.g. `FTTransformerConfig`"
        },
    )
    backbone_config_params: Dict = field(
        default=None,
        metadata={"help": "The dict of config parameters for defining the Backbone."},
    )
    head: str = field(init=False, default="MixtureDensityHead")
    head_config: Dict = field(
        default=None,
        metadata={"help": "The config for defining the Mixture Density Network Head"},
    )
    _module_src: str = field(default="models.mixture_density")
    _model_name: str = field(default="MDNModel")
    _config_name: str = field(default="MDNConfig")
    _probabilistic: bool = field(default=True)

    def __post_init__(self):
        assert (
            self.backbone_config_class not in INCOMPATIBLE_BACKBONES
        ), f"{self.backbone_config_class} is not a supported backbone for MDN head"
        assert self.head == "MixtureDensityHead"
        return super().__post_init__()
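
The `head` field above is declared with `init=False`, which removes it from the generated constructor. The sketch below shows that behavior in isolation. `SketchMDNConfig` is a hypothetical stand-in, not the library class; only the `head` declaration is copied from the source.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SketchMDNConfig:
    """Hypothetical stand-in; only the `head` field mirrors the source."""

    backbone_config_class: Optional[str] = None
    # init=False removes `head` from the generated __init__ signature.
    head: str = field(init=False, default="MixtureDensityHead")


cfg = SketchMDNConfig(backbone_config_class="FTTransformerConfig")

try:
    SketchMDNConfig(head="LinearHead")
    head_rejected = False
except TypeError:
    # The constructor does not accept `head` as a keyword argument.
    head_rejected = True
```

In this sketch the head can still be reassigned after construction, which is presumably why the source also asserts `self.head == "MixtureDensityHead"` in `__post_init__`.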

Bases: ModelConfig

Neural Oblivious Decision Ensembles for Deep Learning on Tabular Data configuration.

Parameters:

Name	Type	Description	Default
`num_layers`	`int`	Number of Oblivious Decision Tree Layers in the Dense Architecture	`1`
`num_trees`	`int`	Number of Oblivious Decision Trees in each layer	`2048`
`additional_tree_output_dim`	`int`	The additional output dimensions which is only used to pass through different layers of the architectures. Only the first output_dim outputs will be used for prediction	`3`
`depth`	`int`	The depth of the individual Oblivious Decision Trees	`6`
`choice_function`	`str`	Generates a sparse probability distribution to be used as feature weights (a.k.a. soft feature selection). Choices are: [`entmax15`,`sparsemax`].	`'entmax15'`
`bin_function`	`str`	Generates a sparse probability distribution to be used as tree leaf weights. Choices are: [`entmoid15`,`sparsemoid`].	`'entmoid15'`
`max_features`	`Optional[int]`	If not None, sets a max limit on the number of features to be carried forward from layer to layer in the Dense Architecture	`None`
`input_dropout`	`float`	Dropout to be applied to the inputs between layers of the Dense Architecture	`0.0`
`initialize_response`	`str`	Initializing the response variable in the Oblivious Decision Trees. By default, it is a standard normal distribution. Choices are: [`normal`,`uniform`].	`'normal'`
`initialize_selection_logits`	`str`	Initializing the feature selector. By default, it is a uniform distribution across the features. Choices are: [`uniform`,`normal`].	`'uniform'`
`threshold_init_beta`	`float`	Used in the data-aware initialization of thresholds, where the threshold is initialized randomly (with a beta distribution) to feature values in the first batch. It initializes the threshold to the q-th quantile of the data points, where q ~ Beta(:threshold_init_beta:, :threshold_init_beta:). If this param is set to 1, initial thresholds will have the same distribution as data points. If greater than 1 (e.g. 10), thresholds will be closer to the median data value. If less than 1 (e.g. 0.1), thresholds will approach min/max data values.	`1.0`
`threshold_init_cutoff`	`float`	Used in the data-aware initialization of scales (used in scaling the ODTs). It is initialized in such a way that all the samples in the first batch belong to the linear region of the entmoid/sparsemoid (bin selectors) and thereby have non-zero gradients. Threshold log-temperatures initializer, in (0, inf). By default (1.0), log-temperatures are initialized in such a way that all bin selectors end up in the linear region of sparse-sigmoid. The temperatures are then scaled by this parameter. Setting this value > 1.0 will result in a margin between data points and the sparse-sigmoid cutoff value: all points will be between (0.5 - 0.5 / threshold_init_cutoff) and (0.5 + 0.5 / threshold_init_cutoff). Setting this value < 1.0 will cause (1 - value) part of data points to end up in the flat sparse-sigmoid region. For instance, threshold_init_cutoff = 0.9 will set 10% of points equal to 0.0 or 1.0.	`1.0`
`task`	`str`	Specify whether the problem is regression or classification. `backbone` is a task which considers the model as a backbone to generate features. Mostly used internally for SSL and related tasks. Choices are: [`regression`,`classification`,`backbone`].	required
`head`	`Optional[str]`	The head to be used for the model. Should be one of the heads defined in `pytorch_tabular.models.common.heads`. Defaults to LinearHead. Choices are: [`None`,`LinearHead`,`MixtureDensityHead`].	`None`
`head_config`	`Optional[Dict]`	The config as a dict which defines the head. If left empty, will be initialized as default linear head.	`lambda: {'layers': ''}()`
`embedding_dims`	`Optional[List]`	The dimensions of the embedding for each categorical column as a list of tuples (cardinality, embedding_dim). If left empty, will infer using the cardinality of the categorical column using the rule min(50, (x + 1) // 2)	`None`
`embedding_dropout`	`float`	Dropout to be applied to the Categorical Embedding. Defaults to 0.0	`0.0`
`batch_norm_continuous_input`	`bool`	If True, we will normalize the continuous layer by passing it through a BatchNorm layer.	`True`
`learning_rate`	`float`	The learning rate of the model. Defaults to 1e-3.	`0.001`
`loss`	`Optional[str]`	The loss function to be applied. By Default, it is MSELoss for regression and CrossEntropyLoss for classification. Unless you are sure what you are doing, leave it at MSELoss or L1Loss for regression and CrossEntropyLoss for classification	`None`
`metrics`	`Optional[List[str]]`	the list of metrics you need to track during training. The metrics should be one of the functional metrics implemented in `torchmetrics`. By default, it is accuracy if classification and mean_squared_error for regression	`None`
`metrics_params`	`Optional[List]`	The parameters to be passed to the metrics function. `task` is forced to be `multiclass` because the multiclass version can handle binary as well and for simplicity we are only using `multiclass`.	`None`
`metrics_prob_input`	`Optional[List]`	Is a mandatory parameter for classification metrics defined in the config. This defines whether the input to the metric function is the probability or the class. Length should be same as the number of metrics. Defaults to None.	`None`
`target_range`	`Optional[List]`	The range in which we should limit the output variable. Currently ignored for multi-target regression. Typically used for Regression problems. If left empty, will not apply any restrictions	`None`
`seed`	`int`	The seed for reproducibility. Defaults to 42	`42`
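
The descriptions of `threshold_init_beta` and `threshold_init_cutoff` above can be made concrete with a small sketch. It is an illustration of the stated rules, not the library's initialization code. The helper names, the nearest-rank quantile lookup, and the 0..100 feature values are all assumptions made for the example.

```python
import random


def sample_threshold(values, beta, rng):
    """Pick the q-th quantile of `values`, with q ~ Beta(beta, beta)."""
    q = rng.betavariate(beta, beta)
    ordered = sorted(values)
    # Nearest-rank lookup; a simplification of a true quantile estimator.
    index = min(int(q * len(ordered)), len(ordered) - 1)
    return ordered[index]


def cutoff_bounds(threshold_init_cutoff):
    """Interval quoted in the description for a given cutoff value."""
    half_width = 0.5 / threshold_init_cutoff
    return (0.5 - half_width, 0.5 + half_width)


def mean_distance_from_median(beta, n_draws=2000, seed=0):
    """Average |threshold - median| over many draws, for a 0..100 feature."""
    rng = random.Random(seed)
    values = list(range(101))  # hypothetical feature values, median = 50
    draws = [sample_threshold(values, beta, rng) for _ in range(n_draws)]
    return sum(abs(d - 50) for d in draws) / n_draws


spread_small_beta = mean_distance_from_median(beta=0.1)
spread_large_beta = mean_distance_from_median(beta=10.0)
print(spread_large_beta < spread_small_beta)
print(cutoff_bounds(2.0))
```

With a large beta the sampled thresholds sit close to the median, and with a small beta they drift toward the extremes, matching the description. For the cutoff, a value of 2.0 gives the interval (0.25, 0.75), and the default of 1.0 gives the full (0.0, 1.0) range.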

Source code in src/pytorch_tabular/models/node/config.py

@dataclass
class NodeConfig(ModelConfig):
    """Neural Oblivious Decision Ensembles for Deep Learning on Tabular Data configuration.

    Args:
        num_layers (int): Number of Oblivious Decision Tree Layers in the Dense Architecture

        num_trees (int): Number of Oblivious Decision Trees in each layer

        additional_tree_output_dim (int): The additional output dimensions which is only used to pass
                through different layers of the architectures. Only the first output_dim outputs will be used for
                prediction

        depth (int): The depth of the individual Oblivious Decision Trees

        choice_function (str): Generates a sparse probability distribution to be used as feature
                weights (a.k.a. soft feature selection). Choices are: [`entmax15`,`sparsemax`].

        bin_function (str): Generates a sparse probability distribution to be used as tree leaf weights.
                Choices are: [`entmoid15`,`sparsemoid`].

        max_features (Optional[int]): If not None, sets a max limit on the number of features to be carried
                forward from layer to layer in the Dense Architecture

        input_dropout (float): Dropout to be applied to the inputs between layers of the Dense Architecture

        initialize_response (str): Initializing the response variable in the Oblivious Decision Trees. By
                default, it is a standard normal distribution. Choices are: [`normal`,`uniform`].

        initialize_selection_logits (str): Initializing the feature selector. By default, it is a uniform
                distribution across the features. Choices are: [`uniform`,`normal`].

        threshold_init_beta (float): Used in the data-aware initialization of thresholds, where the
                threshold is initialized randomly (with a beta distribution) to feature values in the first
                batch. It initializes the threshold to the q-th quantile of the data points, where
                q ~ Beta(:threshold_init_beta:, :threshold_init_beta:).
                If this param is set to 1, initial thresholds will have the same distribution as data points.
                If greater than 1 (e.g. 10), thresholds will be closer to the median data value.
                If less than 1 (e.g. 0.1), thresholds will approach min/max data values.

        threshold_init_cutoff (float): Used in the data-aware initialization of scales (used in scaling
                the ODTs). It is initialized in such a way that all the samples in the first batch belong to
                the linear region of the entmoid/sparsemoid (bin selectors) and thereby have non-zero
                gradients. Threshold log-temperatures initializer, in (0, inf).
                By default (1.0), log-temperatures are initialized in such a way that all bin selectors
                end up in the linear region of sparse-sigmoid. The temperatures are then scaled by this
                parameter.
                Setting this value > 1.0 will result in a margin between data points and the sparse-sigmoid
                cutoff value: all points will be between (0.5 - 0.5 / threshold_init_cutoff) and
                (0.5 + 0.5 / threshold_init_cutoff).
                Setting this value < 1.0 will cause (1 - value) part of data points to end up in the flat
                sparse-sigmoid region. For instance, threshold_init_cutoff = 0.9 will set 10% of points
                equal to 0.0 or 1.0.

        task (str): Specify whether the problem is regression or classification. `backbone` is a task which
                considers the model as a backbone to generate features. Mostly used internally for SSL and related
                tasks. Choices are: [`regression`,`classification`,`backbone`].

        head (Optional[str]): The head to be used for the model. Should be one of the heads defined in
                `pytorch_tabular.models.common.heads`. Defaults to LinearHead. Choices are:
                [`None`,`LinearHead`,`MixtureDensityHead`].

        head_config (Optional[Dict]): The config as a dict which defines the head. If left empty, will be
                initialized as default linear head.

        embedding_dims (Optional[List]): The dimensions of the embedding for each categorical column as a
                list of tuples (cardinality, embedding_dim). If left empty, will infer using the cardinality of
                the categorical column using the rule min(50, (x + 1) // 2)

        embedding_dropout (float): Dropout to be applied to the Categorical Embedding. Defaults to 0.0

        batch_norm_continuous_input (bool): If True, we will normalize the continuous layer by passing it
                through a BatchNorm layer.

        learning_rate (float): The learning rate of the model. Defaults to 1e-3.

        loss (Optional[str]): The loss function to be applied. By Default, it is MSELoss for regression and
                CrossEntropyLoss for classification. Unless you are sure what you are doing, leave it at MSELoss
                or L1Loss for regression and CrossEntropyLoss for classification

        metrics (Optional[List[str]]): the list of metrics you need to track during training. The metrics
                should be one of the functional metrics implemented in ``torchmetrics``. By default, it is
                accuracy if classification and mean_squared_error for regression

        metrics_params (Optional[List]): The parameters to be passed to the metrics function. `task` is forced to
                be `multiclass` because the multiclass version can handle binary as well and for simplicity we are
                only using `multiclass`.

        metrics_prob_input (Optional[List]): Is a mandatory parameter for classification metrics defined in the config.
            This defines whether the input to the metric function is the probability or the class. Length should be
            same as the number of metrics. Defaults to None.

        target_range (Optional[List]): The range in which we should limit the output variable. Currently
                ignored for multi-target regression. Typically used for Regression problems. If left empty, will
                not apply any restrictions

        seed (int): The seed for reproducibility. Defaults to 42

    """

    num_layers: int = field(
        default=1,
        metadata={"help": "Number of Oblivious Decision Tree Layers in the Dense Architecture"},
    )
    num_trees: int = field(
        default=2048,
        metadata={"help": "Number of Oblivious Decision Trees in each layer"},
    )
    additional_tree_output_dim: int = field(
        default=3,
        metadata={
            "help": "The additional output dimensions which is only used to pass through different layers"
            " of the architectures. Only the first output_dim outputs will be used for prediction"
        },
    )
    depth: int = field(
        default=6,
        metadata={"help": "The depth of the individual Oblivious Decision Trees"},
    )
    choice_function: str = field(
        default="entmax15",
        metadata={
            "help": "Generates a sparse probability distribution to be used"
            " as feature weights(aka, soft feature selection)",
            "choices": ["entmax15", "sparsemax"],
        },
    )
    bin_function: str = field(
        default="entmoid15",
        metadata={
            "help": "Generates a sparse probability distribution to be used as tree leaf weights",
            "choices": ["entmoid15", "sparsemoid"],
        },
    )
    max_features: Optional[int] = field(
        default=None,
        metadata={
            "help": "If not None, sets a max limit on the number of features to be carried forward"
            " from layer to layer in the Dense Architecture"
        },
    )
    input_dropout: float = field(
        default=0.0,
        metadata={"help": "Dropout to be applied to the inputs between layers of the Dense Architecture"},
    )
    initialize_response: str = field(
        default="normal",
        metadata={
            "help": "Initializing the response variable in the Oblivious Decision Trees."
            " By default, it is a standard normal distribution",
            "choices": ["normal", "uniform"],
        },
    )
    initialize_selection_logits: str = field(
        default="uniform",
        metadata={
            "help": "Initializing the feature selector. By default is a uniform distribution across the features",
            "choices": ["uniform", "normal"],
        },
    )
    threshold_init_beta: float = field(
        default=1.0,
        metadata={
            "help": """
                Used in the Data-aware initialization of thresholds where the threshold is initialized randomly
                (with a beta distribution) to feature values in the first batch.
                It initializes threshold to a q-th quantile of data points.
                where q ~ Beta(:threshold_init_beta:, :threshold_init_beta:)
                If this param is set to 1, initial thresholds will have the same distribution as data points
                If greater than 1 (e.g. 10), thresholds will be closer to median data value
                If less than 1 (e.g. 0.1), thresholds will approach min/max data values.
            """
        },
    )
    threshold_init_cutoff: float = field(
        default=1.0,
        metadata={
            "help": """
                Used in the Data-aware initialization of scales(used in the scaling ODTs).
                It is initialized in such a way that all the samples in the first batch belong to the linear
                region of the entmoid/sparsemoid(bin-selectors) and thereby have non-zero gradients
                Threshold log-temperatures initializer, in (0, inf)
                By default(1.0), log-temperatures are initialized in such a way that all bin selectors
                end up in the linear region of sparse-sigmoid. The temperatures are then scaled by this parameter.
                Setting this value > 1.0 will result in a margin between data points and sparse-sigmoid cutoff value.
                All points will be between (0.5 - 0.5 / threshold_init_cutoff) and (0.5 + 0.5 / threshold_init_cutoff).
                Setting this value < 1.0 will cause (1 - value) part of data points to end up in flat sparse-sigmoid
                region. For instance, threshold_init_cutoff = 0.9 will set 10% points equal to 0.0 or 1.0.
            """
        },
    )

    head: Optional[str] = field(
        default=None,
    )

    _module_src: str = field(default="models.node")
    _model_name: str = field(default="NODEModel")
    _backbone_name: str = field(default="NODEBackbone")
    _config_name: str = field(default="NodeConfig")

    def __post_init__(self):
        if self.head is not None:
            warnings.warn(
                "`head` and `head_config` are ignored as NODE has a specific"
                " head which subsets the tree outputs. Set `head=None`"
                " to turn off the warning"
            )
        else:
            # Setting Head to LinearHead for compatibility
            self.head = "LinearHead"
        return super().__post_init__()
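
The `head` handling in `__post_init__` above has two branches: a user-supplied head triggers a warning, while the default `None` is silently replaced by `LinearHead`. The sketch below reproduces just that logic. `SketchNodeConfig` and `count_warnings` are hypothetical names, and the warning text is shortened.

```python
import warnings
from dataclasses import dataclass
from typing import Optional


@dataclass
class SketchNodeConfig:
    """Hypothetical stand-in that mirrors only the `head` handling above."""

    head: Optional[str] = None

    def __post_init__(self):
        if self.head is not None:
            # A user-supplied head is ignored, so tell the user about it.
            warnings.warn("`head` and `head_config` are ignored by this model")
        else:
            # Fall back to LinearHead for compatibility, as in the source.
            self.head = "LinearHead"


def count_warnings(**kwargs):
    """Build the config and return (config, number of warnings raised)."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        cfg = SketchNodeConfig(**kwargs)
    return cfg, len(caught)


default_cfg, default_warnings = count_warnings()
custom_cfg, custom_warnings = count_warnings(head="MixtureDensityHead")
```

Leaving `head` unset is therefore the quiet path. Passing any explicit value, including `"LinearHead"`, takes the warning branch in this sketch, and the supplied value is kept on the config as-is.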

Bases: ModelConfig

TabNet: Attentive Interpretable Tabular Learning configuration

Parameters:

Name	Type	Description	Default
`n_d`	`int`	Dimension of the prediction layer (usually between 4 and 64)	`8`
`n_a`	`int`	Dimension of the attention layer (usually between 4 and 64)	`8`
`n_steps`	`int`	Number of successive steps in the network (usually between 3 and 10)	`3`
`gamma`	`float`	Float above 1, scaling factor for attention updates (usually between 1.0 and 2.0)	`1.3`
`n_independent`	`int`	Number of independent GLU layers in each GLU block (default 2)	`2`
`n_shared`	`int`	Number of shared GLU layers in each GLU block (default 2)	`2`
`virtual_batch_size`	`int`	Batch size for Ghost Batch Normalization	`128`
`mask_type`	`str`	Either 'sparsemax' or 'entmax' : this is the masking function to use. Choices are: [`sparsemax`,`entmax`].	`'sparsemax'`
`task`	`str`	Specify whether the problem is regression or classification. `backbone` is a task which considers the model as a backbone to generate features. Mostly used internally for SSL and related tasks. Choices are: [`regression`,`classification`,`backbone`].	required
`head`	`Optional[str]`	The head to be used for the model. Should be one of the heads defined in `pytorch_tabular.models.common.heads`. Defaults to LinearHead. Choices are: [`None`,`LinearHead`,`MixtureDensityHead`].	`'LinearHead'`
`head_config`	`Optional[Dict]`	The config as a dict which defines the head. If left empty, will be initialized as default linear head.	`lambda: {'layers': ''}()`
`embedding_dims`	`Optional[List]`	The dimensions of the embedding for each categorical column as a list of tuples (cardinality, embedding_dim). If left empty, will infer using the cardinality of the categorical column using the rule min(50, (x + 1) // 2)	`None`
`embedding_dropout`	`float`	Dropout to be applied to the Categorical Embedding. Defaults to 0.0	`0.0`
`batch_norm_continuous_input`	`bool`	If True, we will normalize the continuous layer by passing it through a BatchNorm layer.	`True`
`learning_rate`	`float`	The learning rate of the model. Defaults to 1e-3.	`0.001`
`loss`	`Optional[str]`	The loss function to be applied. By Default, it is MSELoss for regression and CrossEntropyLoss for classification. Unless you are sure what you are doing, leave it at MSELoss or L1Loss for regression and CrossEntropyLoss for classification	`None`
`metrics`	`Optional[List[str]]`	the list of metrics you need to track during training. The metrics should be one of the functional metrics implemented in `torchmetrics`. By default, it is accuracy if classification and mean_squared_error for regression	`None`
`metrics_params`	`Optional[List]`	The parameters to be passed to the metrics function. `task` is forced to be `multiclass` because the multiclass version can handle binary as well and for simplicity we are only using `multiclass`.	`None`
`metrics_prob_input`	`Optional[List]`	Is a mandatory parameter for classification metrics defined in the config. This defines whether the input to the metric function is the probability or the class. Length should be same as the number of metrics. Defaults to None.	`None`
`target_range`	`Optional[List]`	The range in which we should limit the output variable. Currently ignored for multi-target regression. Typically used for Regression problems. If left empty, will not apply any restrictions	`None`
`seed`	`int`	The seed for reproducibility. Defaults to 42	`42`
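
The inference rule quoted for `embedding_dims` in the table above can be illustrated with a short, self-contained sketch. The helper name `infer_embedding_dims` and the example cardinalities are hypothetical; only the formula `min(50, (x + 1) // 2)` comes from the parameter description.

```python
from typing import List, Tuple


def infer_embedding_dims(cardinalities: List[int]) -> List[Tuple[int, int]]:
    """Return (cardinality, embedding_dim) pairs using min(50, (x + 1) // 2).

    Hypothetical helper for illustration; not part of the library API.
    """
    return [(x, min(50, (x + 1) // 2)) for x in cardinalities]


# Made-up cardinalities: a 3-level, a 10-level and a 1000-level column.
dims = infer_embedding_dims([3, 10, 1000])
print(dims)
```

Low-cardinality columns get roughly half their cardinality as embedding width, while the cap of 50 keeps very high-cardinality columns from producing oversized embeddings.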

Source code in src/pytorch_tabular/models/tabnet/config.py

@dataclass
class TabNetModelConfig(ModelConfig):
    """TabNet: Attentive Interpretable Tabular Learning configuration

    Args:
        n_d (int): Dimension of the prediction  layer (usually between 4 and 64)

        n_a (int): Dimension of the attention  layer (usually between 4 and 64)

        n_steps (int): Number of successive steps in the network (usually between 3 and 10)

        gamma (float): Float above 1, scaling factor for attention updates (usually between 1.0 and 2.0)

        n_independent (int): Number of independent GLU layers in each GLU block (default 2)

        n_shared (int): Number of shared GLU layers in each GLU block (default 2)

        virtual_batch_size (int): Batch size for Ghost Batch Normalization

        mask_type (str): Either 'sparsemax' or 'entmax' : this is the masking function to use. Choices are:
                [`sparsemax`,`entmax`].

        task (str): Specify whether the problem is regression or classification. `backbone` is a task which
                considers the model as a backbone to generate features. Mostly used internally for SSL and related
                tasks. Choices are: [`regression`,`classification`,`backbone`].

        head (Optional[str]): The head to be used for the model. Should be one of the heads defined in
                `pytorch_tabular.models.common.heads`. Defaults to  LinearHead. Choices are:
                [`None`,`LinearHead`,`MixtureDensityHead`].

        head_config (Optional[Dict]): The config as a dict which defines the head. If left empty, will be
                initialized as default linear head.

        embedding_dims (Optional[List]): The dimensions of the embedding for each categorical column as a
                list of tuples (cardinality, embedding_dim). If left empty, will infer using the cardinality of
                the categorical column using the rule min(50, (x + 1) // 2)

        embedding_dropout (float): Dropout to be applied to the Categorical Embedding. Defaults to 0.0

        batch_norm_continuous_input (bool): If True, we will normalize the continuous layer by passing it
                through a BatchNorm layer.

        learning_rate (float): The learning rate of the model. Defaults to 1e-3.

        loss (Optional[str]): The loss function to be applied. By Default, it is MSELoss for regression and
                CrossEntropyLoss for classification. Unless you are sure what you are doing, leave it at MSELoss
                or L1Loss for regression and CrossEntropyLoss for classification

        metrics (Optional[List[str]]): the list of metrics you need to track during training. The metrics
                should be one of the functional metrics implemented in ``torchmetrics``. By default, it is
                accuracy if classification and mean_squared_error for regression

        metrics_params (Optional[List]): The parameters to be passed to the metrics function. `task` is forced to
                be `multiclass` because the multiclass version can handle binary as well and for simplicity we are
                only using `multiclass`.

        metrics_prob_input (Optional[List]): Is a mandatory parameter for classification metrics defined in the config.
            This defines whether the input to the metric function is the probability or the class. Length should be
            same as the number of metrics. Defaults to None.

        target_range (Optional[List]): The range in which we should limit the output variable. Currently
                ignored for multi-target regression. Typically used for Regression problems. If left empty, will
                not apply any restrictions

        seed (int): The seed for reproducibility. Defaults to 42
    """

    n_d: int = field(
        default=8,
        metadata={"help": "Dimension of the prediction  layer (usually between 4 and 64)"},
    )
    n_a: int = field(
        default=8,
        metadata={"help": "Dimension of the attention  layer (usually between 4 and 64)"},
    )
    n_steps: int = field(
        default=3,
        metadata={"help": ("Number of successive steps in the network (usually between 3 and 10)")},
    )
    gamma: float = field(
        default=1.3,
        metadata={"help": "Float above 1, scaling factor for attention updates (usually between 1.0 and 2.0)"},
    )
    n_independent: int = field(
        default=2,
        metadata={"help": "Number of independent GLU layers in each GLU block (default 2)"},
    )
    n_shared: int = field(
        default=2,
        metadata={"help": "Number of shared GLU layers in each GLU block (default 2)"},
    )
    virtual_batch_size: int = field(
        default=128,
        metadata={"help": "Batch size for Ghost Batch Normalization"},
    )
    mask_type: str = field(
        default="sparsemax",
        metadata={
            "help": ("Either 'sparsemax' or 'entmax' : this is the masking function to use"),
            "choices": ["sparsemax", "entmax"],
        },
    )
    grouped_features: Optional[List[List[str]]] = field(
        default=None,
        metadata={
            "help": (
                "List of lists of feature names to be grouped together. This allows the"
                " model to share its attention across features inside the same group."
                " This can be especially useful when your preprocessing generates"
                " correlated or dependent features, for example if you use TF-IDF or PCA"
                " on a text column. Note that feature importance will be exactly the"
                " same for all features in the same group. Please also note that"
                " embeddings generated for a categorical variable are always inside the"
                " same group."
            )
        },
    )
    _module_src: str = field(default="models.tabnet")
    _model_name: str = field(default="TabNetModel")
    _config_name: str = field(default="TabNetModelConfig")
    _backbone_name: str = field(default="TabNetBackbone")
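
The `virtual_batch_size` field above sets the chunk size for Ghost Batch Normalization, in which each chunk (a "virtual batch") of a training batch is normalized with its own mean and variance. The following is a minimal one-dimensional sketch of that idea, not the library's implementation: the function name `ghost_batch_norm` is hypothetical, the chunks are formed by plain slicing (an assumption), and learnable scale/shift and running statistics are left out.

```python
import math
from typing import List


def ghost_batch_norm(values: List[float], virtual_batch_size: int, eps: float = 1e-5) -> List[float]:
    """Normalize each virtual batch with its own mean and variance.

    Simplified 1-D sketch: no learnable parameters, no running statistics.
    """
    out: List[float] = []
    for start in range(0, len(values), virtual_batch_size):
        chunk = values[start : start + virtual_batch_size]
        mean = sum(chunk) / len(chunk)
        var = sum((v - mean) ** 2 for v in chunk) / len(chunk)
        out.extend((v - mean) / math.sqrt(var + eps) for v in chunk)
    return out


# A batch of 8 values split into two virtual batches of 4.
batch = [1.0, 2.0, 3.0, 4.0, 10.0, 20.0, 30.0, 40.0]
normalized = ghost_batch_norm(batch, virtual_batch_size=4)
print(normalized)
```

Because the statistics are computed per chunk, the two halves of the example batch end up on nearly the same scale even though their raw values differ by a factor of ten.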

Bases: ModelConfig

Tab Transformer configuration.

Parameters:

Name	Type	Description	Default
`input_embed_dim`	`int`	The embedding dimension for the input categorical features. Defaults to 32	`32`
`embedding_initialization`	`Optional[str]`	Initialization scheme for the embedding layers. Defaults to `kaiming_uniform`. Choices are: [`kaiming_uniform`,`kaiming_normal`].	`'kaiming_uniform'`
`embedding_bias`	`bool`	Flag to turn on Embedding Bias. Defaults to False	`False`
`share_embedding`	`bool`	The flag turns on shared embeddings in the input embedding process. The key idea here is to have an embedding for the feature as a whole along with embeddings of each unique values of that column. For more details refer to Appendix A of the TabTransformer paper. Defaults to False	`False`
`share_embedding_strategy`	`Optional[str]`	There are two strategies in adding shared embeddings. 1. `add` - A separate embedding for the feature is added to the embedding of the unique values of the feature. 2. `fraction` - A fraction of the input embedding is reserved for the shared embedding of the feature. Defaults to fraction. Choices are: [`add`,`fraction`].	`'fraction'`
`shared_embedding_fraction`	`float`	Fraction of the input_embed_dim to be reserved by the shared embedding. Should be less than one. Defaults to 0.25	`0.25`
`num_heads`	`int`	The number of heads in the Multi-Headed Attention layer. Defaults to 8	`8`
`num_attn_blocks`	`int`	The number of layers of stacked Multi-Headed Attention layers. Defaults to 6	`6`
`transformer_head_dim`	`Optional[int]`	The number of hidden units in the Multi-Headed Attention layers. Defaults to None and will be same as input_dim.	`None`
`attn_dropout`	`float`	Dropout to be applied after Multi headed Attention. Defaults to 0.1	`0.1`
`add_norm_dropout`	`float`	Dropout to be applied in the AddNorm Layer. Defaults to 0.1	`0.1`
`ff_dropout`	`float`	Dropout to be applied in the Positionwise FeedForward Network. Defaults to 0.1	`0.1`
`ff_hidden_multiplier`	`int`	Multiple by which the Positionwise FF layer scales the input. Defaults to 4	`4`
`transformer_activation`	`str`	The activation type in the transformer feed forward layers. In addition to the default PyTorch activations such as ReLU, TanH and LeakyReLU (https://pytorch.org/docs/stable/nn.html#non-linear-activations-weighted-sum-nonlinearity), GEGLU, ReGLU and SwiGLU are also implemented (https://arxiv.org/pdf/2002.05202.pdf). Defaults to GEGLU	`'GEGLU'`
`task`	`str`	Specify whether the problem is regression or classification. `backbone` is a task which considers the model as a backbone to generate features. Mostly used internally for SSL and related tasks. Choices are: [`regression`,`classification`,`backbone`].	required
`head`	`Optional[str]`	The head to be used for the model. Should be one of the heads defined in `pytorch_tabular.models.common.heads`. Defaults to LinearHead. Choices are: [`None`,`LinearHead`,`MixtureDensityHead`].	`'LinearHead'`
`head_config`	`Optional[Dict]`	The config as a dict which defines the head. If left empty, will be initialized as default linear head.	`lambda: {'layers': ''}()`
`embedding_dims`	`Optional[List]`	The dimensions of the embedding for each categorical column as a list of tuples (cardinality, embedding_dim). If left empty, will infer using the cardinality of the categorical column using the rule min(50, (x + 1) // 2)	`None`
`embedding_dropout`	`float`	Dropout to be applied to the Categorical Embedding. Defaults to 0.0	`0.0`
`batch_norm_continuous_input`	`bool`	If True, we will normalize the continuous layer by passing it through a BatchNorm layer.	`True`
`learning_rate`	`float`	The learning rate of the model. Defaults to 1e-3.	`0.001`
`loss`	`Optional[str]`	The loss function to be applied. By Default, it is MSELoss for regression and CrossEntropyLoss for classification. Unless you are sure what you are doing, leave it at MSELoss or L1Loss for regression and CrossEntropyLoss for classification	`None`
`metrics`	`Optional[List[str]]`	the list of metrics you need to track during training. The metrics should be one of the functional metrics implemented in `torchmetrics`. By default, it is accuracy if classification and mean_squared_error for regression	`None`
`metrics_params`	`Optional[List]`	The parameters to be passed to the metrics function. `task` is forced to be `multiclass` because the multiclass version can handle binary as well and for simplicity we are only using `multiclass`.	`None`
`metrics_prob_input`	`Optional[List]`	Is a mandatory parameter for classification metrics defined in the config. This defines whether the input to the metric function is the probability or the class. Length should be same as the number of metrics. Defaults to None.	`None`
`target_range`	`Optional[List]`	The range in which we should limit the output variable. Currently ignored for multi-target regression. Typically used for Regression problems. If left empty, will not apply any restrictions	`None`
`seed`	`int`	The seed for reproducibility. Defaults to 42	`42`
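
The two `share_embedding_strategy` options described in the table can be sketched on plain Python lists. This is an illustration under stated assumptions rather than the library's code: the helper `combine_shared_embedding` is hypothetical, and placing the reserved positions at the start of the vector is an assumption.

```python
from typing import List


def combine_shared_embedding(
    value_embedding: List[float],
    shared_embedding: List[float],
    strategy: str = "fraction",
    shared_embedding_fraction: float = 0.25,
) -> List[float]:
    """Combine a per-value embedding with a per-feature (shared) embedding.

    Hypothetical helper sketching the `add` and `fraction` strategies.
    """
    dim = len(value_embedding)
    if strategy == "add":
        # The shared embedding has the full dimension and is added element-wise.
        if len(shared_embedding) != dim:
            raise ValueError("`add` needs a shared embedding of the same dimension")
        return [v + s for v, s in zip(value_embedding, shared_embedding)]
    if strategy == "fraction":
        # Reserve the first `reserved` positions for the shared embedding (assumed layout).
        reserved = int(dim * shared_embedding_fraction)
        if len(shared_embedding) != reserved:
            raise ValueError(f"`fraction` needs a shared embedding of length {reserved}")
        return shared_embedding + value_embedding[reserved:]
    raise ValueError(f"Unknown strategy: {strategy}")


value = [1.0] * 8
mixed = combine_shared_embedding(value, [9.0, 9.0])  # fraction: 2 of 8 positions shared
added = combine_shared_embedding(value, [0.5] * 8, strategy="add")
print(mixed)
print(added)
```

With the defaults listed above (`input_embed_dim=32`, `shared_embedding_fraction=0.25`), the same arithmetic reserves `int(32 * 0.25) = 8` positions for the shared part.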

Source code in src/pytorch_tabular/models/tab_transformer/config.py

@dataclass
class TabTransformerConfig(ModelConfig):
    """Tab Transformer configuration.

    Args:
        input_embed_dim (int): The embedding dimension for the input categorical features. Defaults to 32

        embedding_initialization (Optional[str]): Initialization scheme for the embedding layers. Defaults
                to `kaiming_uniform`. Choices are: [`kaiming_uniform`,`kaiming_normal`].

        embedding_bias (bool): Flag to turn on Embedding Bias. Defaults to False

        share_embedding (bool): The flag turns on shared embeddings in the input embedding process. The key
                idea here is to have an embedding for the feature as a whole along with embeddings of each unique
                values of that column. For more details refer to Appendix A of the TabTransformer paper. Defaults
                to False

        share_embedding_strategy (Optional[str]): There are two strategies in adding shared embeddings. 1.
                `add` - A separate embedding for the feature is added to the embedding of the unique values of the
                feature. 2. `fraction` - A fraction of the input embedding is reserved for the shared embedding of
                the feature. Defaults to fraction. Choices are: [`add`,`fraction`].

        shared_embedding_fraction (float): Fraction of the input_embed_dim to be reserved by the shared
                embedding. Should be less than one. Defaults to 0.25

        num_heads (int): The number of heads in the Multi-Headed Attention layer. Defaults to 8

        num_attn_blocks (int): The number of layers of stacked Multi-Headed Attention layers. Defaults to 6

        transformer_head_dim (Optional[int]): The number of hidden units in the Multi-Headed Attention
                layers. Defaults to None and will be same as input_dim.

        attn_dropout (float): Dropout to be applied after Multi headed Attention. Defaults to 0.1

        add_norm_dropout (float): Dropout to be applied in the AddNorm Layer. Defaults to 0.1

        ff_dropout (float): Dropout to be applied in the Positionwise FeedForward Network. Defaults to 0.1

        ff_hidden_multiplier (int): Multiple by which the Positionwise FF layer scales the input. Defaults
                to 4

        transformer_activation (str): The activation type in the transformer feed forward layers. In
                addition to the default activation in PyTorch like ReLU, TanH, LeakyReLU, etc.
                https://pytorch.org/docs/stable/nn.html#non-linear-activations-weighted-sum-nonlinearity, GEGLU,
                ReGLU and SwiGLU are also implemented(https://arxiv.org/pdf/2002.05202.pdf). Defaults to GEGLU

        task (str): Specify whether the problem is regression or classification. `backbone` is a task which
                considers the model as a backbone to generate features. Mostly used internally for SSL and related
                tasks. Choices are: [`regression`,`classification`,`backbone`].

        head (Optional[str]): The head to be used for the model. Should be one of the heads defined in
                `pytorch_tabular.models.common.heads`. Defaults to  LinearHead. Choices are:
                [`None`,`LinearHead`,`MixtureDensityHead`].

        head_config (Optional[Dict]): The config as a dict which defines the head. If left empty, will be
                initialized as default linear head.

        embedding_dims (Optional[List]): The dimensions of the embedding for each categorical column as a
                list of tuples (cardinality, embedding_dim). If left empty, will infer using the cardinality of
                the categorical column using the rule min(50, (x + 1) // 2)

        embedding_dropout (float): Dropout to be applied to the Categorical Embedding. Defaults to 0.0

        batch_norm_continuous_input (bool): If True, we will normalize the continuous layer by passing it
                through a BatchNorm layer.

        learning_rate (float): The learning rate of the model. Defaults to 1e-3.

        loss (Optional[str]): The loss function to be applied. By Default, it is MSELoss for regression and
                CrossEntropyLoss for classification. Unless you are sure what you are doing, leave it at MSELoss
                or L1Loss for regression and CrossEntropyLoss for classification

        metrics (Optional[List[str]]): the list of metrics you need to track during training. The metrics
                should be one of the functional metrics implemented in ``torchmetrics``. By default, it is
                accuracy if classification and mean_squared_error for regression

        metrics_params (Optional[List]): The parameters to be passed to the metrics function. `task` is forced to
                be `multiclass` because the multiclass version can handle binary as well and for simplicity we are
                only using `multiclass`.

        metrics_prob_input (Optional[List]): Is a mandatory parameter for classification metrics defined in the config.
            This defines whether the input to the metric function is the probability or the class. Length should be
            same as the number of metrics. Defaults to None.

        target_range (Optional[List]): The range in which we should limit the output variable. Currently
                ignored for multi-target regression. Typically used for Regression problems. If left empty, will
                not apply any restrictions

        seed (int): The seed for reproducibility. Defaults to 42

    """

    input_embed_dim: int = field(
        default=32,
        metadata={"help": "The embedding dimension for the input categorical features. Defaults to 32"},
    )
    embedding_initialization: Optional[str] = field(
        default="kaiming_uniform",
        metadata={
            "help": "Initialization scheme for the embedding layers. Defaults to `kaiming_uniform`",
            "choices": ["kaiming_uniform", "kaiming_normal"],
        },
    )
    embedding_bias: bool = field(
        default=False,
        metadata={"help": "Flag to turn on Embedding Bias. Defaults to False"},
    )
    share_embedding: bool = field(
        default=False,
        metadata={
            "help": "The flag turns on shared embeddings in the input embedding process."
            " The key idea here is to have an embedding for the feature as a whole along with embeddings"
            " of each unique values of that column. For more details refer"
            " to Appendix A of the TabTransformer paper. Defaults to False"
        },
    )
    share_embedding_strategy: Optional[str] = field(
        default="fraction",
        metadata={
            "help": "There are two strategies in adding shared embeddings."
            " 1. `add` - A separate embedding for the feature is added to the embedding"
            " of the unique values of the feature."
            " 2. `fraction` - A fraction of the input embedding is reserved"
            " for the shared embedding of the feature. Defaults to fraction.",
            "choices": ["add", "fraction"],
        },
    )
    shared_embedding_fraction: float = field(
        default=0.25,
        metadata={
            "help": "Fraction of the input_embed_dim to be reserved by the shared embedding."
            " Should be less than one. Defaults to 0.25"
        },
    )
    num_heads: int = field(
        default=8,
        metadata={"help": "The number of heads in the Multi-Headed Attention layer. Defaults to 8"},
    )
    num_attn_blocks: int = field(
        default=6,
        metadata={"help": "The number of layers of stacked Multi-Headed Attention layers. Defaults to 6"},
    )
    transformer_head_dim: Optional[int] = field(
        default=None,
        metadata={
            "help": "The number of hidden units in the Multi-Headed Attention layers."
            " Defaults to None and will be same as input_dim."
        },
    )
    attn_dropout: float = field(
        default=0.1,
        metadata={"help": "Dropout to be applied after Multi headed Attention. Defaults to 0.1"},
    )
    add_norm_dropout: float = field(
        default=0.1,
        metadata={"help": "Dropout to be applied in the AddNorm Layer. Defaults to 0.1"},
    )
    ff_dropout: float = field(
        default=0.1,
        metadata={"help": "Dropout to be applied in the Positionwise FeedForward Network. Defaults to 0.1"},
    )
    ff_hidden_multiplier: int = field(
        default=4,
        metadata={"help": "Multiple by which the Positionwise FF layer scales the input. Defaults to 4"},
    )
    transformer_activation: str = field(
        default="GEGLU",
        metadata={
            "help": "The activation type in the transformer feed forward layers."
            " In addition to the default activation in PyTorch like ReLU, TanH, LeakyReLU, etc."
            " https://pytorch.org/docs/stable/nn.html#non-linear-activations-weighted-sum-nonlinearity,"
            " GEGLU, ReGLU and SwiGLU are also implemented(https://arxiv.org/pdf/2002.05202.pdf)."
            " Defaults to GEGLU",
        },
    )
    _module_src: str = field(default="models.tab_transformer")
    _model_name: str = field(default="TabTransformerModel")
    _backbone_name: str = field(default="TabTransformerBackbone")
    _config_name: str = field(default="TabTransformerConfig")
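
The default `transformer_activation`, GEGLU, is a gated variant in which one half of a projection is multiplied by the GELU of the other half, following the form GELU(xW) ⊙ xV from the paper linked in the help text. Below is a minimal sketch on plain lists. It assumes the linear projection has already been applied and that the first half holds the values and the second half the gates; both the split order and the function names are assumptions, not the library's implementation.

```python
import math
from typing import List


def gelu(x: float) -> float:
    """Exact GELU: x * Phi(x), with Phi the standard normal CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def geglu(projected: List[float]) -> List[float]:
    """Split an even-length vector into (values, gates) and return values * GELU(gates)."""
    if len(projected) % 2 != 0:
        raise ValueError("GEGLU expects an even number of inputs")
    half = len(projected) // 2
    values, gates = projected[:half], projected[half:]
    return [v * gelu(g) for v, g in zip(values, gates)]


# values = [1.0, 2.0], gates = [0.0, 10.0]
out = geglu([1.0, 2.0, 0.0, 10.0])
print(out)
```

A gate of 0 suppresses its value entirely, while a large positive gate passes the value through scaled by roughly the gate itself, which is why the output width is half the projected width.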

Bases: ModelConfig

StackingModelConfig is a configuration class for the StackingModel. It is used to stack multiple models together. Currently, CategoryEmbeddingModel, TabNetModel, FTTransformerModel, GatedAdditiveTreeEnsembleModel, DANetModel, AutoIntModel, GANDALFModel, and NodeModel are supported.

Parameters:

Name	Type	Description	Default
`model_configs`	`list[ModelConfig]`	List of model configs to stack.	`list()`

Source code in src/pytorch_tabular/models/stacking/config.py

@dataclass
class StackingModelConfig(ModelConfig):
    """StackingModelConfig is a configuration class for the StackingModel. It is used to stack multiple models
    together. Currently, CategoryEmbeddingModel, TabNetModel, FTTransformerModel, GatedAdditiveTreeEnsembleModel,
    DANetModel, AutoIntModel, GANDALFModel, and NodeModel are supported.

    Args:
        model_configs (list[ModelConfig]): List of model configs to stack.

    """

    model_configs: list = field(default_factory=list, metadata={"help": "List of model configs to stack"})
    _module_src: str = field(default="models.stacking")
    _model_name: str = field(default="StackingModel")
    _backbone_name: str = field(default="StackingBackbone")
    _config_name: str = field(default="StackingConfig")

Base Model configuration.

Parameters:

Name	Type	Description	Default
`task`	`str`	Specify whether the problem is regression or classification. `backbone` is a task which considers the model as a backbone to generate features. Mostly used internally for SSL and related tasks. Choices are: [`regression`,`classification`,`backbone`].	required
`head`	`Optional[str]`	The head to be used for the model. Should be one of the heads defined in `pytorch_tabular.models.common.heads`. Defaults to LinearHead. Choices are: [`None`,`LinearHead`,`MixtureDensityHead`].	`'LinearHead'`
`head_config`	`Optional[Dict]`	The config as a dict which defines the head. If left empty, will be initialized as default linear head.	`lambda: {'layers': ''}()`
`embedding_dims`	`Optional[List]`	The dimensions of the embedding for each categorical column as a list of tuples (cardinality, embedding_dim). If left empty, will infer using the cardinality of the categorical column using the rule min(50, (x + 1) // 2)	`None`
`embedding_dropout`	`float`	Dropout to be applied to the Categorical Embedding. Defaults to 0.0	`0.0`
`batch_norm_continuous_input`	`bool`	If True, we will normalize the continuous layer by passing it through a BatchNorm layer.	`True`
`virtual_batch_size`	`Optional[int]`	If not None, all BatchNorms will be converted to GhostBatchNorm's with the specified virtual batch size. Defaults to None	`None`
`learning_rate`	`float`	The learning rate of the model. Defaults to 1e-3.	`0.001`
`loss`	`Optional[str]`	The loss function to be applied. By Default, it is MSELoss for regression and CrossEntropyLoss for classification. Unless you are sure what you are doing, leave it at MSELoss or L1Loss for regression and CrossEntropyLoss for classification	`None`
`metrics`	`Optional[List[str]]`	the list of metrics you need to track during training. The metrics should be one of the functional metrics implemented in `torchmetrics`. By default, it is accuracy if classification and mean_squared_error for regression	`None`
`metrics_prob_input`	`Optional[List[bool]]`	Is a mandatory parameter for classification metrics defined in the config. This defines whether the input to the metric function is the probability or the class. Length should be the same as the number of metrics. Defaults to None.	`None`
`metrics_params`	`Optional[List]`	The parameters to be passed to the metrics function. `task` is forced to be `multiclass` because the multiclass version can handle binary as well and for simplicity we are only using `multiclass`.	`None`
`target_range`	`Optional[List]`	The range in which we should limit the output variable. Currently ignored for multi-target regression. Typically used for Regression problems. If left empty, will not apply any restrictions	`None`
`seed`	`int`	The seed for reproducibility. Defaults to 42	`42`
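
The way the `None` defaults for `loss`, `metrics`, `metrics_params` and `metrics_prob_input` are filled in per task can be sketched with a small stand-in dataclass. `SketchConfig` is a hypothetical class written for illustration; it mirrors the regression and classification branches of `ModelConfig.__post_init__` from the source listing and omits every other field and the `backbone` task.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class SketchConfig:
    """Hypothetical stand-in for illustration; not the library class."""

    task: str
    loss: Optional[str] = None
    metrics: Optional[List[str]] = None
    metrics_params: Optional[List[Dict]] = None
    metrics_prob_input: Optional[List[bool]] = None

    def __post_init__(self) -> None:
        if self.task == "regression":
            self.loss = self.loss or "MSELoss"
            self.metrics = self.metrics or ["mean_squared_error"]
            if self.metrics_params is None:
                self.metrics_params = [{} for _ in self.metrics]
            # Not used for regression; always reset, kept for compatibility.
            self.metrics_prob_input = [False for _ in self.metrics]
        elif self.task == "classification":
            self.loss = self.loss or "CrossEntropyLoss"
            self.metrics = self.metrics or ["accuracy"]
            if self.metrics_params is None:
                self.metrics_params = [{} for _ in self.metrics]
            if self.metrics_prob_input is None:
                self.metrics_prob_input = [False for _ in self.metrics]


reg = SketchConfig(task="regression")
clf = SketchConfig(task="classification", metrics=["accuracy", "f1_score"])
print(reg.loss, reg.metrics)
print(clf.loss, clf.metrics_prob_input)
```

Note that `metrics_params` and `metrics_prob_input` are sized to match `metrics`, so passing two metrics without the other lists yields two empty parameter dicts and two `False` flags.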

Source code in src/pytorch_tabular/config/config.py

@dataclass
class ModelConfig:
    """Base Model configuration.

    Args:
        task (str): Specify whether the problem is regression or classification. `backbone` is a task which
                considers the model as a backbone to generate features. Mostly used internally for SSL and related
                tasks. Choices are: [`regression`,`classification`,`backbone`].

        head (Optional[str]): The head to be used for the model. Should be one of the heads defined in
                `pytorch_tabular.models.common.heads`. Defaults to  LinearHead. Choices are:
                [`None`,`LinearHead`,`MixtureDensityHead`].

        head_config (Optional[Dict]): The config as a dict which defines the head. If left empty, will be
                initialized as default linear head.

        embedding_dims (Optional[List]): The dimensions of the embedding for each categorical column as a
                list of tuples (cardinality, embedding_dim). If left empty, will infer using the cardinality of
                the categorical column using the rule min(50, (x + 1) // 2)

        embedding_dropout (float): Dropout to be applied to the Categorical Embedding. Defaults to 0.0

        batch_norm_continuous_input (bool): If True, we will normalize the continuous layer by passing it
                through a BatchNorm layer.

        virtual_batch_size (Optional[int]): If not None, all BatchNorms will be converted to GhostBatchNorm's
                with the specified virtual batch size. Defaults to None

        learning_rate (float): The learning rate of the model. Defaults to 1e-3.

        loss (Optional[str]): The loss function to be applied. By Default, it is MSELoss for regression and
                CrossEntropyLoss for classification. Unless you are sure what you are doing, leave it at MSELoss
                or L1Loss for regression and CrossEntropyLoss for classification

        metrics (Optional[List[str]]): the list of metrics you need to track during training. The metrics
                should be one of the functional metrics implemented in ``torchmetrics``. By default, it is
                accuracy if classification and mean_squared_error for regression

        metrics_prob_input (Optional[List[bool]]): Is a mandatory parameter for classification metrics defined
                in the config. This defines whether the input to the metric function is the probability or the
                class. Length should be the same as the number of metrics. Defaults to None.

        metrics_params (Optional[List]): The parameters to be passed to the metrics function. `task` is forced to
                be `multiclass` because the multiclass version can handle binary as well and for simplicity we are
                only using `multiclass`.

        target_range (Optional[List]): The range in which we should limit the output variable. Currently
                ignored for multi-target regression. Typically used for Regression problems. If left empty, will
                not apply any restrictions

        seed (int): The seed for reproducibility. Defaults to 42

    """

    task: str = field(
        metadata={
            "help": "Specify whether the problem is regression or classification."
            " `backbone` is a task which considers the model as a backbone to generate features."
            " Mostly used internally for SSL and related tasks.",
            "choices": ["regression", "classification", "backbone"],
        }
    )

    head: Optional[str] = field(
        default="LinearHead",
        metadata={
            "help": "The head to be used for the model. Should be one of the heads defined"
            " in `pytorch_tabular.models.common.heads`. Defaults to  LinearHead",
            "choices": [None, "LinearHead", "MixtureDensityHead"],
        },
    )

    head_config: Optional[Dict] = field(
        default_factory=lambda: {"layers": ""},
        metadata={
            "help": "The config as a dict which defines the head."
            " If left empty, will be initialized as default linear head."
        },
    )
    embedding_dims: Optional[List] = field(
        default=None,
        metadata={
            "help": "The dimensions of the embedding for each categorical column as a list of tuples "
            "(cardinality, embedding_dim). If left empty, will infer using the cardinality of the "
            "categorical column using the rule min(50, (x + 1) // 2)"
        },
    )
    embedding_dropout: float = field(
        default=0.0,
        metadata={"help": "Dropout to be applied to the Categorical Embedding. Defaults to 0.0"},
    )
    batch_norm_continuous_input: bool = field(
        default=True,
        metadata={"help": "If True, we will normalize the continuous layer by passing it through a BatchNorm layer."},
    )

    learning_rate: float = field(
        default=1e-3,
        metadata={"help": "The learning rate of the model. Defaults to 1e-3."},
    )
    loss: Optional[str] = field(
        default=None,
        metadata={
            "help": "The loss function to be applied. By Default it is MSELoss for regression "
            "and CrossEntropyLoss for classification. Unless you are sure what you are doing, "
            "leave it at MSELoss or L1Loss for regression and CrossEntropyLoss for classification"
        },
    )
    metrics: Optional[List[str]] = field(
        default=None,
        metadata={
            "help": "The list of metrics you need to track during training. The metrics should be one "
            "of the functional metrics implemented in ``torchmetrics``. To use your own metric, please "
            "use the `metric` param in the `fit` method. By default, it is accuracy if classification "
            "and mean_squared_error for regression"
        },
    )
    metrics_prob_input: Optional[List[bool]] = field(
        default=None,
        metadata={
            "help": "Is a mandatory parameter for classification metrics defined in the config. This defines "
            "whether the input to the metric function is the probability or the class. Length should be same "
            "as the number of metrics. Defaults to None."
        },
    )
    metrics_params: Optional[List] = field(
        default=None,
        metadata={
            "help": "The parameters to be passed to the metrics function. `task` is forced to be `multiclass` "
            "because the multiclass version can handle binary as well and for simplicity we are only using "
            "`multiclass`."
        },
    )
    target_range: Optional[List] = field(
        default=None,
        metadata={
            "help": "The range in which we should limit the output variable. "
            "Currently ignored for multi-target regression. Typically used for Regression problems. "
            "If left empty, will not apply any restrictions"
        },
    )

    virtual_batch_size: Optional[int] = field(
        default=None,
        metadata={
            "help": "If not None, all BatchNorms will be converted to GhostBatchNorm's "
            "with this virtual batch size. Defaults to None"
        },
    )

    seed: int = field(
        default=42,
        metadata={"help": "The seed for reproducibility. Defaults to 42"},
    )

    _module_src: str = field(default="models")
    _model_name: str = field(default="Model")
    _backbone_name: str = field(default="Backbone")
    _config_name: str = field(default="Config")

    def __post_init__(self):
        if self.task == "regression":
            self.loss = self.loss or "MSELoss"
            self.metrics = self.metrics or ["mean_squared_error"]
            self.metrics_params = [{} for _ in self.metrics] if self.metrics_params is None else self.metrics_params
            self.metrics_prob_input = [False for _ in self.metrics]  # not used in Regression. just for compatibility
        elif self.task == "classification":
            self.loss = self.loss or "CrossEntropyLoss"
            self.metrics = self.metrics or ["accuracy"]
            self.metrics_params = [{} for _ in self.metrics] if self.metrics_params is None else self.metrics_params
            self.metrics_prob_input = (
                [False for _ in self.metrics] if self.metrics_prob_input is None else self.metrics_prob_input
            )
        elif self.task == "backbone":
            self.loss = None
            self.metrics = None
            self.metrics_params = None
            if self.head is not None:
                logger.warning("`head` is not a valid parameter for backbone task. Making `head=None`")
                self.head = None
                self.head_config = None
        else:
            raise NotImplementedError(
                f"{self.task} is not a valid task. Should be one of "
                f"{self.__dataclass_fields__['task'].metadata['choices']}"
            )
        if self.metrics is not None:
            assert len(self.metrics) == len(self.metrics_params), "metrics and metrics_params should have the same length"

        if self.task != "backbone":
            assert self.head in dir(heads.blocks), f"{self.head} is not a valid head"
            if hasattr(self, "_config_name") and self._config_name != "MDNConfig":
                assert self.head != "MixtureDensityHead", (
                    "MixtureDensityHead is not supported as a head for regular "
                    "models. Use `MDNConfig` instead. Please see Probabilistic Regression with MDN How-to-Guide in "
                    "documentation for the right usage."
                )
            _head_callable = getattr(heads.blocks, self.head)
            ideal_head_config = _head_callable._config_template
            invalid_keys = set(self.head_config.keys()) - set(ideal_head_config.__dict__.keys())
            assert len(invalid_keys) == 0, f"`head_config` has some invalid keys: {invalid_keys}"

        # For Custom models, setting these values for compatibility
        if not hasattr(self, "_config_name"):
            self._config_name = type(self).__name__
        if not hasattr(self, "_model_name"):
            self._model_name = re.sub("[Cc]onfig", "Model", self._config_name)
        if not hasattr(self, "_backbone_name"):
            self._backbone_name = re.sub("[Cc]onfig", "Backbone", self._config_name)
        _validate_choices(self)
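
The task-dependent defaults and the naming fallback in `__post_init__` can be illustrated with a small standalone sketch. `resolve_defaults` and `derive_names` are hypothetical helpers written for this illustration and are not part of PyTorch Tabular; they only mirror the branches shown above and skip the head validation.

```python
import re
from typing import Dict, List, Optional, Tuple


def resolve_defaults(
    task: str,
    loss: Optional[str] = None,
    metrics: Optional[List[str]] = None,
    metrics_params: Optional[List[Dict]] = None,
) -> Tuple[Optional[str], Optional[List[str]], Optional[List[Dict]]]:
    """Hypothetical helper mirroring the task branches of __post_init__."""
    if task == "backbone":
        # A bare backbone carries no loss and no metrics.
        return None, None, None
    if task == "regression":
        loss = loss or "MSELoss"
        metrics = metrics or ["mean_squared_error"]
    elif task == "classification":
        loss = loss or "CrossEntropyLoss"
        metrics = metrics or ["accuracy"]
    else:
        raise NotImplementedError(f"{task} is not a valid task")
    # One (possibly empty) parameter dict per metric.
    if metrics_params is None:
        metrics_params = [{} for _ in metrics]
    assert len(metrics) == len(metrics_params), "lengths must match"
    return loss, metrics, metrics_params


def derive_names(config_name: str) -> Tuple[str, str]:
    """Hypothetical helper: model and backbone names from a config class name."""
    model_name = re.sub("[Cc]onfig", "Model", config_name)
    backbone_name = re.sub("[Cc]onfig", "Backbone", config_name)
    return model_name, backbone_name


print(resolve_defaults("regression"))
print(resolve_defaults("classification", metrics=["f1_score"]))
print(derive_names("MyCustomConfig"))
```

Note that an explicitly passed `loss` or `metrics` value always wins; the defaults only fill in values left as `None`.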

Model Classes

Bases: BaseModel

Source code in src/pytorch_tabular/models/autoint/autoint.py

class AutoIntModel(BaseModel):
    def __init__(self, config: DictConfig, **kwargs):
        super().__init__(config, **kwargs)

    @property
    def backbone(self):
        return self._backbone

    @property
    def embedding_layer(self):
        return self._embedding_layer

    @property
    def head(self):
        return self._head

    def _build_network(self):
        # Backbone
        self._backbone = AutoIntBackbone(self.hparams)
        # Embedding Layer
        self._embedding_layer = self._backbone._build_embedding_layer()
        # Head
        self._head = self._get_head_from_config()
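
Most model classes on this page repeat the same pattern: `_build_network` assembles a backbone, an embedding layer and a head, and three read-only properties expose them. The sketch below reproduces that pattern with plain Python objects. `SketchBase` and `SketchModel` are hypothetical names, the callables stand in for real `nn.Module` components, and the embedding, backbone, head order in `forward` is an assumption, since `BaseModel.forward` is not shown in this section.

```python
from abc import ABC, abstractmethod


class SketchBase(ABC):
    """Hypothetical stand-in for BaseModel (not the library class)."""

    def __init__(self):
        # The constructor asks the subclass to assemble its parts.
        self._build_network()

    @abstractmethod
    def _build_network(self):
        """Must set self._embedding_layer, self._backbone and self._head."""

    def forward(self, x):
        # Assumed default flow: embedding layer -> backbone -> head.
        return self.head(self.backbone(self.embedding_layer(x)))


class SketchModel(SketchBase):
    @property
    def backbone(self):
        return self._backbone

    @property
    def embedding_layer(self):
        return self._embedding_layer

    @property
    def head(self):
        return self._head

    def _build_network(self):
        # Plain callables stand in for the real components.
        self._embedding_layer = lambda x: [float(v) for v in x]
        self._backbone = lambda x: [2.0 * v for v in x]
        self._head = sum


model = SketchModel()
print(model.forward([1, 2, 3]))  # 12.0
```

Because the properties have no setters, the components should be assigned through the underscore-prefixed attributes inside `_build_network`.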

Bases: BaseModel

Source code in src/pytorch_tabular/models/category_embedding/category_embedding_model.py

class CategoryEmbeddingModel(BaseModel):
    def __init__(self, config: DictConfig, **kwargs):
        super().__init__(config, **kwargs)

    @property
    def backbone(self):
        return self._backbone

    @property
    def embedding_layer(self):
        return self._embedding_layer

    @property
    def head(self):
        return self._head

    def _build_network(self):
        # Backbone
        self._backbone = CategoryEmbeddingBackbone(self.hparams)
        # Embedding Layer
        self._embedding_layer = self._backbone._build_embedding_layer()
        # Head
        self._head = self._get_head_from_config()

Bases: BaseModel

Source code in src/pytorch_tabular/models/danet/danet.py

class DANetModel(BaseModel):
    def __init__(self, config: DictConfig, **kwargs):
        super().__init__(config, **kwargs)

    @property
    def backbone(self):
        return self._backbone

    @property
    def embedding_layer(self):
        return self._embedding_layer

    @property
    def head(self):
        return self._head

    def _build_network(self):
        if self.hparams.virtual_batch_size > self.hparams.batch_size:
            warnings.warn(
                f"virtual_batch_size({self.hparams.virtual_batch_size}) is greater "
                f"than batch_size ({self.hparams.batch_size}). Setting virtual_batch_size "
                f"to {self.hparams.batch_size}. DANet uses Ghost Batch Normalization, "
                f"which works best when virtual_batch_size is small. Consider setting "
                "virtual_batch_size to something like 256 or 512."
            )
            self.hparams.virtual_batch_size = self.hparams.batch_size
        # Backbone
        self._backbone = DANetBackbone(
            cat_embedding_dims=self.hparams.embedding_dims,
            n_continuous_features=self.hparams.continuous_dim,
            n_layers=self.hparams.n_layers,
            abstlay_dim_1=self.hparams.abstlay_dim_1,
            abstlay_dim_2=self.hparams.abstlay_dim_2,
            k=self.hparams.k,
            dropout_rate=self.hparams.dropout_rate,
            block_activation=getattr(nn, self.hparams.block_activation)(),
            virtual_batch_size=self.hparams.virtual_batch_size,
            embedding_dropout=self.hparams.embedding_dropout,
            batch_norm_continuous_input=self.hparams.batch_norm_continuous_input,
        )
        # Embedding Layer
        self._embedding_layer = self._backbone._build_embedding_layer()
        # Head
        self._head = self._get_head_from_config()
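
The guard at the top of `_build_network` caps `virtual_batch_size` at `batch_size` and warns when it does so. A minimal standalone sketch of that rule follows. `clamp_virtual_batch_size` is a hypothetical helper, and its handling of `None` is an added assumption: the listing above compares the two values directly.

```python
import warnings
from typing import Optional


def clamp_virtual_batch_size(virtual_batch_size: Optional[int], batch_size: int) -> Optional[int]:
    """Hypothetical helper: cap virtual_batch_size at batch_size."""
    if virtual_batch_size is None:
        # Assumption: None means there is nothing to cap.
        return None
    if virtual_batch_size > batch_size:
        warnings.warn(
            f"virtual_batch_size ({virtual_batch_size}) is greater than "
            f"batch_size ({batch_size}). Using {batch_size} instead."
        )
        return batch_size
    return virtual_batch_size


print(clamp_virtual_batch_size(256, 1024))  # 256, unchanged
print(clamp_virtual_batch_size(512, 128))   # 128, with a warning
```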

Bases: BaseModel

Source code in src/pytorch_tabular/models/ft_transformer/ft_transformer.py

class FTTransformerModel(BaseModel):
    def __init__(self, config: DictConfig, **kwargs):
        super().__init__(config, **kwargs)

    @property
    def backbone(self):
        return self._backbone

    @property
    def embedding_layer(self):
        return self._embedding_layer

    @property
    def head(self):
        return self._head

    def _build_network(self):
        # Backbone
        self._backbone = FTTransformerBackbone(self.hparams)
        # Embedding Layer
        self._embedding_layer = self._backbone._build_embedding_layer()
        # Head
        self._head = self._get_head_from_config()

    def feature_importance(self):
        if self.hparams.attn_feature_importance:
            return super().feature_importance()
        else:
            raise ValueError("If you want Feature Importance, `attn_feature_importance` should be `True`.")

Bases: BaseModel

Source code in src/pytorch_tabular/models/gandalf/gandalf.py

class GANDALFModel(BaseModel):
    def __init__(self, config: DictConfig, **kwargs):
        super().__init__(config, **kwargs)

    @property
    def backbone(self):
        return self._backbone

    @property
    def embedding_layer(self):
        return self._embedding_layer

    @property
    def head(self):
        return self._head

    def _build_network(self):
        # Backbone
        self._backbone = GANDALFBackbone(
            cat_embedding_dims=self.hparams.embedding_dims,
            n_continuous_features=self.hparams.continuous_dim,
            gflu_stages=self.hparams.gflu_stages,
            gflu_dropout=self.hparams.gflu_dropout,
            gflu_feature_init_sparsity=self.hparams.gflu_feature_init_sparsity,
            learnable_sparsity=self.hparams.learnable_sparsity,
            batch_norm_continuous_input=self.hparams.batch_norm_continuous_input,
            embedding_dropout=self.hparams.embedding_dropout,
            virtual_batch_size=self.hparams.virtual_batch_size,
        )
        # Embedding Layer
        self._embedding_layer = self._backbone._build_embedding_layer()
        # Head
        self.T0 = nn.Parameter(torch.rand(self.hparams.output_dim), requires_grad=True)
        self._head = nn.Sequential(self._get_head_from_config(), Add(self.T0))

    def data_aware_initialization(self, datamodule):
        if self.hparams.task == "regression":
            logger.info("Data Aware Initialization of T0")
            # Need a big batch to initialize properly
            alt_loader = datamodule.train_dataloader(batch_size=self.hparams.data_aware_init_batch_size)
            batch = next(iter(alt_loader))
            self.T0.data = torch.mean(batch["target"], dim=0)

Bases: BaseModel

Source code in src/pytorch_tabular/models/gate/gate_model.py

class GatedAdditiveTreeEnsembleModel(BaseModel):
    def __init__(self, config: DictConfig, **kwargs):
        super().__init__(config, **kwargs)

    @property
    def backbone(self):
        return self._backbone

    @property
    def embedding_layer(self):
        return self._embedding_layer

    @property
    def head(self):
        return self._head

    def _build_network(self):
        # Backbone
        self._backbone = GatedAdditiveTreesBackbone(
            n_continuous_features=self.hparams.continuous_dim,
            cat_embedding_dims=self.hparams.embedding_dims,
            gflu_stages=self.hparams.gflu_stages,
            gflu_dropout=self.hparams.gflu_dropout,
            num_trees=self.hparams.num_trees,
            tree_depth=self.hparams.tree_depth,
            tree_dropout=self.hparams.tree_dropout,
            binning_activation=self.hparams.binning_activation,
            feature_mask_function=self.hparams.feature_mask_function,
            batch_norm_continuous_input=self.hparams.batch_norm_continuous_input,
            chain_trees=self.hparams.chain_trees,
            tree_wise_attention=self.hparams.tree_wise_attention,
            tree_wise_attention_dropout=self.hparams.tree_wise_attention_dropout,
            gflu_feature_init_sparsity=self.hparams.gflu_feature_init_sparsity,
            tree_feature_init_sparsity=self.hparams.tree_feature_init_sparsity,
            virtual_batch_size=self.hparams.virtual_batch_size,
        )
        # Embedding Layer
        self._embedding_layer = self._backbone._build_embedding_layer()
        # Head
        if self.hparams.num_trees == 0:
            self.T0 = nn.Parameter(torch.rand(self.hparams.output_dim), requires_grad=True)
            self._head = nn.Sequential(self._get_head_from_config(), Add(self.T0))
        else:
            self._head = CustomHead(self.backbone.output_dim, self.hparams)

    def data_aware_initialization(self, datamodule):
        if self.hparams.task == "regression":
            logger.info("Data Aware Initialization of T0")
            # Need a big batch to initialize properly
            alt_loader = datamodule.train_dataloader(batch_size=self.hparams.data_aware_init_batch_size)
            batch = next(iter(alt_loader))
            t0 = torch.mean(batch["target"], dim=0)
            if self.hparams.num_trees != 0:
                self.head.T0.data = t0
            else:
                self.T0.data = t0

Bases: BaseModel

Source code in src/pytorch_tabular/models/mixture_density/mdn.py

class MDNModel(BaseModel):
    def __init__(self, config: DictConfig, **kwargs):
        assert "inferred_config" in kwargs, "inferred_config not found in initialization arguments"
        self.inferred_config = kwargs["inferred_config"]
        assert config.task == "regression", "MDN is only implemented for Regression"
        super().__init__(config, **kwargs)
        assert self.hparams.output_dim == 1, "MDN is not implemented for multi-targets"
        if config.target_range is not None:
            logger.warning("MDN does not use target range. Ignoring it.")
        self._val_output = []

    def _get_head_from_config(self):
        _head_callable = getattr(blocks, self.hparams.head)
        self.hparams.head_config.input_dim = self.backbone.output_dim
        return _head_callable(
            config=_head_callable._config_template(**self.hparams.head_config),
        )  # output_dim auto-calculated from other configs

    @property
    def backbone(self):
        return self._backbone

    @property
    def embedding_layer(self):
        return self._embedding_layer

    @property
    def head(self):
        return self._head

    def _build_network(self):
        # Backbone
        callable, config = (
            self.hparams.backbone_config_class,
            self.hparams.backbone_config_params,
        )
        try:
            callable = getattr(models, callable)
        except (AttributeError, ModuleNotFoundError) as e:
            logger.error(
                "`config class` in `backbone_config` is not valid."
                " The config class should be a valid module path from `models`."
                " e.g. `ft_transformer.FTTransformerConfig`."
            )
            raise e
        assert issubclass(callable, ModelConfig), "`config_class` should be a subclass of `ModelConfig`"
        backbone_config = callable(**config)
        backbone_callable = getattr_nested(backbone_config._module_src, backbone_config._backbone_name)
        # Merging the config and inferred config
        backbone_config = safe_merge_config(OmegaConf.structured(backbone_config), self.inferred_config)
        self._backbone = backbone_callable(backbone_config)
        # Embedding Layer
        self._embedding_layer = self._backbone._build_embedding_layer()
        # Head
        self._head = self._get_head_from_config()

    # Redefining forward because TabTransformer flow is slightly different
    def forward(self, x: Dict):
        if isinstance(self.backbone, TabTransformerBackbone):
            if self.hparams.categorical_dim > 0:
                x_cat = self.embed_input({"categorical": x["categorical"]})
            else:
                x_cat = None
            x = self.compute_backbone({"categorical": x_cat, "continuous": x["continuous"]})
        else:
            x = self.embedding_layer(x)
            x = self.compute_backbone(x)
        return self.compute_head(x)

    # Redefining compute_backbone because TabTransformer flow is slightly different
    def compute_backbone(self, x: Union[Dict, torch.Tensor]):
        # Returns output
        if isinstance(self.backbone, TabTransformerBackbone):
            x = self.backbone(x["categorical"], x["continuous"])
        else:
            x = self.backbone(x)
        return x

    def compute_head(self, x: Tensor):
        pi, sigma, mu = self.head(x)
        return {"pi": pi, "sigma": sigma, "mu": mu, "backbone_features": x}

    def predict(self, x: Dict):
        ret_value = self.forward(x)
        return self.head.generate_point_predictions(ret_value["pi"], ret_value["sigma"], ret_value["mu"])

    def sample(self, x: Dict, n_samples: Optional[int] = None, ret_model_output=False):
        ret_value = self.forward(x)
        samples = self.head.generate_samples(ret_value["pi"], ret_value["sigma"], ret_value["mu"], n_samples)
        if ret_model_output:
            return samples, ret_value
        else:
            return samples

    def calculate_loss(self, y, pi, sigma, mu, tag="train"):
        # NLL Loss
        log_prob = self.head.log_prob(pi, sigma, mu, y)
        loss = torch.mean(-log_prob)
        if self.head.hparams.weight_regularization is not None:
            sigma_l1_reg = 0
            pi_l1_reg = 0
            mu_l1_reg = 0
            if self.head.hparams.lambda_sigma > 0:
                # Weight Regularization Sigma
                sigma_params = torch.cat([x.view(-1) for x in self.head.sigma.parameters()])
                sigma_l1_reg = self.head.hparams.lambda_sigma * torch.norm(
                    sigma_params, self.head.hparams.weight_regularization
                )
            if self.head.hparams.lambda_pi > 0:
                pi_params = torch.cat([x.view(-1) for x in self.head.pi.parameters()])
                pi_l1_reg = self.head.hparams.lambda_pi * torch.norm(pi_params, self.head.hparams.weight_regularization)
            if self.head.hparams.lambda_mu > 0:
                mu_params = torch.cat([x.view(-1) for x in self.head.mu.parameters()])
                mu_l1_reg = self.head.hparams.lambda_mu * torch.norm(mu_params, self.head.hparams.weight_regularization)

            loss = loss + sigma_l1_reg + pi_l1_reg + mu_l1_reg
        self.log(
            f"{tag}_loss",
            loss,
            on_epoch=(tag == "valid") or (tag == "test"),
            on_step=(tag == "train"),
            # on_step=False,
            logger=True,
            prog_bar=True,
        )
        return loss

    def training_step(self, batch, batch_idx):
        y = batch["target"]
        ret_value = self(batch)
        loss = self.calculate_loss(y, ret_value["pi"], ret_value["sigma"], ret_value["mu"], tag="train")
        if not self.head.hparams.speedup_training:
            y_hat = self.head.generate_point_predictions(ret_value["pi"], ret_value["sigma"], ret_value["mu"])
            self.calculate_metrics(y, y_hat, tag="train")
        return loss

    def validation_step(self, batch, batch_idx):
        y = batch["target"]
        ret_value = self(batch)
        self.calculate_loss(y, ret_value["pi"], ret_value["sigma"], ret_value["mu"], tag="valid")
        y_hat = self.head.generate_point_predictions(ret_value["pi"], ret_value["sigma"], ret_value["mu"])
        self.calculate_metrics(y, y_hat, tag="valid")
        return y_hat, y, ret_value

    def test_step(self, batch, batch_idx):
        y = batch["target"]
        ret_value = self(batch)
        self.calculate_loss(y, ret_value["pi"], ret_value["sigma"], ret_value["mu"], tag="test")
        y_hat = self.head.generate_point_predictions(ret_value["pi"], ret_value["sigma"], ret_value["mu"])
        self.calculate_metrics(y, y_hat, tag="test")
        return y_hat, y

    def on_validation_batch_end(self, outputs, batch, batch_idx: int) -> None:
        self._val_output.append(outputs)
        super().on_validation_batch_end(outputs, batch, batch_idx)

    def on_validation_epoch_end(self) -> None:
        pi = [
            nn.functional.gumbel_softmax(output[2]["pi"], tau=self.head.hparams.softmax_temperature, dim=-1)
            for output in self._val_output
        ]
        pi = torch.cat(pi).detach().cpu()
        for i in range(self.head.hparams.num_gaussian):
            self.log(
                f"mean_pi_{i}",
                pi[:, i].mean(),
                on_epoch=True,
                on_step=False,
                logger=True,
                prog_bar=False,
            )

        mu = [output[2]["mu"] for output in self._val_output]
        mu = torch.cat(mu).detach().cpu()
        for i in range(self.head.hparams.num_gaussian):
            self.log(
                f"mean_mu_{i}",
                mu[:, i].mean(),
                on_epoch=True,
                on_step=False,
                logger=True,
                prog_bar=False,
            )

        sigma = [output[2]["sigma"] for output in self._val_output]
        sigma = torch.cat(sigma).detach().cpu()
        for i in range(self.head.hparams.num_gaussian):
            self.log(
                f"mean_sigma_{i}",
                sigma[:, i].mean(),
                on_epoch=True,
                on_step=False,
                logger=True,
                prog_bar=False,
            )
        if self.do_log_logits:
            logits = [output[0] for output in self._val_output]
            logits = torch.cat(logits).detach().cpu()
            fig = self.create_plotly_histogram(logits.unsqueeze(1), "logits")
            wandb.log(
                {
                    "valid_logits": fig,
                    "global_step": self.global_step,
                },
                commit=False,
            )
            if self.head.hparams.log_debug_plot:
                fig = self.create_plotly_histogram(pi, "pi", bin_dict={"start": 0.0, "end": 1.0, "size": 0.1})
                wandb.log(
                    {
                        "valid_pi": fig,
                        "global_step": self.global_step,
                    },
                    commit=False,
                )

                fig = self.create_plotly_histogram(mu, "mu")
                wandb.log(
                    {
                        "valid_mu": fig,
                        "global_step": self.global_step,
                    },
                    commit=False,
                )

                fig = self.create_plotly_histogram(sigma, "sigma")
                wandb.log(
                    {
                        "valid_sigma": fig,
                        "global_step": self.global_step,
                    },
                    commit=False,
                )
        self._val_output = []
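
`calculate_loss` minimises the mean negative log-likelihood returned by `self.head.log_prob`. The head itself is not shown in this section, so the sketch below is only an illustration of that quantity for a one-dimensional Gaussian mixture. `mixture_log_prob` and `nll` are hypothetical helpers, and `pi` is assumed to hold already normalised, strictly positive mixing weights (the library head may work with unnormalised values internally).

```python
import math
from typing import List, Sequence, Tuple


def mixture_log_prob(pi: List[float], sigma: List[float], mu: List[float], y: float) -> float:
    """Log-density of y under a 1-D Gaussian mixture, via log-sum-exp."""
    terms = []
    for weight, s, m in zip(pi, sigma, mu):
        # Log-density of a single normal component.
        log_normal = -0.5 * math.log(2 * math.pi) - math.log(s) - 0.5 * ((y - m) / s) ** 2
        terms.append(math.log(weight) + log_normal)
    # Subtract the largest term before exponentiating for numerical stability.
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))


def nll(batch: Sequence[Tuple[List[float], List[float], List[float], float]]) -> float:
    """Mean negative log-likelihood over (pi, sigma, mu, y) rows."""
    return sum(-mixture_log_prob(*row) for row in batch) / len(batch)


rows = [
    ([0.5, 0.5], [1.0, 1.0], [-1.0, 1.0], 0.0),
    ([0.9, 0.1], [0.5, 2.0], [0.0, 3.0], 0.2),
]
print(nll(rows))
```

The optional regularisation in `calculate_loss` then adds `lambda * norm(parameters, p)` terms for the sigma, pi and mu sub-networks on top of this value, with `p` taken from `weight_regularization`.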

Bases: BaseModel

Source code in src/pytorch_tabular/models/node/node_model.py

class NODEModel(BaseModel):
    def __init__(self, config: DictConfig, **kwargs):
        super().__init__(config, **kwargs)

    def subset(self, x):
        return x[..., : self.hparams.output_dim].mean(dim=-2)

    def data_aware_initialization(self, datamodule):
        """Performs data-aware initialization for NODE."""
        logger.info(
            "Data Aware Initialization of NODE using a forward pass with "
            f"{self.hparams.data_aware_init_batch_size} batch size...."
        )
        # Need a big batch to initialize properly
        alt_loader = datamodule.train_dataloader(batch_size=self.hparams.data_aware_init_batch_size)
        batch = next(iter(alt_loader))
        for k, v in batch.items():
            if isinstance(v, list) and (len(v) == 0):
                # Skipping empty list
                continue
            # batch[k] = v.to("cpu" if self.config.gpu == 0 else "cuda")
            batch[k] = v.to(self.device)

        # single forward pass to initialize the ODST
        with torch.no_grad():
            self(batch)

    @property
    def backbone(self):
        return self._backbone

    @property
    def embedding_layer(self):
        return self._embedding_layer

    @property
    def head(self):
        return self._head

    def _build_network(self):
        self._backbone = NODEBackbone(self.hparams)
        # Embedding Layer
        self._embedding_layer = self._backbone._build_embedding_layer()
        # average first n channels of every tree, where n is the number of output targets for regression
        # and number of classes for classification
        # Not using config head because NODE has a specific head
        warnings.warn("Ignoring head config because NODE has a specific head which subsets the tree outputs")
        self._head = Lambda(self.subset)
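
The `subset` head keeps the first `output_dim` channels of every tree and averages them over the tree axis (`dim=-2`). The sketch below shows the same arithmetic on nested lists, assuming the backbone output has shape `(batch, n_trees, tree_dim)`; that shape is an assumption, since `NODEBackbone` is not shown here, and the standalone `subset` function is a hypothetical re-implementation.

```python
from typing import List


def subset(x: List[List[List[float]]], output_dim: int) -> List[List[float]]:
    """Keep the first output_dim channels per tree, then average over trees."""
    out = []
    for sample in x:
        kept = [tree[:output_dim] for tree in sample]
        n_trees = len(kept)
        # zip(*kept) groups the same channel across all trees.
        out.append([sum(channel) / n_trees for channel in zip(*kept)])
    return out


# One sample, two trees, three channels per tree.
x = [[[1.0, 2.0, 9.0], [3.0, 4.0, 9.0]]]
print(subset(x, output_dim=2))  # [[2.0, 3.0]]
```

The third channel of each tree is dropped before averaging, so it has no effect on the prediction.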

`data_aware_initialization(datamodule)`

Performs data-aware initialization for NODE.

Source code in src/pytorch_tabular/models/node/node_model.py

def data_aware_initialization(self, datamodule):
    """Performs data-aware initialization for NODE."""
    logger.info(
        "Data Aware Initialization of NODE using a forward pass with "
        f"{self.hparams.data_aware_init_batch_size} batch size...."
    )
    # Need a big batch to initialize properly
    alt_loader = datamodule.train_dataloader(batch_size=self.hparams.data_aware_init_batch_size)
    batch = next(iter(alt_loader))
    for k, v in batch.items():
        if isinstance(v, list) and (len(v) == 0):
            # Skipping empty list
            continue
        # batch[k] = v.to("cpu" if self.config.gpu == 0 else "cuda")
        batch[k] = v.to(self.device)

    # single forward pass to initialize the ODST
    with torch.no_grad():
        self(batch)

Bases: BaseModel

Source code in src/pytorch_tabular/models/tabnet/tabnet_model.py

class TabNetModel(BaseModel):
    def __init__(self, config: DictConfig, **kwargs):
        assert config.task in [
            "regression",
            "classification",
        ], "TabNet is only implemented for Regression and Classification"
        super().__init__(config, **kwargs)

    @property
    def backbone(self):
        return self._backbone

    @property
    def embedding_layer(self):
        return self._embedding_layer

    @property
    def head(self):
        return self._head

    def _build_network(self):
        # TabNet has its own embedding layer.
        # So we are not using the embedding layer from BaseModel
        self._embedding_layer = nn.Identity()
        self._backbone = TabNetBackbone(self.hparams)
        setattr(self.backbone, "output_dim", self.hparams.output_dim)
        # TabNet has its own head
        self._head = nn.Identity()

    def extract_embedding(self):
        raise ValueError("Extracting Embeddings is not supported by TabNet. Please use another compatible model")

Bases: BaseModel

Source code in src/pytorch_tabular/models/tab_transformer/tab_transformer.py

class TabTransformerModel(BaseModel):
    def __init__(self, config: DictConfig, **kwargs):
        super().__init__(config, **kwargs)

    @property
    def backbone(self):
        return self._backbone

    @property
    def embedding_layer(self):
        return self._embedding_layer

    @property
    def head(self):
        return self._head

    def _build_network(self):
        # Backbone
        self._backbone = TabTransformerBackbone(self.hparams)
        # Embedding Layer
        self._embedding_layer = self._backbone._build_embedding_layer()
        # Head
        self._head = self._get_head_from_config()

    # Redefining forward because this model flow is slightly different
    def forward(self, x: Dict):
        if self.hparams.categorical_dim > 0:
            x_cat = self.embed_input({"categorical": x["categorical"]})
        else:
            x_cat = None
        x = self.compute_backbone({"categorical": x_cat, "continuous": x["continuous"]})
        return self.compute_head(x)

    # Redefining compute_backbone because this model flow is slightly different
    def compute_backbone(self, x: Dict):
        # Returns output
        x = self.backbone(x["categorical"], x["continuous"])
        return x

Bases: BaseModel

Source code in src/pytorch_tabular/models/stacking/stacking_model.py

class StackingModel(BaseModel):
    def __init__(self, config: DictConfig, **kwargs):
        super().__init__(config, **kwargs)

    def _build_network(self):
        self._backbone = StackingBackbone(self.hparams)
        self._embedding_layer = self._backbone._build_embedding_layer()
        self.output_dim = self._backbone.output_dim
        self._head = self._get_head_from_config()

    def _get_head_from_config(self):
        _head_callable = getattr(blocks, self.hparams.head)
        return _head_callable(
            in_units=self.output_dim,
            output_dim=self.hparams.output_dim,
            config=_head_callable._config_template(**self.hparams.head_config),
        )

    @property
    def backbone(self):
        return self._backbone

    @property
    def embedding_layer(self):
        return self._embedding_layer

    @property
    def head(self):
        return self._head

Base Model Class

Bases: LightningModule

Source code in src/pytorch_tabular/models/base_model.py

class BaseModel(pl.LightningModule, metaclass=ABCMeta):
    def __init__(
        self,
        config: DictConfig,
        custom_loss: Optional[torch.nn.Module] = None,
        custom_metrics: Optional[List[Callable]] = None,
        custom_metrics_prob_inputs: Optional[List[bool]] = None,
        custom_optimizer: Optional[torch.optim.Optimizer] = None,
        custom_optimizer_params: Dict = {},
        **kwargs,
    ):
        """Base Model for PyTorch Tabular.

        Args:
            config (DictConfig): The configuration for the model.
            custom_loss (Optional[torch.nn.Module], optional): A custom loss function. Defaults to None.
            custom_metrics (Optional[List[Callable]], optional): A list of custom metrics. Defaults to None.
            custom_metrics_prob_inputs (Optional[List[bool]], optional): A list of boolean values indicating whether the
                metric requires probability inputs. Defaults to None.
            custom_optimizer (Optional[torch.optim.Optimizer], optional):
                A custom optimizer as callable or string to be imported. Defaults to None.
            custom_optimizer_params (Dict, optional): A dictionary of custom optimizer parameters. Defaults to {}.
            kwargs (Dict, optional): Additional keyword arguments.

        """
        super().__init__()
        assert "inferred_config" in kwargs, "inferred_config not found in initialization arguments"
        inferred_config = kwargs["inferred_config"]
        # Merging the config and inferred config
        config = safe_merge_config(config, inferred_config)
        self.custom_loss = custom_loss
        self.custom_metrics = custom_metrics
        self.custom_metrics_prob_inputs = custom_metrics_prob_inputs
        self.custom_optimizer = custom_optimizer
        self.custom_optimizer_params = custom_optimizer_params
        self.kwargs = kwargs
        # Updating config with custom parameters for experiment tracking
        if self.custom_loss is not None:
            config.loss = str(self.custom_loss)
        if self.custom_metrics is not None:
            # Adding metrics to config for hparams logging and tracking
            config.metrics = []
            config.metrics_params = []
            for metric in self.custom_metrics:
                if isinstance(metric, partial):
                    # extracting func names from partial functions
                    config.metrics.append(metric.func.__name__)
                    config.metrics_params.append(metric.keywords)
                else:
                    config.metrics.append(metric.__name__)
                    config.metrics_params.append(vars(metric))
            if config.task == "classification":
                config.metrics_prob_input = self.custom_metrics_prob_inputs
                for i, mp in enumerate(config.metrics_params):
                    mp.sub_params_list = []
                    for j, num_classes in enumerate(inferred_config.output_cardinality):
                        config.metrics_params[i].sub_params_list.append(
                            OmegaConf.create(
                                {
                                    "task": mp.get("task", "multiclass"),
                                    "num_classes": mp.get("num_classes", num_classes),
                                }
                            )
                        )

        # Updating default metrics in config
        elif config.task == "classification":
            # Adding metric_params to config for classification task
            for i, mp in enumerate(config.metrics_params):
                mp.sub_params_list = []
                for j, num_classes in enumerate(inferred_config.output_cardinality):
                    # config.metrics_params[i][j]["task"] = mp.get("task", "multiclass")
                    # config.metrics_params[i][j]["num_classes"] = mp.get("num_classes", num_classes)

                    config.metrics_params[i].sub_params_list.append(
                        OmegaConf.create(
                            {"task": mp.get("task", "multiclass"), "num_classes": mp.get("num_classes", num_classes)}
                        )
                    )

                    if config.metrics[i] in (
                        "accuracy",
                        "precision",
                        "recall",
                        "precision_recall",
                        "specificity",
                        "f1_score",
                        "fbeta_score",
                    ):
                        config.metrics_params[i].sub_params_list[j]["top_k"] = mp.get("top_k", 1)

        if self.custom_optimizer is not None:
            config.optimizer = str(self.custom_optimizer.__class__.__name__)
        if len(self.custom_optimizer_params) > 0:
            config.optimizer_params = self.custom_optimizer_params
        self.save_hyperparameters(config)
        # The concatenated output dim of the embedding layer
        self._build_network()
        self._setup_loss()
        self._setup_metrics()
        self._check_and_verify()
        self.do_log_logits = (
            hasattr(self.hparams, "log_logits") and self.hparams.log_logits and self.hparams.log_target == "wandb"
        )
        if self.do_log_logits:
            self._val_logits = []
        if not WANDB_INSTALLED and self.do_log_logits:
            self.do_log_logits = False
            warnings.warn(
                "Wandb is not installed. Please install wandb to log logits. "
                "You can install wandb using pip install wandb or install PyTorch Tabular"
                " using pip install pytorch-tabular[extra]"
            )
        if not PLOTLY_INSTALLED and self.do_log_logits:
            self.do_log_logits = False
            warnings.warn(
                "Plotly is not installed. Please install plotly to log logits. "
                "You can install plotly using pip install plotly or install PyTorch Tabular"
                " using pip install pytorch-tabular[extra]"
            )

    @abstractmethod
    def _build_network(self):
        pass

    @property
    def backbone(self):
        raise NotImplementedError("backbone property needs to be implemented by inheriting classes")

    @property
    def embedding_layer(self):
        raise NotImplementedError("embedding_layer property needs to be implemented by inheriting classes")

    @property
    def head(self):
        raise NotImplementedError("head property needs to be implemented by inheriting classes")

    def _check_and_verify(self):
        assert hasattr(self, "backbone"), "Model has no attribute called `backbone`"
        assert hasattr(self.backbone, "output_dim"), "Backbone needs to have attribute `output_dim`"
        assert hasattr(self, "head"), "Model has no attribute called `head`"

    def _get_head_from_config(self):
        _head_callable = getattr(blocks, self.hparams.head)
        return _head_callable(
            in_units=self.backbone.output_dim,
            output_dim=self.hparams.output_dim,
            config=_head_callable._config_template(**self.hparams.head_config),
        )  # output_dim auto-calculated from other configs

    def _setup_loss(self):
        if self.custom_loss is None:
            try:
                self.loss = getattr(nn, self.hparams.loss)()
            except AttributeError as e:
                logger.error(f"{self.hparams.loss} is not a valid loss defined in the torch.nn module")
                raise e
        else:
            self.loss = self.custom_loss

    def _setup_metrics(self):
        if self.custom_metrics is None:
            self.metrics = []
            task_module = torchmetrics.functional
            for metric in self.hparams.metrics:
                try:
                    self.metrics.append(getattr(task_module, metric))
                except AttributeError as e:
                    logger.error(
                        f"{metric} is not a valid functional metric defined in the torchmetrics.functional module"
                    )
                    raise e
        else:
            self.metrics = self.custom_metrics

    def calculate_loss(self, output: Dict, y: torch.Tensor, tag: str, sync_dist: bool = False) -> torch.Tensor:
        """Calculates the loss for the model.

        Args:
            output (Dict): The output dictionary from the model
            y (torch.Tensor): The target tensor
            tag (str): The tag to use for logging
            sync_dist (bool): enable distributed sync of logs

        Returns:
            torch.Tensor: The loss value

        """
        y_hat = output["logits"]
        reg_terms = [k for k, v in output.items() if "regularization" in k]
        reg_loss = 0
        for t in reg_terms:
            # Log only if non-zero
            if output[t] != 0:
                reg_loss += output[t]
                self.log(
                    f"{tag}_{t}_loss",
                    output[t],
                    on_epoch=True,
                    on_step=False,
                    logger=True,
                    prog_bar=False,
                    sync_dist=sync_dist,
                )
        if self.hparams.task == "regression":
            computed_loss = reg_loss
            for i in range(self.hparams.output_dim):
                _loss = self.loss(y_hat[:, i], y[:, i])
                computed_loss += _loss
                if self.hparams.output_dim > 1:
                    self.log(
                        f"{tag}_loss_{i}",
                        _loss,
                        on_epoch=True,
                        on_step=False,
                        logger=True,
                        prog_bar=False,
                        sync_dist=sync_dist,
                    )
        else:
            # TODO loss fails with batch size of 1?
            computed_loss = reg_loss
            start_index = 0
            for i in range(len(self.hparams.output_cardinality)):
                end_index = start_index + self.hparams.output_cardinality[i]
                _loss = self.loss(y_hat[:, start_index:end_index], y[:, i])
                computed_loss += _loss
                if self.hparams.output_dim > 1:
                    self.log(
                        f"{tag}_loss_{i}",
                        _loss,
                        on_epoch=True,
                        on_step=False,
                        logger=True,
                        prog_bar=False,
                        sync_dist=sync_dist,
                    )
                start_index = end_index
        self.log(
            f"{tag}_loss",
            computed_loss,
            on_epoch=(tag in ["valid", "test"]),
            on_step=(tag == "train"),
            # on_step=False,
            logger=True,
            prog_bar=True,
            sync_dist=sync_dist,
        )
        return computed_loss

    def calculate_metrics(
        self, y: torch.Tensor, y_hat: torch.Tensor, tag: str, sync_dist: bool = False
    ) -> List[torch.Tensor]:
        """Calculates the metrics for the model.

        Args:
            y (torch.Tensor): The target tensor

            y_hat (torch.Tensor): The predicted tensor

            tag (str): The tag to use for logging

            sync_dist (bool): enable distributed sync of logs

        Returns:
            List[torch.Tensor]: The list of metric values

        """
        metrics = []
        for metric, metric_str, prob_inp, metric_params in zip(
            self.metrics,
            self.hparams.metrics,
            self.hparams.metrics_prob_input,
            self.hparams.metrics_params,
        ):
            if self.hparams.task == "regression":
                _metrics = []
                for i in range(self.hparams.output_dim):
                    name = metric.func.__name__ if isinstance(metric, partial) else metric.__name__
                    if name == torchmetrics.functional.mean_squared_log_error.__name__:
                        # MSLE takes log(1 + x), which is undefined for x <= -1, so predictions and targets are clamped at 0
                        _metric = metric(
                            torch.clamp(y_hat[:, i], min=0),
                            torch.clamp(y[:, i], min=0),
                            **metric_params,
                        )
                    else:
                        _metric = metric(y_hat[:, i], y[:, i], **metric_params)
                    if self.hparams.output_dim > 1:
                        self.log(
                            f"{tag}_{metric_str}_{i}",
                            _metric,
                            on_epoch=True,
                            on_step=False,
                            logger=True,
                            prog_bar=False,
                            sync_dist=sync_dist,
                        )
                    _metrics.append(_metric)
                avg_metric = torch.stack(_metrics, dim=0).sum()
            else:
                _metrics = []
                start_index = 0
                for i, cardinality in enumerate(self.hparams.output_cardinality):
                    end_index = start_index + cardinality
                    y_hat_i = nn.Softmax(dim=-1)(y_hat[:, start_index:end_index].squeeze())
                    if prob_inp:
                        _metric = metric(y_hat_i, y[:, i : i + 1].squeeze(), **metric_params.sub_params_list[i])
                    else:
                        _metric = metric(
                            torch.argmax(y_hat_i, dim=-1), y[:, i : i + 1].squeeze(), **metric_params.sub_params_list[i]
                        )
                    if len(self.hparams.output_cardinality) > 1:
                        self.log(
                            f"{tag}_{metric_str}_{i}",
                            _metric,
                            on_epoch=True,
                            on_step=False,
                            logger=True,
                            prog_bar=False,
                            sync_dist=sync_dist,
                        )
                    _metrics.append(_metric)
                    start_index = end_index
                avg_metric = torch.stack(_metrics, dim=0).sum()
            metrics.append(avg_metric)
            self.log(
                f"{tag}_{metric_str}",
                avg_metric,
                on_epoch=True,
                on_step=False,
                logger=True,
                prog_bar=True,
                sync_dist=sync_dist,
            )
        return metrics

    def data_aware_initialization(self, datamodule):
        """Performs data-aware initialization of the model when defined."""
        pass

    def compute_backbone(self, x: Dict) -> torch.Tensor:
        # Returns output
        x = self.backbone(x)
        return x

    def embed_input(self, x: Dict) -> torch.Tensor:
        return self.embedding_layer(x)

    def apply_output_sigmoid_scaling(self, y_hat: torch.Tensor) -> torch.Tensor:
        """Applies sigmoid scaling to the output of the model if the task is regression and the target range is
        defined.

        Args:
            y_hat (torch.Tensor): The output of the model

        Returns:
            torch.Tensor: The output of the model with sigmoid scaling applied

        """
        if (self.hparams.task == "regression") and (self.hparams.target_range is not None):
            for i in range(self.hparams.output_dim):
                y_min, y_max = self.hparams.target_range[i]
                y_hat[:, i] = y_min + nn.Sigmoid()(y_hat[:, i]) * (y_max - y_min)
        return y_hat

    def pack_output(self, y_hat: torch.Tensor, backbone_features: torch.Tensor) -> Dict[str, Any]:
        """Packs the output of the model.

        Args:
            y_hat (torch.Tensor): The output of the model

            backbone_features (torch.Tensor): The backbone features

        Returns:
            The packed output of the model

        """
        # If self.head is nn.Identity, backbone features cannot be extracted,
        # because the model cannot be divided into a backbone and a head (e.g. TabNet)
        if type(self.head) is nn.Identity:
            return {"logits": y_hat}
        return {"logits": y_hat, "backbone_features": backbone_features}

    def compute_head(self, backbone_features: Tensor) -> Dict[str, Any]:
        """Computes the head of the model.

        Args:
            backbone_features (Tensor): The backbone features

        Returns:
            The output of the model

        """
        y_hat = self.head(backbone_features)
        y_hat = self.apply_output_sigmoid_scaling(y_hat)
        return self.pack_output(y_hat, backbone_features)

    def forward(self, x: Dict) -> Dict[str, Any]:
        """The forward pass of the model.

        Args:
            x (Dict): The input of the model with 'continuous' and 'categorical' keys

        """
        x = self.embed_input(x)
        x = self.compute_backbone(x)
        return self.compute_head(x)

    def predict(self, x: Dict, ret_model_output: bool = False) -> Union[torch.Tensor, Tuple[torch.Tensor, Dict]]:
        """Predicts the output of the model.

        Args:
            x (Dict): The input of the model with 'continuous' and 'categorical' keys

            ret_model_output (bool): If True, the method also returns the full output dictionary of the model

        Returns:
            The logits, or a tuple of the logits and the full output dictionary if `ret_model_output` is True

        """
        assert self.hparams.task != "ssl", "It's not allowed to use the method predict in case of ssl task"
        ret_value = self.forward(x)
        if ret_model_output:
            return ret_value.get("logits"), ret_value
        return ret_value.get("logits")

    def forward_pass(self, batch):
        return self(batch), None

    def extract_embedding(self):
        """Extracts the embedding of the model.

        This is used in `CategoricalEmbeddingTransformer`

        """
        if self.hparams.categorical_dim > 0:
            if not isinstance(self.embedding_layer, PreEncoded1dLayer):
                return self.embedding_layer.cat_embedding_layers
            else:
                raise ValueError(
                    "Cannot extract embedding for PreEncoded1dLayer. Please use a different embedding layer."
                )
        else:
            raise ValueError(
                "Model has been trained with no categorical feature and therefore can't be used"
                " as a Categorical Encoder"
            )

    def training_step(self, batch, batch_idx):
        output, y = self.forward_pass(batch)
        # y is not None for the SSL task. For all other tasks, the target is
        # fetched from the batch
        y = batch["target"] if y is None else y
        y_hat = output["logits"]
        loss = self.calculate_loss(output, y, tag="train")
        self.calculate_metrics(y, y_hat, tag="train")
        return loss

    def validation_step(self, batch, batch_idx):
        with torch.no_grad():
            output, y = self.forward_pass(batch)
            # y is not None for the SSL task. For all other tasks, the target is
            # fetched from the batch
            y = batch["target"] if y is None else y
            y_hat = output["logits"]
            self.calculate_loss(output, y, tag="valid", sync_dist=True)
            self.calculate_metrics(y, y_hat, tag="valid", sync_dist=True)
        return y_hat, y

    def test_step(self, batch, batch_idx):
        with torch.no_grad():
            output, y = self.forward_pass(batch)
            # y is not None for the SSL task. For all other tasks, the target is
            # fetched from the batch
            y = batch["target"] if y is None else y
            y_hat = output["logits"]
            self.calculate_loss(output, y, tag="test", sync_dist=True)
            self.calculate_metrics(y, y_hat, tag="test", sync_dist=True)
        return y_hat, y

    def configure_optimizers(self):
        if self.custom_optimizer is None:
            # Loading from the config
            try:
                self._optimizer = _create_optimizer(self.hparams.optimizer)
                opt = self._optimizer(
                    self.parameters(),
                    lr=self.hparams.learning_rate,
                    **self.hparams.optimizer_params,
                )
            except AttributeError as e:
                logger.error(f"{self.hparams.optimizer} is not a valid optimizer defined in the torch.optim module")
                raise e
        else:
            # Loading from custom fit arguments
            self._optimizer = _create_optimizer(self.custom_optimizer)

            opt = self._optimizer(
                self.parameters(),
                lr=self.hparams.learning_rate,
                **self.custom_optimizer_params,
            )
        if self.hparams.lr_scheduler is not None:
            try:
                self._lr_scheduler = getattr(torch.optim.lr_scheduler, self.hparams.lr_scheduler)
            except AttributeError as e:
                logger.error(
                    f"{self.hparams.lr_scheduler} is not a valid learning rate scheduler defined"
                    f" in the torch.optim.lr_scheduler module"
                )
                raise e
            if isinstance(self._lr_scheduler, torch.optim.lr_scheduler._LRScheduler):
                return {
                    "optimizer": opt,
                    "lr_scheduler": self._lr_scheduler(opt, **self.hparams.lr_scheduler_params),
                }
            return {
                "optimizer": opt,
                "lr_scheduler": {
                    "scheduler": self._lr_scheduler(opt, **self.hparams.lr_scheduler_params),
                    "monitor": self.hparams.lr_scheduler_monitor_metric,
                    "interval": self.hparams.lr_scheduler_interval,
                },
            }
        else:
            return opt

    def create_plotly_histogram(self, arr, name, bin_dict=None):
        fig = go.Figure()
        for i in range(arr.shape[-1]):
            fig.add_trace(
                go.Histogram(
                    x=arr[:, i],
                    histnorm="probability",
                    name=f"{name}_{i}",
                    xbins=bin_dict,  # dict(start=0.0, end=1.0, size=0.1),  # bins used for histogram
                )
            )
        # Overlay both histograms
        fig.update_layout(
            barmode="overlay",
            legend={"orientation": "h", "yanchor": "bottom", "y": 1.02, "xanchor": "right", "x": 1},
        )
        # Reduce opacity to see both histograms
        fig.update_traces(opacity=0.5)
        return fig

    def on_validation_batch_end(self, outputs, batch, batch_idx: int) -> None:
        if self.do_log_logits:
            self._val_logits.append(outputs[0][0])
        super().on_validation_batch_end(outputs, batch, batch_idx)

    def on_validation_epoch_end(self) -> None:
        if self.do_log_logits:
            logits = torch.cat(self._val_logits).detach().cpu()
            self._val_logits = []
            fig = self.create_plotly_histogram(logits, "logits")
            wandb.log(
                {"valid_logits": wandb.Plotly(fig), "global_step": self.global_step},
                commit=False,
            )
        super().on_validation_epoch_end()

    def reset_weights(self):
        reset_all_weights(self.backbone)
        reset_all_weights(self.head)
        reset_all_weights(self.embedding_layer)

    def feature_importance(self) -> DataFrame:
        """Returns a dataframe with feature importance for the model."""
        if hasattr(self.backbone, "feature_importance_"):
            imp = self.backbone.feature_importance_
            n_feat = len(self.hparams.categorical_cols + self.hparams.continuous_cols)
            if self.hparams.categorical_dim > 0:
                if imp.shape[0] != n_feat:
                    # Combining Cat Embedded Dimensions to a single one by averaging
                    wt = []
                    norm = []
                    ft_idx = 0
                    for _, embd_dim in self.hparams.embedding_dims:
                        wt.extend([ft_idx] * embd_dim)
                        norm.append(embd_dim)
                        ft_idx += 1
                    for _ in self.hparams.continuous_cols:
                        wt.extend([ft_idx])
                        norm.append(1)
                        ft_idx += 1
                    imp = np.bincount(wt, weights=imp) / np.array(norm)
                else:
                    # For models like FTTransformer, we don't need to do anything.
                    # They take categorical and continuous columns as individual 2-D features
                    pass
            importance_df = DataFrame(
                {
                    "Features": self.hparams.categorical_cols + self.hparams.continuous_cols,
                    "importance": imp,
                }
            )
            return importance_df
        else:
            raise ValueError("Feature Importance unavailable for this model.")
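
The categorical branch of `feature_importance` collapses one importance value per embedding dimension into one value per feature. The sketch below reproduces that averaging in plain Python rather than with `np.bincount`. The function name `aggregate_importance` and the sample numbers are illustrative, and it assumes that `embedding_dims` holds `(cardinality, embedding_dim)` pairs with categorical columns laid out before continuous ones, as the loop above suggests.

```python
from typing import List, Sequence, Tuple


def aggregate_importance(
    imp: Sequence[float],
    embedding_dims: Sequence[Tuple[int, int]],
    n_continuous: int,
) -> List[float]:
    """Average per-dimension importances back to one value per feature."""
    # Each categorical column occupies `embedding_dim` slots, each continuous column one slot.
    widths = [dim for _, dim in embedding_dims] + [1] * n_continuous
    if sum(widths) != len(imp):
        raise ValueError("importance vector does not match the feature layout")
    out, start = [], 0
    for width in widths:
        chunk = imp[start : start + width]
        out.append(sum(chunk) / width)  # mean over this feature's slots
        start += width
    return out


# Two categorical columns embedded in 2 and 3 dimensions, then one continuous column.
per_feature = aggregate_importance(
    imp=[0.2, 0.4, 0.3, 0.3, 0.3, 0.5],
    embedding_dims=[(10, 2), (7, 3)],
    n_continuous=1,
)
print(per_feature)
```

The result has one entry per column, in the same order as `categorical_cols + continuous_cols`.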

`__init__(config, custom_loss=None, custom_metrics=None, custom_metrics_prob_inputs=None, custom_optimizer=None, custom_optimizer_params={}, **kwargs)` ¶

Base Model for PyTorch Tabular.

Parameters:

Name	Type	Description	Default
`config`	`DictConfig`	The configuration for the model.	required
`custom_loss`	`Optional[Module]`	A custom loss function. Defaults to None.	`None`
`custom_metrics`	`Optional[List[Callable]]`	A list of custom metrics. Defaults to None.	`None`
`custom_metrics_prob_inputs`	`Optional[List[bool]]`	A list of boolean values indicating whether the metric requires probability inputs. Defaults to None.	`None`
`custom_optimizer`	`Optional[Optimizer]`	A custom optimizer as callable or string to be imported. Defaults to None.	`None`
`custom_optimizer_params`	`Dict`	A dictionary of custom optimizer parameters. Defaults to {}.	`{}`
`kwargs`	`Dict`	Additional keyword arguments.	`{}`
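
As an illustration of how `custom_metrics` entries are recorded for experiment tracking, the sketch below mirrors the name and parameter extraction that the constructor performs for plain callables and `functools.partial` objects. `my_fbeta` and `describe_metrics` are hypothetical names introduced only for this example; they are not part of PyTorch Tabular.

```python
from functools import partial
from typing import Any, Callable, Dict, List, Tuple


def my_fbeta(y_hat, y, beta: float = 1.0):
    """Hypothetical stand-in for a functional metric; the body is irrelevant here."""
    return 0.0


def describe_metrics(metrics: List[Callable]) -> Tuple[List[str], List[Dict[str, Any]]]:
    """Collect a display name and keyword arguments for each metric."""
    names, params = [], []
    for metric in metrics:
        if isinstance(metric, partial):
            # A partial has no __name__; read it from the wrapped function.
            names.append(metric.func.__name__)
            params.append(dict(metric.keywords))
        else:
            names.append(metric.__name__)
            params.append(dict(vars(metric)))
    return names, params


names, params = describe_metrics([my_fbeta, partial(my_fbeta, beta=0.5)])
print(names)
print(params)
```

Wrapping a metric in `partial` is therefore one way to carry fixed keyword arguments into the logged configuration. Since `calculate_metrics` zips the metrics with `metrics_prob_input`, it seems safest to pass one boolean in `custom_metrics_prob_inputs` for every metric.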

Source code in src/pytorch_tabular/models/base_model.py

def __init__(
    self,
    config: DictConfig,
    custom_loss: Optional[torch.nn.Module] = None,
    custom_metrics: Optional[List[Callable]] = None,
    custom_metrics_prob_inputs: Optional[List[bool]] = None,
    custom_optimizer: Optional[torch.optim.Optimizer] = None,
    custom_optimizer_params: Dict = {},
    **kwargs,
):
    """Base Model for PyTorch Tabular.

    Args:
        config (DictConfig): The configuration for the model.
        custom_loss (Optional[torch.nn.Module], optional): A custom loss function. Defaults to None.
        custom_metrics (Optional[List[Callable]], optional): A list of custom metrics. Defaults to None.
        custom_metrics_prob_inputs (Optional[List[bool]], optional): A list of boolean values indicating whether the
            metric requires probability inputs. Defaults to None.
        custom_optimizer (Optional[torch.optim.Optimizer], optional):
            A custom optimizer as callable or string to be imported. Defaults to None.
        custom_optimizer_params (Dict, optional): A dictionary of custom optimizer parameters. Defaults to {}.
        kwargs (Dict, optional): Additional keyword arguments.

    """
    super().__init__()
    assert "inferred_config" in kwargs, "inferred_config not found in initialization arguments"
    inferred_config = kwargs["inferred_config"]
    # Merging the config and inferred config
    config = safe_merge_config(config, inferred_config)
    self.custom_loss = custom_loss
    self.custom_metrics = custom_metrics
    self.custom_metrics_prob_inputs = custom_metrics_prob_inputs
    self.custom_optimizer = custom_optimizer
    self.custom_optimizer_params = custom_optimizer_params
    self.kwargs = kwargs
    # Updating config with custom parameters for experiment tracking
    if self.custom_loss is not None:
        config.loss = str(self.custom_loss)
    if self.custom_metrics is not None:
        # Adding metrics to config for hparams logging and tracking
        config.metrics = []
        config.metrics_params = []
        for metric in self.custom_metrics:
            if isinstance(metric, partial):
                # extracting func names from partial functions
                config.metrics.append(metric.func.__name__)
                config.metrics_params.append(metric.keywords)
            else:
                config.metrics.append(metric.__name__)
                config.metrics_params.append(vars(metric))
        if config.task == "classification":
            config.metrics_prob_input = self.custom_metrics_prob_inputs
            for i, mp in enumerate(config.metrics_params):
                mp.sub_params_list = []
                for j, num_classes in enumerate(inferred_config.output_cardinality):
                    config.metrics_params[i].sub_params_list.append(
                        OmegaConf.create(
                            {
                                "task": mp.get("task", "multiclass"),
                                "num_classes": mp.get("num_classes", num_classes),
                            }
                        )
                    )

    # Updating default metrics in config
    elif config.task == "classification":
        # Adding metric_params to config for classification task
        for i, mp in enumerate(config.metrics_params):
            mp.sub_params_list = []
            for j, num_classes in enumerate(inferred_config.output_cardinality):
                # config.metrics_params[i][j]["task"] = mp.get("task", "multiclass")
                # config.metrics_params[i][j]["num_classes"] = mp.get("num_classes", num_classes)

                config.metrics_params[i].sub_params_list.append(
                    OmegaConf.create(
                        {"task": mp.get("task", "multiclass"), "num_classes": mp.get("num_classes", num_classes)}
                    )
                )

                if config.metrics[i] in (
                    "accuracy",
                    "precision",
                    "recall",
                    "precision_recall",
                    "specificity",
                    "f1_score",
                    "fbeta_score",
                ):
                    config.metrics_params[i].sub_params_list[j]["top_k"] = mp.get("top_k", 1)

    if self.custom_optimizer is not None:
        config.optimizer = str(self.custom_optimizer.__class__.__name__)
    if len(self.custom_optimizer_params) > 0:
        config.optimizer_params = self.custom_optimizer_params
    self.save_hyperparameters(config)
    # The concatenated output dim of the embedding layer
    self._build_network()
    self._setup_loss()
    self._setup_metrics()
    self._check_and_verify()
    self.do_log_logits = (
        hasattr(self.hparams, "log_logits") and self.hparams.log_logits and self.hparams.log_target == "wandb"
    )
    if self.do_log_logits:
        self._val_logits = []
    if not WANDB_INSTALLED and self.do_log_logits:
        self.do_log_logits = False
        warnings.warn(
            "Wandb is not installed. Please install wandb to log logits. "
            "You can install wandb using pip install wandb or install PyTorch Tabular"
            " using pip install pytorch-tabular[extra]"
        )
    if not PLOTLY_INSTALLED and self.do_log_logits:
        self.do_log_logits = False
        warnings.warn(
            "Plotly is not installed. Please install plotly to log logits. "
            "You can install plotly using pip install plotly or install PyTorch Tabular"
            " using pip install pytorch-tabular[extra]"
        )
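
For classification, the constructor expands each metric's parameters into one entry per target (`sub_params_list`). The following is a minimal sketch of that expansion for the default-metrics branch, using plain dictionaries instead of `OmegaConf` objects. `expand_metric_params` and `TOP_K_METRICS` are names made up for this example.

```python
from typing import Any, Dict, List

# Metric names that receive a `top_k` entry in the default-metrics branch.
TOP_K_METRICS = (
    "accuracy",
    "precision",
    "recall",
    "precision_recall",
    "specificity",
    "f1_score",
    "fbeta_score",
)


def expand_metric_params(
    metric_name: str, base_params: Dict[str, Any], output_cardinality: List[int]
) -> List[Dict[str, Any]]:
    """Build one parameter dict per classification target."""
    sub_params = []
    for num_classes in output_cardinality:
        entry = {
            "task": base_params.get("task", "multiclass"),
            "num_classes": base_params.get("num_classes", num_classes),
        }
        if metric_name in TOP_K_METRICS:
            entry["top_k"] = base_params.get("top_k", 1)
        sub_params.append(entry)
    return sub_params


# Two targets with 3 and 5 classes and no user-supplied overrides.
sub = expand_metric_params("accuracy", {}, [3, 5])
print(sub)
```

User-supplied values for `task`, `num_classes`, or `top_k` take precedence over the inferred defaults, because each lookup falls back to the default only when the key is missing.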

`apply_output_sigmoid_scaling(y_hat)` ¶

Applies sigmoid scaling to the output of the model if the task is regression and the target range is defined.

Parameters:

Name	Type	Description	Default
`y_hat`	`Tensor`	The output of the model	required

Returns:

Type	Description
`Tensor`	The output of the model with sigmoid scaling applied

Source code in src/pytorch_tabular/models/base_model.py

def apply_output_sigmoid_scaling(self, y_hat: torch.Tensor) -> torch.Tensor:
    """Applies sigmoid scaling to the output of the model if the task is regression and the target range is
    defined.

    Args:
        y_hat (torch.Tensor): The output of the model

    Returns:
        torch.Tensor: The output of the model with sigmoid scaling applied

    """
    if (self.hparams.task == "regression") and (self.hparams.target_range is not None):
        for i in range(self.hparams.output_dim):
            y_min, y_max = self.hparams.target_range[i]
            y_hat[:, i] = y_min + nn.Sigmoid()(y_hat[:, i]) * (y_max - y_min)
    return y_hat
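
The scaling applied per output column is `y_min + sigmoid(z) * (y_max - y_min)`, which maps any real-valued `z` into the open interval `(y_min, y_max)`. A scalar sketch using only the standard library (the helper name `sigmoid_scale` and the range 10 to 20 are illustrative):

```python
import math


def sigmoid_scale(z: float, y_min: float, y_max: float) -> float:
    """Map an unbounded value z into the open interval (y_min, y_max)."""
    return y_min + (1.0 / (1.0 + math.exp(-z))) * (y_max - y_min)


# A raw output of 0 lands exactly in the middle of the target range.
for z in (-5.0, 0.0, 5.0):
    print(z, round(sigmoid_scale(z, y_min=10.0, y_max=20.0), 3))
```

Note that the method in the listing writes the scaled values back into `y_hat` column by column, so the tensor passed in is modified in place.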

`calculate_loss(output, y, tag, sync_dist=False)` ¶

Calculates the loss for the model.

Parameters:

Name	Type	Description	Default
`output`	`Dict`	The output dictionary from the model	required
`y`	`Tensor`	The target tensor	required
`tag`	`str`	The tag to use for logging	required
`sync_dist`	`bool`	Whether to sync the logged values across distributed processes	`False`

Returns:

Type	Description
`Tensor`	The loss value

Source code in src/pytorch_tabular/models/base_model.py

def calculate_loss(self, output: Dict, y: torch.Tensor, tag: str, sync_dist: bool = False) -> torch.Tensor:
    """Calculates the loss for the model.

    Args:
        output (Dict): The output dictionary from the model
        y (torch.Tensor): The target tensor
        tag (str): The tag to use for logging
        sync_dist (bool): enable distributed sync of logs

    Returns:
        torch.Tensor: The loss value

    """
    y_hat = output["logits"]
    reg_terms = [k for k, v in output.items() if "regularization" in k]
    reg_loss = 0
    for t in reg_terms:
        # Log only if non-zero
        if output[t] != 0:
            reg_loss += output[t]
            self.log(
                f"{tag}_{t}_loss",
                output[t],
                on_epoch=True,
                on_step=False,
                logger=True,
                prog_bar=False,
                sync_dist=sync_dist,
            )
    if self.hparams.task == "regression":
        computed_loss = reg_loss
        for i in range(self.hparams.output_dim):
            _loss = self.loss(y_hat[:, i], y[:, i])
            computed_loss += _loss
            if self.hparams.output_dim > 1:
                self.log(
                    f"{tag}_loss_{i}",
                    _loss,
                    on_epoch=True,
                    on_step=False,
                    logger=True,
                    prog_bar=False,
                    sync_dist=sync_dist,
                )
    else:
        # TODO loss fails with batch size of 1?
        computed_loss = reg_loss
        start_index = 0
        for i in range(len(self.hparams.output_cardinality)):
            end_index = start_index + self.hparams.output_cardinality[i]
            _loss = self.loss(y_hat[:, start_index:end_index], y[:, i])
            computed_loss += _loss
            if self.hparams.output_dim > 1:
                self.log(
                    f"{tag}_loss_{i}",
                    _loss,
                    on_epoch=True,
                    on_step=False,
                    logger=True,
                    prog_bar=False,
                    sync_dist=sync_dist,
                )
            start_index = end_index
    self.log(
        f"{tag}_loss",
        computed_loss,
        on_epoch=(tag in ["valid", "test"]),
        on_step=(tag == "train"),
        # on_step=False,
        logger=True,
        prog_bar=True,
        sync_dist=sync_dist,
    )
    return computed_loss
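
The classification branch above walks the logits column by column, giving each target a contiguous block whose width is that target's cardinality. A minimal sketch of that index bookkeeping follows, in plain Python with lists instead of tensors. The helper name `target_slices` and the sample values are hypothetical and not part of the library.

```python
from typing import List, Tuple


def target_slices(output_cardinality: List[int]) -> List[Tuple[int, int]]:
    """Return the (start, end) logit column range for each target."""
    slices = []
    start_index = 0
    for cardinality in output_cardinality:
        end_index = start_index + cardinality
        slices.append((start_index, end_index))
        # The next target starts where this one ended
        start_index = end_index
    return slices


# Hypothetical setup: two targets with 3 and 2 classes, so 5 logit columns.
logits_row = [0.1, 2.0, -1.0, 0.5, 0.3]
slices = target_slices([3, 2])
per_target_logits = [logits_row[s:e] for s, e in slices]
print(slices)             # [(0, 3), (3, 5)]
print(per_target_logits)  # [[0.1, 2.0, -1.0], [0.5, 0.3]]
```

In the method itself, the loss is evaluated once per slice and the per-target losses are added to the regularization term.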

`calculate_metrics(y, y_hat, tag, sync_dist=False)` ¶

Calculates the metrics for the model.

Parameters:

Name	Type	Description	Default
`y`	`Tensor`	The target tensor	required
`y_hat`	`Tensor`	The predicted tensor	required
`tag`	`str`	The tag to use for logging	required
`sync_dist`	`bool`	enable distributed sync of logs	`False`

Returns:

Type	Description
`List[Tensor]`	The list of metric values

Source code in src/pytorch_tabular/models/base_model.py

def calculate_metrics(
    self, y: torch.Tensor, y_hat: torch.Tensor, tag: str, sync_dist: bool = False
) -> List[torch.Tensor]:
    """Calculates the metrics for the model.

    Args:
        y (torch.Tensor): The target tensor

        y_hat (torch.Tensor): The predicted tensor

        tag (str): The tag to use for logging

        sync_dist (bool): enable distributed sync of logs

    Returns:
        List[torch.Tensor]: The list of metric values

    """
    metrics = []
    for metric, metric_str, prob_inp, metric_params in zip(
        self.metrics,
        self.hparams.metrics,
        self.hparams.metrics_prob_input,
        self.hparams.metrics_params,
    ):
        if self.hparams.task == "regression":
            _metrics = []
            for i in range(self.hparams.output_dim):
                name = metric.func.__name__ if isinstance(metric, partial) else metric.__name__
                if name == torchmetrics.functional.mean_squared_log_error.__name__:
                    # MSLE should only be used in strictly positive targets. It is undefined otherwise
                    _metric = metric(
                        torch.clamp(y_hat[:, i], min=0),
                        torch.clamp(y[:, i], min=0),
                        **metric_params,
                    )
                else:
                    _metric = metric(y_hat[:, i], y[:, i], **metric_params)
                if self.hparams.output_dim > 1:
                    self.log(
                        f"{tag}_{metric_str}_{i}",
                        _metric,
                        on_epoch=True,
                        on_step=False,
                        logger=True,
                        prog_bar=False,
                        sync_dist=sync_dist,
                    )
                _metrics.append(_metric)
            avg_metric = torch.stack(_metrics, dim=0).sum()
        else:
            _metrics = []
            start_index = 0
            for i, cardinality in enumerate(self.hparams.output_cardinality):
                end_index = start_index + cardinality
                y_hat_i = nn.Softmax(dim=-1)(y_hat[:, start_index:end_index].squeeze())
                if prob_inp:
                    _metric = metric(y_hat_i, y[:, i : i + 1].squeeze(), **metric_params.sub_params_list[i])
                else:
                    _metric = metric(
                        torch.argmax(y_hat_i, dim=-1), y[:, i : i + 1].squeeze(), **metric_params.sub_params_list[i]
                    )
                if len(self.hparams.output_cardinality) > 1:
                    self.log(
                        f"{tag}_{metric_str}_{i}",
                        _metric,
                        on_epoch=True,
                        on_step=False,
                        logger=True,
                        prog_bar=False,
                        sync_dist=sync_dist,
                    )
                _metrics.append(_metric)
                start_index = end_index
            avg_metric = torch.stack(_metrics, dim=0).sum()
        metrics.append(avg_metric)
        self.log(
            f"{tag}_{metric_str}",
            avg_metric,
            on_epoch=True,
            on_step=False,
            logger=True,
            prog_bar=True,
            sync_dist=sync_dist,
        )
    return metrics
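
In the classification branch above, each target's logit slice is passed through a softmax, and the metric then receives either the class probabilities (when the matching `metrics_prob_input` entry is true) or the arg-max class index. The sketch below reproduces that choice for a single row using only the standard library. The function names `softmax` and `metric_input` are illustrative assumptions, not library functions.

```python
import math
from typing import List, Union


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over one row of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def metric_input(logits: List[float], prob_inp: bool) -> Union[List[float], int]:
    """Return what a metric would receive for one row of one target."""
    probs = softmax(logits)
    if prob_inp:
        # Metrics that need probabilities get the full distribution
        return probs
    # Other metrics get the index of the most likely class
    return max(range(len(probs)), key=probs.__getitem__)


row = [0.1, 2.0, -1.0]  # hypothetical logits for a 3-class target
print(metric_input(row, prob_inp=False))  # 1
print(metric_input(row, prob_inp=True))
```

Note that in the source above the per-target values are combined with `.sum()`, so the value logged under `{tag}_{metric_str}` for a multi-target model is a total across targets, even though the variable is named `avg_metric`.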

`compute_head(backbone_features)` ¶

Computes the head of the model.

Parameters:

Name	Type	Description	Default
`backbone_features`	`Tensor`	The backbone features	required

Returns:

Type	Description
`Dict[str, Any]`	The output of the model

Source code in src/pytorch_tabular/models/base_model.py

def compute_head(self, backbone_features: Tensor) -> Dict[str, Any]:
    """Computes the head of the model.

    Args:
        backbone_features (Tensor): The backbone features

    Returns:
        The output of the model

    """
    y_hat = self.head(backbone_features)
    y_hat = self.apply_output_sigmoid_scaling(y_hat)
    return self.pack_output(y_hat, backbone_features)

`data_aware_initialization(datamodule)` ¶

Performs data-aware initialization of the model when defined.

Source code in src/pytorch_tabular/models/base_model.py

def data_aware_initialization(self, datamodule):
    """Performs data-aware initialization of the model when defined."""
    pass

`extract_embedding()` ¶

Extracts the embedding of the model.

This is used in CategoricalEmbeddingTransformer

Source code in src/pytorch_tabular/models/base_model.py

def extract_embedding(self):
    """Extracts the embedding of the model.

    This is used in `CategoricalEmbeddingTransformer`

    """
    if self.hparams.categorical_dim > 0:
        if not isinstance(self.embedding_layer, PreEncoded1dLayer):
            return self.embedding_layer.cat_embedding_layers
        else:
            raise ValueError(
                "Cannot extract embedding for PreEncoded1dLayer. Please use a different embedding layer."
            )
    else:
        raise ValueError(
            "Model has been trained with no categorical feature and therefore can't be used"
            " as a Categorical Encoder"
        )

`feature_importance()` ¶

Returns a dataframe with feature importance for the model.

Source code in src/pytorch_tabular/models/base_model.py

def feature_importance(self) -> DataFrame:
    """Returns a dataframe with feature importance for the model."""
    if hasattr(self.backbone, "feature_importance_"):
        imp = self.backbone.feature_importance_
        n_feat = len(self.hparams.categorical_cols + self.hparams.continuous_cols)
        if self.hparams.categorical_dim > 0:
            if imp.shape[0] != n_feat:
                # Combining Cat Embedded Dimensions to a single one by averaging
                wt = []
                norm = []
                ft_idx = 0
                for _, embd_dim in self.hparams.embedding_dims:
                    wt.extend([ft_idx] * embd_dim)
                    norm.append(embd_dim)
                    ft_idx += 1
                for _ in self.hparams.continuous_cols:
                    wt.extend([ft_idx])
                    norm.append(1)
                    ft_idx += 1
                imp = np.bincount(wt, weights=imp) / np.array(norm)
            else:
                # For models like FTTransformer, we don't need to do anything
                # It takes categorical and continuous as individual 2-D features
                pass
        importance_df = DataFrame(
            {
                "Features": self.hparams.categorical_cols + self.hparams.continuous_cols,
                "importance": imp,
            }
        )
        return importance_df
    else:
        raise ValueError("Feature Importance unavailable for this model.")
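
When the backbone reports one importance value per embedded dimension, the method above averages the values that belong to the same categorical feature and leaves continuous features as they are. The following sketch does the same grouping in plain Python, as a stand-in for the `np.bincount(wt, weights=imp) / np.array(norm)` step. The function name and the numbers are hypothetical. It also takes the embedding widths directly, whereas the library iterates over `embedding_dims` pairs and uses only the second element.

```python
from typing import List, Sequence


def average_by_feature(
    imp: Sequence[float], embedding_widths: List[int], n_continuous: int
) -> List[float]:
    """Collapse per-dimension importances into one value per feature."""
    # Each categorical feature spans `width` columns; each continuous one spans 1
    widths = list(embedding_widths) + [1] * n_continuous
    averaged = []
    pos = 0
    for width in widths:
        averaged.append(sum(imp[pos : pos + width]) / width)
        pos += width
    return averaged


# Hypothetical case: two categorical features embedded in 2 and 3 dimensions,
# followed by one continuous feature, giving 6 raw importance values.
raw = [0.25, 0.75, 1.0, 2.0, 3.0, 0.5]
print(average_by_feature(raw, [2, 3], n_continuous=1))  # [0.5, 2.0, 0.5]
```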

`forward(x)` ¶

The forward pass of the model.

Parameters:

Name	Type	Description	Default
`x`	`Dict`	The input of the model with 'continuous' and 'categorical' keys	required

Source code in src/pytorch_tabular/models/base_model.py

def forward(self, x: Dict) -> Dict[str, Any]:
    """The forward pass of the model.

    Args:
        x (Dict): The input of the model with 'continuous' and 'categorical' keys

    """
    x = self.embed_input(x)
    x = self.compute_backbone(x)
    return self.compute_head(x)

`pack_output(y_hat, backbone_features)` ¶

Packs the output of the model.

Parameters:

Name	Type	Description	Default
`y_hat`	`Tensor`	The output of the model	required
`backbone_features`	`tensor`	The backbone features	required

Returns:

Type	Description
`Dict[str, Any]`	The packed output of the model

Source code in src/pytorch_tabular/models/base_model.py

def pack_output(self, y_hat: torch.Tensor, backbone_features: torch.tensor) -> Dict[str, Any]:
    """Packs the output of the model.

    Args:
        y_hat (torch.Tensor): The output of the model

        backbone_features (torch.tensor): The backbone features

    Returns:
        The packed output of the model

    """
    # If self.head is the Identity function, we cannot extract backbone features,
    # because the model cannot be divided into backbone and head (i.e. TabNet)
    if type(self.head) is nn.Identity:
        return {"logits": y_hat}
    return {"logits": y_hat, "backbone_features": backbone_features}

`predict(x, ret_model_output=False)` ¶

Predicts the output of the model.

Parameters:

Name	Type	Description	Default
`x`	`Dict`	The input of the model with 'continuous' and 'categorical' keys	required
`ret_model_output`	`bool`	If True, the method returns the output of the model	`False`

Returns:

Type	Description
`Union[Tensor, Tuple[Tensor, Dict]]`	The output of the model

Source code in src/pytorch_tabular/models/base_model.py

def predict(self, x: Dict, ret_model_output: bool = False) -> Union[torch.Tensor, Tuple[torch.Tensor, Dict]]:
    """Predicts the output of the model.

    Args:
        x (Dict): The input of the model with 'continuous' and 'categorical' keys

        ret_model_output (bool): If True, the method returns the output of the model

    Returns:
        The output of the model

    """
    assert self.hparams.task != "ssl", "It's not allowed to use the method predict in case of ssl task"
    ret_value = self.forward(x)
    if ret_model_output:
        return ret_value.get("logits"), ret_value
    return ret_value.get("logits")
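
The two return shapes of `predict` can be illustrated with a toy stand-in. `ToyModel` below is hypothetical and is not the library's `BaseModel`: its `forward` sums the continuous inputs so the example runs without PyTorch, and only the `ret_model_output` handling mirrors the source above.

```python
from typing import Any, Dict, List, Tuple, Union


class ToyModel:
    """Hypothetical stand-in that mimics only the predict() return contract."""

    def forward(self, x: Dict[str, Any]) -> Dict[str, Any]:
        # A real model would embed the input, run the backbone, then the head
        features = x["continuous"]
        return {"logits": [sum(features)], "backbone_features": features}

    def predict(
        self, x: Dict[str, Any], ret_model_output: bool = False
    ) -> Union[List[float], Tuple[List[float], Dict[str, Any]]]:
        ret_value = self.forward(x)
        if ret_model_output:
            # Logits plus the full output dictionary
            return ret_value.get("logits"), ret_value
        return ret_value.get("logits")


model = ToyModel()
batch = {"continuous": [1.0, 2.0], "categorical": []}
logits_only = model.predict(batch)
logits, full_output = model.predict(batch, ret_model_output=True)
print(logits_only)          # [3.0]
print(sorted(full_output))  # ['backbone_features', 'logits']
```

Passing `ret_model_output=True` is useful when the caller also needs the entries that `pack_output` adds, such as `backbone_features`.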
