Skip to content

Supervised Models

Configuration Classes

Bases: ModelConfig

AutomaticFeatureInteraction configuration.

Parameters:

Name Type Description Default
attn_embed_dim int

The number of hidden units in the Multi-Headed Attention layers. Defaults to 32

32
num_heads int

The number of heads in the Multi-Headed Attention layer. Defaults to 2

2
num_attn_blocks int

The number of layers of stacked Multi-Headed Attention layers. Defaults to 3

3
attn_dropouts float

Dropout between layers of Multi-Headed Attention Layers. Defaults to 0.0

0.0
has_residuals bool

Flag to have a residual connect from embedded output to attention layer output. Defaults to True

True
embedding_dim int

The dimensions of the embedding for continuous and categorical columns. Defaults to 16

16
embedding_initialization Optional[str]

Initialization scheme for the embedding layers. Defaults to kaiming. Choices are: [kaiming_uniform,kaiming_normal].

'kaiming_uniform'
embedding_bias bool

Flag to turn on Embedding Bias. Defaults to True

True
share_embedding bool

The flag turns on shared embeddings in the input embedding process. The key idea here is to have an embedding for the feature as a whole along with embeddings of each unique values of that column. For more details refer to Appendix A of the TabTransformer paper. Defaults to False

False
share_embedding_strategy Optional[str]

There are two strategies in adding shared embeddings. 1. add - A separate embedding for the feature is added to the embedding of the unique values of the feature. 2. fraction - A fraction of the input embedding is reserved for the shared embedding of the feature. Defaults to fraction. Choices are: [add,fraction].

'fraction'
shared_embedding_fraction float

Fraction of the input_embed_dim to be reserved by the shared embedding. Should be less than one. Defaults to 0.25

0.25
deep_layers bool

Flag to enable a deep MLP layer before the Multi-Headed Attention layer. Defaults to False

False
layers str

Hyphen-separated number of layers and units in the deep MLP. Defaults to 128-64-32

'128-64-32'
activation str

The activation type in the deep MLP. The default activation in PyTorch like ReLU, TanH, LeakyReLU, etc. https://pytorch.org/docs/stable/nn.html#non-linear-activations-weighted-sum-nonlinearity. Defaults to ReLU

'ReLU'
use_batch_norm bool

Flag to include a BatchNorm layer after each Linear Layer+DropOut in the deep MLP. Defaults to False

False
initialization str

Initialization scheme for the linear layers in the deep MLP. Defaults to kaiming. Choices are: [kaiming,xavier,random].

'kaiming'
dropout float

Probability of an element to be zeroed in the deep MLP. Defaults to 0.0

0.0
attention_pooling bool

If True, will combine the attention outputs of each block for final prediction. Defaults to False

False
task str

Specify whether the problem is regression or classification. backbone is a task which considers the model as a backbone to generate features. Mostly used internally for SSL and related tasks. Choices are: [regression,classification,backbone].

required
head Optional[str]

The head to be used for the model. Should be one of the heads defined in pytorch_tabular.models.common.heads. Defaults to LinearHead. Choices are: [None,LinearHead,MixtureDensityHead].

'LinearHead'
head_config Optional[Dict]

The config as a dict which defines the head. If left empty, will be initialized as default linear head.

lambda: {'layers': ''}()
embedding_dims Optional[List]

The dimensions of the embedding for each categorical column as a list of tuples (cardinality, embedding_dim). If left empty, will infer using the cardinality of the categorical column using the rule min(50, (x + 1) // 2)

None
embedding_dropout float

Dropout to be applied to the Categorical Embedding. Defaults to 0.0

0.0
batch_norm_continuous_input bool

If True, we will normalize the continuous layer by passing it through a BatchNorm layer.

True
learning_rate float

The learning rate of the model. Defaults to 1e-3.

0.001
loss Optional[str]

The loss function to be applied. By Default, it is MSELoss for regression and CrossEntropyLoss for classification. Unless you are sure what you are doing, leave it at MSELoss or L1Loss for regression and CrossEntropyLoss for classification

None
metrics Optional[List[str]]

the list of metrics you need to track during training. The metrics should be one of the functional metrics implemented in torchmetrics. By default, it is accuracy if classification and mean_squared_error for regression

None
metrics_params Optional[List]

The parameters to be passed to the metrics function

None
metrics_prob_input Optional[List]

Is a mandatory parameter for classification metrics defined in the config. This defines whether the input to the metric function is the probability or the class. Length should be same as the number of metrics. Defaults to None.

None
target_range Optional[List]

The range in which we should limit the output variable. Currently ignored for multi-target regression. Typically used for Regression problems. If left empty, will not apply any restrictions

None
seed int

The seed for reproducibility. Defaults to 42

42
Source code in src/pytorch_tabular/models/autoint/config.py
@dataclass
class AutoIntConfig(ModelConfig):
    """AutomaticFeatureInteraction configuration.

    Args:
        attn_embed_dim (int): The number of hidden units in the Multi-Headed Attention layers. Defaults to
                32

        num_heads (int): The number of heads in the Multi-Headed Attention layer. Defaults to 2

        num_attn_blocks (int): The number of layers of stacked Multi-Headed Attention layers. Defaults to 3

        attn_dropouts (float): Dropout between layers of Multi-Headed Attention Layers. Defaults to 0.0

        has_residuals (bool): Flag to have a residual connect from embedded output to attention layer
                output. Defaults to True

        embedding_dim (int): The dimensions of the embedding for continuous and categorical columns.
                Defaults to 16

        embedding_initialization (Optional[str]): Initialization scheme for the embedding layers. Defaults
                to `kaiming`. Choices are: [`kaiming_uniform`,`kaiming_normal`].

        embedding_bias (bool): Flag to turn on Embedding Bias. Defaults to True

        share_embedding (bool): The flag turns on shared embeddings in the input embedding process. The key
                idea here is to have an embedding for the feature as a whole along with embeddings of each unique
                values of that column. For more details refer to Appendix A of the TabTransformer paper. Defaults
                to False

        share_embedding_strategy (Optional[str]): There are two strategies in adding shared embeddings. 1.
                `add` - A separate embedding for the feature is added to the embedding of the unique values of the
                feature. 2. `fraction` - A fraction of the input embedding is reserved for the shared embedding of
                the feature. Defaults to fraction. Choices are: [`add`,`fraction`].

        shared_embedding_fraction (float): Fraction of the input_embed_dim to be reserved by the shared
                embedding. Should be less than one. Defaults to 0.25

        deep_layers (bool): Flag to enable a deep MLP layer before the Multi-Headed Attention layer.
                Defaults to False

        layers (str): Hyphen-separated number of layers and units in the deep MLP. Defaults to 128-64-32

        activation (str): The activation type in the deep MLP. The default activation in PyTorch like ReLU,
                TanH, LeakyReLU, etc.
                https://pytorch.org/docs/stable/nn.html#non-linear-activations-weighted-sum-nonlinearity.
                Defaults to ReLU

        use_batch_norm (bool): Flag to include a BatchNorm layer after each Linear Layer+DropOut in the
                deep MLP. Defaults to False

        initialization (str): Initialization scheme for the linear layers in the deep MLP. Defaults to
                `kaiming`. Choices are: [`kaiming`,`xavier`,`random`].

        dropout (float): Probability of an element to be zeroed in the deep MLP. Defaults to 0.0

        attention_pooling (bool): If True, will combine the attention outputs of each block for final
                prediction. Defaults to False

        task (str): Specify whether the problem is regression or classification. `backbone` is a task which
                considers the model as a backbone to generate features. Mostly used internally for SSL and related
                tasks. Choices are: [`regression`,`classification`,`backbone`].

        head (Optional[str]): The head to be used for the model. Should be one of the heads defined in
                `pytorch_tabular.models.common.heads`. Defaults to  LinearHead. Choices are:
                [`None`,`LinearHead`,`MixtureDensityHead`].

        head_config (Optional[Dict]): The config as a dict which defines the head. If left empty, will be
                initialized as default linear head.

        embedding_dims (Optional[List]): The dimensions of the embedding for each categorical column as a
                list of tuples (cardinality, embedding_dim). If left empty, will infer using the cardinality of
                the categorical column using the rule min(50, (x + 1) // 2)

        embedding_dropout (float): Dropout to be applied to the Categorical Embedding. Defaults to 0.0

        batch_norm_continuous_input (bool): If True, we will normalize the continuous layer by passing it
                through a BatchNorm layer.

        learning_rate (float): The learning rate of the model. Defaults to 1e-3.

        loss (Optional[str]): The loss function to be applied. By Default, it is MSELoss for regression and
                CrossEntropyLoss for classification. Unless you are sure what you are doing, leave it at MSELoss
                or L1Loss for regression and CrossEntropyLoss for classification

        metrics (Optional[List[str]]): the list of metrics you need to track during training. The metrics
                should be one of the functional metrics implemented in ``torchmetrics``. By default, it is
                accuracy if classification and mean_squared_error for regression

        metrics_params (Optional[List]): The parameters to be passed to the metrics function

        metrics_prob_input (Optional[List]): Is a mandatory parameter for classification metrics defined in the config.
            This defines whether the input to the metric function is the probability or the class. Length should be
            same as the number of metrics. Defaults to None.

        target_range (Optional[List]): The range in which we should limit the output variable. Currently
                ignored for multi-target regression. Typically used for Regression problems. If left empty, will
                not apply any restrictions

        seed (int): The seed for reproducibility. Defaults to 42

    """

    attn_embed_dim: int = field(
        default=32,
        metadata={"help": "The number of hidden units in the Multi-Headed Attention layers. Defaults to 32"},
    )
    num_heads: int = field(
        default=2,
        metadata={"help": "The number of heads in the Multi-Headed Attention layer. Defaults to 2"},
    )
    num_attn_blocks: int = field(
        default=3,
        metadata={"help": "The number of layers of stacked Multi-Headed Attention layers. Defaults to 3"},
    )
    attn_dropouts: float = field(
        default=0.0,
        metadata={"help": "Dropout between layers of Multi-Headed Attention Layers. Defaults to 0.0"},
    )
    has_residuals: bool = field(
        default=True,
        metadata={
            "help": "Flag to have a residual connect from enbedded output to attention layer output. Defaults to True"
        },
    )
    embedding_dim: int = field(
        default=16,
        metadata={"help": "The dimensions of the embedding for continuous and categorical columns. Defaults to 16"},
    )
    embedding_initialization: Optional[str] = field(
        default="kaiming_uniform",
        metadata={
            "help": "Initialization scheme for the embedding layers. Defaults to `kaiming`",
            "choices": ["kaiming_uniform", "kaiming_normal"],
        },
    )
    embedding_bias: bool = field(
        default=True,
        metadata={"help": "Flag to turn on Embedding Bias. Defaults to True"},
    )
    share_embedding: bool = field(
        default=False,
        metadata={
            "help": "The flag turns on shared embeddings in the input embedding process."
            " The key idea here is to have an embedding for the feature as a whole along with embeddings"
            " of each unique values of that column."
            " For more details refer to Appendix A of the TabTransformer paper. Defaults to False"
        },
    )
    share_embedding_strategy: Optional[str] = field(
        default="fraction",
        metadata={
            "help": "There are two strategies in adding shared embeddings."
            " 1. `add` - A separate embedding for the feature is added to the embedding"
            " of the unique values of the feature."
            " 2. `fraction` - A fraction of the input embedding is reserved"
            " for the shared embedding of the feature. Defaults to fraction.",
            "choices": ["add", "fraction"],
        },
    )
    shared_embedding_fraction: float = field(
        default=0.25,
        metadata={
            "help": "Fraction of the input_embed_dim to be reserved by the shared embedding."
            " Should be less than one. Defaults to 0.25"
        },
    )
    deep_layers: bool = field(
        default=False,
        metadata={"help": "Flag to enable a deep MLP layer before the Multi-Headed Attention layer. Defaults to False"},
    )
    layers: str = field(
        default="128-64-32",
        metadata={"help": "Hyphen-separated number of layers and units in the deep MLP. Defaults to 128-64-32"},
    )
    activation: str = field(
        default="ReLU",
        metadata={
            "help": "The activation type in the deep MLP. The default activation in PyTorch"
            " like ReLU, TanH, LeakyReLU, etc."
            " https://pytorch.org/docs/stable/nn.html#non-linear-activations-weighted-sum-nonlinearity."
            " Defaults to ReLU"
        },
    )
    use_batch_norm: bool = field(
        default=False,
        metadata={
            "help": "Flag to include a BatchNorm layer after each Linear Layer+DropOut in the deep MLP."
            " Defaults to False"
        },
    )
    initialization: str = field(
        default="kaiming",
        metadata={
            "help": "Initialization scheme for the linear layers in the deep MLP. Defaults to `kaiming`",
            "choices": ["kaiming", "xavier", "random"],
        },
    )
    dropout: float = field(
        default=0.0,
        metadata={"help": "Probability of an element to be zeroed in the deep MLP. Defaults to 0.0"},
    )
    attention_pooling: bool = field(
        default=False,
        metadata={
            "help": "If True, will combine the attention outputs of each block for final prediction. Defaults to False"
        },
    )
    _module_src: str = field(default="models.autoint")
    _model_name: str = field(default="AutoIntModel")
    _backbone_name: str = field(default="AutoIntBackbone")
    _config_name: str = field(default="AutoIntConfig")

Bases: ModelConfig

CategoryEmbeddingModel configuration.

Parameters:

Name Type Description Default
layers str

DEPRECATED: Hyphen-separated number of layers and units in the classification head. E.g. 32-64-32. Defaults to 128-64-32

'128-64-32'
activation str

DEPRECATED: The activation type in the classification head. The default activation in PyTorch like ReLU, TanH, LeakyReLU, etc. https://pytorch.org/docs/stable/nn.html#non-linear-activations-weighted-sum-nonlinearity. Defaults to ReLU

'ReLU'
use_batch_norm bool

DEPRECATED: Flag to include a BatchNorm layer after each Linear Layer+DropOut. Defaults to False

False
initialization str

DEPRECATED: Initialization scheme for the linear layers. Defaults to kaiming. Choices are: [kaiming,xavier,random].

'kaiming'
dropout float

DEPRECATED: probability of a classification element to be zeroed. This is added to each linear layer. Defaults to 0.0

0.0
task str

Specify whether the problem is regression or classification. backbone is a task which considers the model as a backbone to generate features. Mostly used internally for SSL and related tasks. Choices are: [regression,classification,backbone].

required
head Optional[str]

The head to be used for the model. Should be one of the heads defined in pytorch_tabular.models.common.heads. Defaults to LinearHead. Choices are: [None,LinearHead,MixtureDensityHead].

'LinearHead'
head_config Optional[Dict]

The config as a dict which defines the head. If left empty, will be initialized as default linear head.

lambda: {'layers': ''}()
embedding_dims Optional[List]

The dimensions of the embedding for each categorical column as a list of tuples (cardinality, embedding_dim). If left empty, will infer using the cardinality of the categorical column using the rule min(50, (x + 1) // 2)

None
embedding_dropout float

Dropout to be applied to the Categorical Embedding. Defaults to 0.0

0.0
batch_norm_continuous_input bool

If True, we will normalize the continuous layer by passing it through a BatchNorm layer.

True
learning_rate float

The learning rate of the model. Defaults to 1e-3.

0.001
loss Optional[str]

The loss function to be applied. By Default, it is MSELoss for regression and CrossEntropyLoss for classification. Unless you are sure what you are doing, leave it at MSELoss or L1Loss for regression and CrossEntropyLoss for classification

None
metrics Optional[List[str]]

the list of metrics you need to track during training. The metrics should be one of the functional metrics implemented in torchmetrics. By default, it is accuracy if classification and mean_squared_error for regression

None
metrics_params Optional[List]

The parameters to be passed to the metrics function. task is forced to be multiclass because the multiclass version can handle binary as well and for simplicity we are only using multiclass.

None
metrics_prob_input Optional[List]

Is a mandatory parameter for classification metrics defined in the config. This defines whether the input to the metric function is the probability or the class. Length should be same as the number of metrics. Defaults to None.

None
target_range Optional[List]

The range in which we should limit the output variable. Currently ignored for multi-target regression. Typically used for Regression problems. If left empty, will not apply any restrictions

None
seed int

The seed for reproducibility. Defaults to 42

42
Source code in src/pytorch_tabular/models/category_embedding/config.py
@dataclass
class CategoryEmbeddingModelConfig(ModelConfig):
    """CategoryEmbeddingModel configuration.

    Args:
        layers (str): DEPRECATED: Hyphen-separated number of layers and units in the classification head. E.g. 32-64-32.
                Defaults to 128-64-32

        activation (str): DEPRECATED: The activation type in the classification head. The default activation in PyTorch
                like ReLU, TanH, LeakyReLU, etc. https://pytorch.org/docs/stable/nn.html#non-linear-activations-weighted-sum-nonlinearity.
                Defaults to ReLU

        use_batch_norm (bool): DEPRECATED: Flag to include a BatchNorm layer after each Linear Layer+DropOut. Defaults
                to False

        initialization (str): DEPRECATED: Initialization scheme for the linear layers. Defaults to `kaiming`. Choices
                are: [`kaiming`,`xavier`,`random`].

        dropout (float): DEPRECATED: probability of a classification element to be zeroed. This is added to each
                linear layer. Defaults to 0.0


        task (str): Specify whether the problem is regression or classification. `backbone` is a task which
                considers the model as a backbone to generate features. Mostly used internally for SSL and related
                tasks. Choices are: [`regression`,`classification`,`backbone`].

        head (Optional[str]): The head to be used for the model. Should be one of the heads defined in
                `pytorch_tabular.models.common.heads`. Defaults to  LinearHead. Choices are:
                [`None`,`LinearHead`,`MixtureDensityHead`].

        head_config (Optional[Dict]): The config as a dict which defines the head. If left empty, will be
                initialized as default linear head.

        embedding_dims (Optional[List]): The dimensions of the embedding for each categorical column as a
                list of tuples (cardinality, embedding_dim). If left empty, will infer using the cardinality of
                the categorical column using the rule min(50, (x + 1) // 2)

        embedding_dropout (float): Dropout to be applied to the Categorical Embedding. Defaults to 0.0

        batch_norm_continuous_input (bool): If True, we will normalize the continuous layer by passing it
                through a BatchNorm layer.

        learning_rate (float): The learning rate of the model. Defaults to 1e-3.

        loss (Optional[str]): The loss function to be applied. By Default, it is MSELoss for regression and
                CrossEntropyLoss for classification. Unless you are sure what you are doing, leave it at MSELoss
                or L1Loss for regression and CrossEntropyLoss for classification

        metrics (Optional[List[str]]): the list of metrics you need to track during training. The metrics
                should be one of the functional metrics implemented in ``torchmetrics``. By default, it is
                accuracy if classification and mean_squared_error for regression

        metrics_params (Optional[List]): The parameters to be passed to the metrics function. `task` is forced to
                be `multiclass` because the multiclass version can handle binary as well and for simplicity we are
                only using `multiclass`.

        metrics_prob_input (Optional[List]): Is a mandatory parameter for classification metrics defined in the config.
            This defines whether the input to the metric function is the probability or the class. Length should be
            same as the number of metrics. Defaults to None.

        target_range (Optional[List]): The range in which we should limit the output variable. Currently
                ignored for multi-target regression. Typically used for Regression problems. If left empty, will
                not apply any restrictions

        seed (int): The seed for reproducibility. Defaults to 42

    """

    layers: str = field(
        default="128-64-32",
        metadata={
            "help": (
                "Hyphen-separated number of layers and units in the classification"
                " head. eg. 32-64-32. Defaults to 128-64-32"
            )
        },
    )
    activation: str = field(
        default="ReLU",
        metadata={
            "help": (
                "The activation type in the classification head. The default"
                " activation in PyTorch like ReLU, TanH, LeakyReLU, etc."
                " https://pytorch.org/docs/stable/nn.html#non-linear-activations-weighted-sum-nonlinearity."
                " Defaults to ReLU"
            )
        },
    )
    use_batch_norm: bool = field(
        default=False,
        metadata={"help": ("Flag to include a BatchNorm layer after each Linear Layer+DropOut." " Defaults to False")},
    )
    initialization: str = field(
        default="kaiming",
        metadata={
            "help": ("Initialization scheme for the linear layers. Defaults to `kaiming`"),
            "choices": ["kaiming", "xavier", "random"],
        },
    )
    dropout: float = field(
        default=0.0,
        metadata={
            "help": (
                "probability of an classification element to be zeroed."
                " This is added to each linear layer. Defaults to 0.0"
            )
        },
    )

    # def __post_init__(self):
    #     deprecated_args = [
    #         "layers",
    #         "activation",
    #         "use_batch_norm",
    #         "initialization",
    #         "dropout",
    #     ]
    #     # for arg in deprecated_args:
    #     if any([getattr(self, arg) is not None for arg in deprecated_args]):
    #         warnings.warn(
    #             f"{deprecated_args} are deprecated and will be remoevd in next version. "
    #             "Please use 'head' and `head_config` and set deprecated args "
    #             "to `None` to turn off warning. CategoricalEmbedding model is just a "
    #             "linear head with embedding layers."
    #         )
    #     return super().__post_init__()

    _module_src: str = field(default="models.category_embedding")
    _model_name: str = field(default="CategoryEmbeddingModel")
    _backbone_name: str = field(default="CategoryEmbeddingBackbone")
    _config_name: str = field(default="CategoryEmbeddingModelConfig")

Bases: ModelConfig

DANet configuration.

Parameters:

Name Type Description Default
n_layers int

Number of Blocks in the DANet. 8, 20, 32 are configurations the paper evaluated. Defaults to 8

8
abstlay_dim_1 int

The dimension for the intermediate output in the first ABSTLAY layer in a Block. Defaults to 32

32
abstlay_dim_2 int

The dimension for the intermediate output in the second ABSTLAY layer in a Block. Defaults to 64

None
k int

The number of feature groups in the ABSTLAY layer. Defaults to 5

5
dropout_rate float

Dropout to be applied in the Block. Defaults to 0.1

0.1
task str

Specify whether the problem is regression or classification. backbone is a task which considers the model as a backbone to generate features. Mostly used internally for SSL and related tasks. Choices are: [regression,classification,backbone].

required
head Optional[str]

The head to be used for the model. Should be one of the heads defined in pytorch_tabular.models.common.heads. Defaults to LinearHead. Choices are: [None,LinearHead,MixtureDensityHead].

'LinearHead'
head_config Optional[Dict]

The config as a dict which defines the head. If left empty, will be initialized as default linear head.

lambda: {'layers': ''}()
embedding_dims Optional[List]

The dimensions of the embedding for each categorical column as a list of tuples (cardinality, embedding_dim). If left empty, will infer using the cardinality of the categorical column using the rule min(50, (x + 1) // 2)

None
embedding_dropout float

Dropout to be applied to the Categorical Embedding. Defaults to 0.0

0.0
batch_norm_continuous_input bool

If True, we will normalize the continuous layer by passing it through a BatchNorm layer.

True
learning_rate float

The learning rate of the model. Defaults to 1e-3.

0.001
loss Optional[str]

The loss function to be applied. By Default, it is MSELoss for regression and CrossEntropyLoss for classification. Unless you are sure what you are doing, leave it at MSELoss or L1Loss for regression and CrossEntropyLoss for classification

None
metrics Optional[List[str]]

the list of metrics you need to track during training. The metrics should be one of the functional metrics implemented in torchmetrics. By default, it is accuracy if classification and mean_squared_error for regression

None
metrics_params Optional[List]

The parameters to be passed to the metrics function. task is forced to be multiclass because the multiclass version can handle binary as well and for simplicity we are only using multiclass.

None
metrics_prob_input Optional[List]

Is a mandatory parameter for classification metrics defined in the config. This defines whether the input to the metric function is the probability or the class. Length should be same as the number of metrics. Defaults to None.

None
target_range Optional[List]

The range in which we should limit the output variable. Currently ignored for multi-target regression. Typically used for Regression problems. If left empty, will not apply any restrictions

None
seed int

The seed for reproducibility. Defaults to 42

42
Source code in src/pytorch_tabular/models/danet/config.py
@dataclass
class DANetConfig(ModelConfig):
    """DANet configuration.

    Args:
        n_layers (int): Number of Blocks in the DANet. 8, 20, 32 are configurations
            the paper evaluated. Defaults to 8

        abstlay_dim_1 (int): The dimension for the intermediate output in the
                first ABSTLAY layer in a Block. Defaults to 32

        abstlay_dim_2 (int): The dimension for the intermediate output in the
                second ABSTLAY layer in a Block. Defaults to 64

        k (int): The number of feature groups in the ABSTLAY layer. Defaults to 5

        dropout_rate (float): Dropout to be applied in the Block. Defaults to 0.1

        task (str): Specify whether the problem is regression or classification. `backbone` is a task which
                considers the model as a backbone to generate features. Mostly used internally for SSL and related
                tasks. Choices are: [`regression`,`classification`,`backbone`].

        head (Optional[str]): The head to be used for the model. Should be one of the heads defined in
                `pytorch_tabular.models.common.heads`. Defaults to  LinearHead. Choices are:
                [`None`,`LinearHead`,`MixtureDensityHead`].

        head_config (Optional[Dict]): The config as a dict which defines the head. If left empty, will be
                initialized as default linear head.

        embedding_dims (Optional[List]): The dimensions of the embedding for each categorical column as a
                list of tuples (cardinality, embedding_dim). If left empty, will infer using the cardinality of
                the categorical column using the rule min(50, (x + 1) // 2)

        embedding_dropout (float): Dropout to be applied to the Categorical Embedding. Defaults to 0.0

        batch_norm_continuous_input (bool): If True, we will normalize the continuous layer by passing it
                through a BatchNorm layer.

        learning_rate (float): The learning rate of the model. Defaults to 1e-3.

        loss (Optional[str]): The loss function to be applied. By Default, it is MSELoss for regression and
                CrossEntropyLoss for classification. Unless you are sure what you are doing, leave it at MSELoss
                or L1Loss for regression and CrossEntropyLoss for classification

        metrics (Optional[List[str]]): the list of metrics you need to track during training. The metrics
                should be one of the functional metrics implemented in ``torchmetrics``. By default, it is
                accuracy if classification and mean_squared_error for regression

        metrics_params (Optional[List]): The parameters to be passed to the metrics function. `task` is forced to
                be `multiclass` because the multiclass version can handle binary as well and for simplicity we are
                only using `multiclass`.

        metrics_prob_input (Optional[List]): Is a mandatory parameter for classification metrics defined in the config.
            This defines whether the input to the metric function is the probability or the class. Length should be
            same as the number of metrics. Defaults to None.

        target_range (Optional[List]): The range in which we should limit the output variable. Currently
                ignored for multi-target regression. Typically used for Regression problems. If left empty, will
                not apply any restrictions

        seed (int): The seed for reproducibility. Defaults to 42

    """

    n_layers: int = field(
        default=8,
        metadata={"help": "Number of Blocks in the DANet. Each block has 2 Abstlay Blocks each. Defaults to 8"},
    )

    abstlay_dim_1: int = field(
        default=32,
        metadata={
            "help": "The dimension for the intermediate output in the first ABSTLAY layer in a Block. Defaults to 32"
        },
    )

    abstlay_dim_2: Optional[int] = field(
        default=None,
        metadata={
            "help": "The dimension for the intermediate output in the second ABSTLAY layer in a Block."
            "If None, it will be twice abstlay_dim_1. Defaults to None"
        },
    )
    k: int = field(
        default=5,
        metadata={"help": "The number of feature groups in the ABSTLAY layer. Defaults to 5"},
    )
    dropout_rate: float = field(
        default=0.1,
        metadata={"help": "Dropout to be applied in the Block. Defaults to 0.1"},
    )
    block_activation: str = field(
        default="LeakyReLU",
        metadata={
            "help": "The activation type in the classification head. The default activation in PyTorch"
            " like ReLU, TanH, LeakyReLU, etc."
            " https://pytorch.org/docs/stable/nn.html#non-linear-activations-weighted-sum-nonlinearity"
        },
    )
    virtual_batch_size: Optional[int] = field(
        default=256,
        metadata={
            "help": "If not None, all BatchNorms will be converted to GhostBatchNorm's "
            " with this virtual batch size. Defaults to None"
        },
    )

    _module_src: str = field(default="models.danet")
    _model_name: str = field(default="DANetModel")
    _backbone_name: str = field(default="DANetBackbone")
    _config_name: str = field(default="DANetConfig")

    def __post_init__(self):
        if self.abstlay_dim_2 is None:
            self.abstlay_dim_2 = self.abstlay_dim_1 * 2
        return super().__post_init__()

Bases: ModelConfig

Tab Transformer configuration.

Parameters:

Name Type Description Default
input_embed_dim int

The embedding dimension for the input categorical features. Defaults to 32

32
embedding_initialization Optional[str]

Initialization scheme for the embedding layers. Defaults to kaiming. Choices are: [kaiming_uniform,kaiming_normal].

'kaiming_uniform'
embedding_bias bool

Flag to turn on Embedding Bias. Defaults to True

True
share_embedding bool

The flag turns on shared embeddings in the input embedding process. The key idea here is to have an embedding for the feature as a whole along with embeddings of each unique values of that column. For more details refer to Appendix A of the TabTransformer paper. Defaults to False

False
share_embedding_strategy Optional[str]

There are two strategies in adding shared embeddings. 1. add - A separate embedding for the feature is added to the embedding of the unique values of the feature. 2. fraction - A fraction of the input embedding is reserved for the shared embedding of the feature. Defaults to fraction. Choices are: [add,fraction].

'fraction'
shared_embedding_fraction float

Fraction of the input_embed_dim to be reserved by the shared embedding. Should be less than one. Defaults to 0.25

0.25
attn_feature_importance bool

If you are facing memory issues, you can turn off feature importance which will not save the attention weights. Defaults to True

True
num_heads int

The number of heads in the Multi-Headed Attention layer. Defaults to 8

8
num_attn_blocks int

The number of layers of stacked Multi-Headed Attention layers. Defaults to 6

6
transformer_head_dim Optional[int]

The number of hidden units in the Multi-Headed Attention layers. Defaults to None and will be same as input_dim.

None
attn_dropout float

Dropout to be applied after Multi headed Attention. Defaults to 0.1

0.1
add_norm_dropout float

Dropout to be applied in the AddNorm Layer. Defaults to 0.1

0.1
ff_dropout float

Dropout to be applied in the Positionwise FeedForward Network. Defaults to 0.1

0.1
ff_hidden_multiplier int

Multiple by which the Positionwise FF layer scales the input. Defaults to 4

4
transformer_activation str

The activation type in the transformer feed forward layers. In addition to the default activation in PyTorch like ReLU, TanH, LeakyReLU, etc. https://pytorch.org/docs/stable/nn.html#non-linear-activations-weighted-sum-nonlinearity, GEGLU, ReGLU and SwiGLU are also implemented(https://arxiv.org/pdf/2002.05202.pdf). Defaults to GEGLU

'GEGLU'
task str

Specify whether the problem is regression or classification. backbone is a task which considers the model as a backbone to generate features. Mostly used internally for SSL and related tasks.. Choices are: [regression,classification,backbone].

required
head Optional[str]

The head to be used for the model. Should be one of the heads defined in pytorch_tabular.models.common.heads. Defaults to LinearHead. Choices are: [None,LinearHead,MixtureDensityHead].

'LinearHead'
head_config Optional[Dict]

The config as a dict which defines the head. If left empty, will be initialized as default linear head.

lambda: {'layers': ''}()
embedding_dims Optional[List]

The dimensions of the embedding for each categorical column as a list of tuples (cardinality, embedding_dim). If left empty, will infer using the cardinality of the categorical column using the rule min(50, (x + 1) // 2)

None
embedding_dropout float

Dropout to be applied to the Categorical Embedding. Defaults to 0.0

0.0
batch_norm_continuous_input bool

If True, we will normalize the continuous layer by passing it through a BatchNorm layer.

True
learning_rate float

The learning rate of the model. Defaults to 1e-3.

0.001
loss Optional[str]

The loss function to be applied. By Default, it is MSELoss for regression and CrossEntropyLoss for classification. Unless you are sure what you are doing, leave it at MSELoss or L1Loss for regression and CrossEntropyLoss for classification

None
metrics Optional[List[str]]

the list of metrics you need to track during training. The metrics should be one of the functional metrics implemented in torchmetrics. By default, it is accuracy if classification and mean_squared_error for regression

None
metrics_params Optional[List]

The parameters to be passed to the metrics function. task is forced to be multiclass because the multiclass version can handle binary as well and for simplicity we are only using multiclass.

None
metrics_prob_input Optional[List]

Is a mandatory parameter for classification metrics defined in the config. This defines whether the input to the metric function is the probability or the class. Length should be same as the number of metrics. Defaults to None.

None
target_range Optional[List]

The range in which we should limit the output variable. Currently ignored for multi-target regression. Typically used for Regression problems. If left empty, will not apply any restrictions

None
seed int

The seed for reproducibility. Defaults to 42

42
Source code in src/pytorch_tabular/models/ft_transformer/config.py
@dataclass
class FTTransformerConfig(ModelConfig):
    """Tab Transformer configuration.

    Args:
        input_embed_dim (int): The embedding dimension for the input categorical features. Defaults to 32

        embedding_initialization (Optional[str]): Initialization scheme for the embedding layers. Defaults
                to `kaiming`. Choices are: [`kaiming_uniform`,`kaiming_normal`].

        embedding_bias (bool): Flag to turn on Embedding Bias. Defaults to True

        share_embedding (bool): The flag turns on shared embeddings in the input embedding process. The key
                idea here is to have an embedding for the feature as a whole along with embeddings of each unique
                values of that column. For more details refer to Appendix A of the TabTransformer paper. Defaults
                to False

        share_embedding_strategy (Optional[str]): There are two strategies in adding shared embeddings. 1.
                `add` - A separate embedding for the feature is added to the embedding of the unique values of the
                feature. 2. `fraction` - A fraction of the input embedding is reserved for the shared embedding of
                the feature. Defaults to fraction. Choices are: [`add`,`fraction`].

        shared_embedding_fraction (float): Fraction of the input_embed_dim to be reserved by the shared
                embedding. Should be less than one. Defaults to 0.25

        attn_feature_importance (bool): If you are facing memory issues, you can turn off feature
                importance which will not save the attention weights. Defaults to True

        num_heads (int): The number of heads in the Multi-Headed Attention layer. Defaults to 8

        num_attn_blocks (int): The number of layers of stacked Multi-Headed Attention layers. Defaults to 6

        transformer_head_dim (Optional[int]): The number of hidden units in the Multi-Headed Attention
                layers. Defaults to None and will be same as input_dim.

        attn_dropout (float): Dropout to be applied after Multi headed Attention. Defaults to 0.1

        add_norm_dropout (float): Dropout to be applied in the AddNorm Layer. Defaults to 0.1

        ff_dropout (float): Dropout to be applied in the Positionwise FeedForward Network. Defaults to 0.1

        ff_hidden_multiplier (int): Multiple by which the Positionwise FF layer scales the input. Defaults
                to 4

        transformer_activation (str): The activation type in the transformer feed forward layers. In
                addition to the default activation in PyTorch like ReLU, TanH, LeakyReLU, etc.
                https://pytorch.org/docs/stable/nn.html#non-linear-activations-weighted-sum-nonlinearity, GEGLU,
                ReGLU and SwiGLU are also implemented(https://arxiv.org/pdf/2002.05202.pdf). Defaults to GEGLU

        task (str): Specify whether the problem is regression or classification. `backbone` is a task which
                considers the model as a backbone to generate features. Mostly used internally for SSL and related
                tasks.. Choices are: [`regression`,`classification`,`backbone`].

        head (Optional[str]): The head to be used for the model. Should be one of the heads defined in
                `pytorch_tabular.models.common.heads`. Defaults to  LinearHead. Choices are:
                [`None`,`LinearHead`,`MixtureDensityHead`].

        head_config (Optional[Dict]): The config as a dict which defines the head. If left empty, will be
                initialized as default linear head.

        embedding_dims (Optional[List]): The dimensions of the embedding for each categorical column as a
                list of tuples (cardinality, embedding_dim). If left empty, will infer using the cardinality of
                the categorical column using the rule min(50, (x + 1) // 2)

        embedding_dropout (float): Dropout to be applied to the Categorical Embedding. Defaults to 0.0

        batch_norm_continuous_input (bool): If True, we will normalize the continuous layer by passing it
                through a BatchNorm layer.

        learning_rate (float): The learning rate of the model. Defaults to 1e-3.

        loss (Optional[str]): The loss function to be applied. By Default, it is MSELoss for regression and
                CrossEntropyLoss for classification. Unless you are sure what you are doing, leave it at MSELoss
                or L1Loss for regression and CrossEntropyLoss for classification

        metrics (Optional[List[str]]): the list of metrics you need to track during training. The metrics
                should be one of the functional metrics implemented in ``torchmetrics``. By default, it is
                accuracy if classification and mean_squared_error for regression

        metrics_params (Optional[List]): The parameters to be passed to the metrics function. `task` is forced to
                be `multiclass` because the multiclass version can handle binary as well and for simplicity we are
                only using `multiclass`.

        metrics_prob_input (Optional[List]): Is a mandatory parameter for classification metrics defined in the config.
            This defines whether the input to the metric function is the probability or the class. Length should be
            same as the number of metrics. Defaults to None.

        target_range (Optional[List]): The range in which we should limit the output variable. Currently
                ignored for multi-target regression. Typically used for Regression problems. If left empty, will
                not apply any restrictions

        seed (int): The seed for reproducibility. Defaults to 42

    """

    input_embed_dim: int = field(
        default=32,
        metadata={"help": "The embedding dimension for the input categorical features. Defaults to 32"},
    )
    embedding_initialization: Optional[str] = field(
        default="kaiming_uniform",
        metadata={
            "help": "Initialization scheme for the embedding layers. Defaults to `kaiming`",
            "choices": ["kaiming_uniform", "kaiming_normal"],
        },
    )
    embedding_bias: bool = field(
        default=True,
        metadata={"help": "Flag to turn on Embedding Bias. Defaults to True"},
    )
    share_embedding: bool = field(
        default=False,
        metadata={
            "help": "The flag turns on shared embeddings in the input embedding process."
            " The key idea here is to have an embedding for the feature as a whole along with embeddings"
            " of each unique values of that column. For more details refer"
            " to Appendix A of the TabTransformer paper. Defaults to False"
        },
    )
    share_embedding_strategy: Optional[str] = field(
        default="fraction",
        metadata={
            "help": "There are two strategies in adding shared embeddings."
            " 1. `add` - A separate embedding for the feature is added to the embedding"
            " of the unique values of the feature."
            " 2. `fraction` - A fraction of the input embedding is reserved"
            " for the shared embedding of the feature. Defaults to fraction.",
            "choices": ["add", "fraction"],
        },
    )
    shared_embedding_fraction: float = field(
        default=0.25,
        metadata={
            "help": "Fraction of the input_embed_dim to be reserved by the shared embedding."
            " Should be less than one. Defaults to 0.25"
        },
    )
    attn_feature_importance: bool = field(
        default=True,
        metadata={
            "help": "If you are facing memory issues, you can turn off feature importance"
            " which will not save the attention weights. Defaults to True"
        },
    )
    num_heads: int = field(
        default=8,
        metadata={"help": "The number of heads in the Multi-Headed Attention layer. Defaults to 8"},
    )
    num_attn_blocks: int = field(
        default=6,
        metadata={"help": "The number of layers of stacked Multi-Headed Attention layers. Defaults to 6"},
    )
    transformer_head_dim: Optional[int] = field(
        default=None,
        metadata={
            "help": "The number of hidden units in the Multi-Headed Attention layers."
            " Defaults to None and will be same as input_dim."
        },
    )
    attn_dropout: float = field(
        default=0.1,
        metadata={"help": "Dropout to be applied after Multi headed Attention. Defaults to 0.1"},
    )
    add_norm_dropout: float = field(
        default=0.1,
        metadata={"help": "Dropout to be applied in the AddNorm Layer. Defaults to 0.1"},
    )
    ff_dropout: float = field(
        default=0.1,
        metadata={"help": "Dropout to be applied in the Positionwise FeedForward Network. Defaults to 0.1"},
    )
    ff_hidden_multiplier: int = field(
        default=4,
        metadata={"help": "Multiple by which the Positionwise FF layer scales the input. Defaults to 4"},
    )

    transformer_activation: str = field(
        default="GEGLU",
        metadata={
            "help": "The activation type in the transformer feed forward layers."
            " In addition to the default activation in PyTorch like ReLU, TanH, LeakyReLU, etc."
            " https://pytorch.org/docs/stable/nn.html#non-linear-activations-weighted-sum-nonlinearity,"
            " GEGLU, ReGLU and SwiGLU are also implemented (https://arxiv.org/pdf/2002.05202.pdf)."
            " Defaults to GEGLU",
        },
    )

    _module_src: str = field(default="models.ft_transformer")
    _model_name: str = field(default="FTTransformerModel")
    _backbone_name: str = field(default="FTTransformerBackbone")
    _config_name: str = field(default="FTTransformerConfig")

Bases: ModelConfig

Gated Adaptive Network for Deep Automated Learning of Features (GANDALF) Config.

Parameters:

Name Type Description Default
gflu_stages int

Number of layers in the feature abstraction layer. Defaults to 6

6
gflu_dropout float

Dropout rate for the feature abstraction layer. Defaults to 0.0

0.0
gflu_feature_init_sparsity float

Only valid for t-softmax. The percentage of features to be selected in each GFLU stage. This is just initialized and during learning it may change. Defaults to 0.3

0.3
learnable_sparsity bool

Only valid for t-softmax. If True, the sparsity parameters will be learned. If False, the sparsity parameters will be fixed to the initial values specified in gflu_feature_init_sparsity and tree_feature_init_sparsity. Defaults to True

True
task str

Specify whether the problem is regression or classification. backbone is a task which considers the model as a backbone to generate features. Mostly used internally for SSL and related tasks. Choices are: [regression,classification,backbone].

required
head Optional[str]

The head to be used for the model. Should be one of the heads defined in pytorch_tabular.models.common.heads. Defaults to LinearHead. Choices are: [None,LinearHead,MixtureDensityHead].

'LinearHead'
head_config Optional[Dict]

The config as a dict which defines the head. If left empty, will be initialized as default linear head.

lambda: {'layers': ''}()
embedding_dims Optional[List]

The dimensions of the embedding for each categorical column as a list of tuples (cardinality, embedding_dim). If left empty, will infer using the cardinality of the categorical column using the rule min(50, (x + 1) // 2)

None
embedding_dropout float

Dropout to be applied to the Categorical Embedding. Defaults to 0.0

0.0
batch_norm_continuous_input bool

If True, we will normalize the continuous layer by passing it through a BatchNorm layer.

True
learning_rate float

The learning rate of the model. Defaults to 1e-3.

0.001
loss Optional[str]

The loss function to be applied. By Default, it is MSELoss for regression and CrossEntropyLoss for classification. Unless you are sure what you are doing, leave it at MSELoss or L1Loss for regression and CrossEntropyLoss for classification

None
metrics Optional[List[str]]

the list of metrics you need to track during training. The metrics should be one of the functional metrics implemented in torchmetrics. By default, it is accuracy if classification and mean_squared_error for regression

None
metrics_params Optional[List]

The parameters to be passed to the metrics function. task is forced to be multiclass because the multiclass version can handle binary as well and for simplicity we are only using multiclass.

None
metrics_prob_input Optional[List]

Is a mandatory parameter for classification metrics defined in the config. This defines whether the input to the metric function is the probability or the class. Length should be same as the number of metrics. Defaults to None.

None
target_range Optional[List]

The range in which we should limit the output variable. Currently ignored for multi-target regression. Typically used for Regression problems. If left empty, will not apply any restrictions

None
seed int

The seed for reproducibility. Defaults to 42

42
Source code in src/pytorch_tabular/models/gandalf/config.py
@dataclass
class GANDALFConfig(ModelConfig):
    """Gated Adaptive Network for Deep Automated Learning of Features (GANDALF) Config.

    Args:
        gflu_stages (int): Number of layers in the feature abstraction layer. Defaults to 6

        gflu_dropout (float): Dropout rate for the feature abstraction layer. Defaults to 0.0

        gflu_feature_init_sparsity (float): Only valid for t-softmax. The percentage of features
                to be selected in each GFLU stage. This is just initialized and during learning
                it may change. Defaults to 0.3

        learnable_sparsity (bool): Only valid for t-softmax. If True, the sparsity parameters
                will be learned. If False, the sparsity parameters will be fixed to the initial
                values specified in `gflu_feature_init_sparsity` and `tree_feature_init_sparsity`.
                Defaults to True

        task (str): Specify whether the problem is regression or classification. `backbone` is a task which
                considers the model as a backbone to generate features. Mostly used internally for SSL and related
                tasks. Choices are: [`regression`,`classification`,`backbone`].

        head (Optional[str]): The head to be used for the model. Should be one of the heads defined in
                `pytorch_tabular.models.common.heads`. Defaults to  LinearHead. Choices are:
                [`None`,`LinearHead`,`MixtureDensityHead`].

        head_config (Optional[Dict]): The config as a dict which defines the head. If left empty, will be
                initialized as default linear head.

        embedding_dims (Optional[List]): The dimensions of the embedding for each categorical column as a
                list of tuples (cardinality, embedding_dim). If left empty, will infer using the cardinality of
                the categorical column using the rule min(50, (x + 1) // 2)

        embedding_dropout (float): Dropout to be applied to the Categorical Embedding. Defaults to 0.0

        batch_norm_continuous_input (bool): If True, we will normalize the continuous layer by passing it
                through a BatchNorm layer.

        learning_rate (float): The learning rate of the model. Defaults to 1e-3.

        loss (Optional[str]): The loss function to be applied. By Default, it is MSELoss for regression and
                CrossEntropyLoss for classification. Unless you are sure what you are doing, leave it at MSELoss
                or L1Loss for regression and CrossEntropyLoss for classification

        metrics (Optional[List[str]]): the list of metrics you need to track during training. The metrics
                should be one of the functional metrics implemented in ``torchmetrics``. By default, it is
                accuracy if classification and mean_squared_error for regression

        metrics_params (Optional[List]): The parameters to be passed to the metrics function. `task` is forced to
                be `multiclass` because the multiclass version can handle binary as well and for simplicity we are
                only using `multiclass`.

        metrics_prob_input (Optional[List]): Is a mandatory parameter for classification metrics defined in the config.
            This defines whether the input to the metric function is the probability or the class. Length should be
            same as the number of metrics. Defaults to None.

        target_range (Optional[List]): The range in which we should limit the output variable. Currently
                ignored for multi-target regression. Typically used for Regression problems. If left empty, will
                not apply any restrictions

        seed (int): The seed for reproducibility. Defaults to 42

    """

    gflu_stages: int = field(
        default=6,
        metadata={"help": "Number of layers in the feature abstraction layer. Defaults to 6"},
    )

    gflu_dropout: float = field(
        default=0.0, metadata={"help": "Dropout rate for the feature abstraction layer. Defaults to 0.0"}
    )

    gflu_feature_init_sparsity: float = field(
        default=0.3,
        metadata={
            "help": "Only valid for t-softmax. The perecentge of features to be selected in "
            "each GFLU stage. This is just initialized and during learning it may change"
        },
    )
    learnable_sparsity: bool = field(
        default=True,
        metadata={
            "help": "Only valid for t-softmax. If True, the sparsity parameters will be learned."
            "If False, the sparsity parameters will be fixed to the initial values specified in "
            "`gflu_feature_init_sparsity` and `tree_feature_init_sparsity`"
        },
    )
    _module_src: str = field(default="models.gandalf")
    _model_name: str = field(default="GANDALFModel")
    _backbone_name: str = field(default="GANDALFBackbone")
    _config_name: str = field(default="GANDALFConfig")

    def __post_init__(self):
        assert self.gflu_stages > 0, "gflu_stages should be greater than 0"
        return super().__post_init__()

Bases: ModelConfig

Gated Additive Tree Ensemble configuration.

Parameters:

Name Type Description Default
gflu_stages int

Number of layers in the feature abstraction layer. Defaults to 6

6
gflu_dropout float

Dropout rate for the feature abstraction layer. Defaults to 0.0

0.0
tree_depth int

Depth of the tree. Defaults to 5

4
num_trees int

Number of trees to use in the ensemble. Defaults to 20

10
binning_activation str

The binning function to use. Defaults to entmoid. Defaults to sparsemoid. Choices are: [entmoid,sparsemoid,sigmoid].

'sparsemoid'
feature_mask_function str

The feature mask function to use. Defaults to sparsemax. Choices are: [entmax,sparsemax,softmax].

't-softmax'
tree_dropout float

probability of dropout in tree binning transformation. Defaults to 0.0

0.0
chain_trees bool

If True, we will chain the trees together. Synonymous to boosting (chaining trees) or bagging (parallel trees). Defaults to True

True
tree_wise_attention bool

If True, we will use tree wise attention to combine trees. Defaults to True

True
tree_wise_attention_dropout float

probability of dropout in the tree wise attention layer. Defaults to 0.0

0.0
share_head_weights bool

If True, we will share the weights between the heads. Defaults to True

True
task str

Specify whether the problem is regression or classification. backbone is a task which considers the model as a backbone to generate features. Mostly used internally for SSL and related tasks. Choices are: [regression,classification,backbone].

required
head Optional[str]

The head to be used for the model. Should be one of the heads defined in pytorch_tabular.models.common.heads. Defaults to LinearHead. Choices are: [None,LinearHead,MixtureDensityHead].

'LinearHead'
head_config Optional[Dict]

The config as a dict which defines the head. If left empty, will be initialized as default linear head.

lambda: {'layers': ''}()
embedding_dims Optional[List]

The dimensions of the embedding for each categorical column as a list of tuples (cardinality, embedding_dim). If left empty, will infer using the cardinality of the categorical column using the rule min(50, (x + 1) // 2)

None
embedding_dropout float

Dropout to be applied to the Categorical Embedding. Defaults to 0.0

0.0
batch_norm_continuous_input bool

If True, we will normalize the continuous layer by passing it through a BatchNorm layer.

True
learning_rate float

The learning rate of the model. Defaults to 1e-3.

0.001
loss Optional[str]

The loss function to be applied. By Default, it is MSELoss for regression and CrossEntropyLoss for classification. Unless you are sure what you are doing, leave it at MSELoss or L1Loss for regression and CrossEntropyLoss for classification

None
metrics Optional[List[str]]

the list of metrics you need to track during training. The metrics should be one of the functional metrics implemented in torchmetrics. By default, it is accuracy if classification and mean_squared_error for regression

None
metrics_params Optional[List]

The parameters to be passed to the metrics function. task is forced to be multiclass because the multiclass version can handle binary as well and for simplicity we are only using multiclass.

None
metrics_prob_input Optional[List]

Is a mandatory parameter for classification metrics defined in the config. This defines whether the input to the metric function is the probability or the class. Length should be same as the number of metrics. Defaults to None.

None
target_range Optional[List]

The range in which we should limit the output variable. Currently ignored for multi-target regression. Typically used for Regression problems. If left empty, will not apply any restrictions

None
seed int

The seed for reproducibility. Defaults to 42

42
Source code in src/pytorch_tabular/models/gate/config.py
@dataclass
class GatedAdditiveTreeEnsembleConfig(ModelConfig):
    """Gated Additive Tree Ensemble configuration.

    Args:
        gflu_stages (int): Number of layers in the feature abstraction layer. Defaults to 6

        gflu_dropout (float): Dropout rate for the feature abstraction layer. Defaults to 0.0

        tree_depth (int): Depth of the tree. Defaults to 5

        num_trees (int): Number of trees to use in the ensemble. Defaults to 20

        binning_activation (str): The binning function to use. Defaults to entmoid. Defaults to sparsemoid.
                Choices are: [`entmoid`,`sparsemoid`,`sigmoid`].

        feature_mask_function (str): The feature mask function to use. Defaults to sparsemax. Choices are:
                [`entmax`,`sparsemax`,`softmax`].

        tree_dropout (float): probability of dropout in tree binning transformation. Defaults to 0.0

        chain_trees (bool): If True, we will chain the trees together. Synonymous to boosting
            (chaining trees) or bagging (parallel trees). Defaults to True

        tree_wise_attention (bool): If True, we will use tree wise attention to combine trees. Defaults to
                True

        tree_wise_attention_dropout (float): probability of dropout in the tree wise attention layer.
                Defaults to 0.0

        share_head_weights (bool): If True, we will share the weights between the heads. Defaults to True


        task (str): Specify whether the problem is regression or classification. `backbone` is a task which
                considers the model as a backbone to generate features. Mostly used internally for SSL and related
                tasks. Choices are: [`regression`,`classification`,`backbone`].

        head (Optional[str]): The head to be used for the model. Should be one of the heads defined in
                `pytorch_tabular.models.common.heads`. Defaults to  LinearHead. Choices are:
                [`None`,`LinearHead`,`MixtureDensityHead`].

        head_config (Optional[Dict]): The config as a dict which defines the head. If left empty, will be
                initialized as default linear head.

        embedding_dims (Optional[List]): The dimensions of the embedding for each categorical column as a
                list of tuples (cardinality, embedding_dim). If left empty, will infer using the cardinality of
                the categorical column using the rule min(50, (x + 1) // 2)

        embedding_dropout (float): Dropout to be applied to the Categorical Embedding. Defaults to 0.0

        batch_norm_continuous_input (bool): If True, we will normalize the continuous layer by passing it
                through a BatchNorm layer.

        learning_rate (float): The learning rate of the model. Defaults to 1e-3.

        loss (Optional[str]): The loss function to be applied. By Default, it is MSELoss for regression and
                CrossEntropyLoss for classification. Unless you are sure what you are doing, leave it at MSELoss
                or L1Loss for regression and CrossEntropyLoss for classification

        metrics (Optional[List[str]]): the list of metrics you need to track during training. The metrics
                should be one of the functional metrics implemented in ``torchmetrics``. By default, it is
                accuracy if classification and mean_squared_error for regression

        metrics_params (Optional[List]): The parameters to be passed to the metrics function. `task` is forced to
                be `multiclass` because the multiclass version can handle binary as well and for simplicity we are
                only using `multiclass`.

        metrics_prob_input (Optional[List]): Is a mandatory parameter for classification metrics defined in the config.
            This defines whether the input to the metric function is the probability or the class. Length should be
            same as the number of metrics. Defaults to None.

        target_range (Optional[List]): The range in which we should limit the output variable. Currently
                ignored for multi-target regression. Typically used for Regression problems. If left empty, will
                not apply any restrictions

        seed (int): The seed for reproducibility. Defaults to 42

    """

    gflu_stages: int = field(
        default=6,
        metadata={"help": "Number of layers in the feature abstraction layer. Defaults to 6"},
    )

    gflu_dropout: float = field(
        default=0.0, metadata={"help": "Dropout rate for the feature abstraction layer. Defaults to 0.0"}
    )

    tree_depth: int = field(default=4, metadata={"help": "Depth of the tree. Defaults to 5"})

    num_trees: int = field(
        default=10,
        metadata={"help": "Number of trees to use in the ensemble. Defaults to 20"},
    )

    binning_activation: str = field(
        default="sparsemoid",
        metadata={
            "help": "The binning function to use. Defaults to entmoid. Defaults to entmoid",
            "choices": ["entmoid", "sparsemoid", "sigmoid"],
        },
    )
    feature_mask_function: str = field(
        default="t-softmax",
        metadata={
            "help": "The feature mask function to use. Defaults to entmax",
            "choices": ["entmax", "sparsemax", "softmax", "t-softmax"],
        },
    )
    gflu_feature_init_sparsity: float = field(
        default=0.3,
        metadata={
            "help": "Only valid for t-softmax. The percentage of features to be dropped in "
            "each GFLU stage. This is just initialized and during learning it may change"
        },
    )
    tree_feature_init_sparsity: float = field(
        default=0.8,
        metadata={
            "help": "Only valid for t-softmax. The perecentge of features to be dropped in "
            "each split in the tree. This is just initialized and during learning it may change"
        },
    )
    learnable_sparsity: bool = field(
        default=True,
        metadata={
            "help": "Only valid for t-softmax. If True, the sparsity parameters will be learned."
            "If False, the sparsity parameters will be fixed to the initial values specified in "
            "`gflu_feature_init_sparsity` and `tree_feature_init_sparsity`"
        },
    )

    tree_dropout: float = field(
        default=0.0,
        metadata={"help": "probability of dropout in tree binning transformation. Defaults to 0.0"},
    )
    chain_trees: bool = field(
        default=True,
        metadata={
            "help": "If True, we will chain the trees together."
            " Synonymous to boosting (chaining trees) or bagging (parallel trees). Defaults to True"
        },
    )
    tree_wise_attention: bool = field(
        default=True,
        metadata={"help": "If True, we will use tree wise attention to combine trees. Defaults to True"},
    )
    tree_wise_attention_dropout: float = field(
        default=0.0,
        metadata={"help": "probability of dropout in the tree wise attention layer. Defaults to 0.0"},
    )
    share_head_weights: bool = field(
        default=True,
        metadata={"help": "If True, we will share the weights between the heads. Defaults to True"},
    )

    _module_src: str = field(default="models.gate")
    _model_name: str = field(default="GatedAdditiveTreeEnsembleModel")
    _backbone_name: str = field(default="GatedAdditiveTreesBackbone")
    _config_name: str = field(default="GatedAdditiveTreeEnsembleConfig")

    def __post_init__(self):
        assert self.tree_depth > 0, "tree_depth should be greater than 0"
        # Either gflu_stages or num_trees should be greater than 0
        assert self.num_trees > 0, (
            "`num_trees` must be greater than 0." "If you want a lighter model which performs better, use GANDALF."
        )
        super().__post_init__()

Bases: ModelConfig

MDN configuration.

Parameters:

Name Type Description Default
backbone_config_class str

The config class for defining the Backbone. The config class should be a valid module path from models. e.g. FTTransformerConfig

None
backbone_config_params Dict

The dict of config parameters for defining the Backbone.

None
task str

Specify whether the problem is regression or classification. backbone is a task which considers the model as a backbone to generate features. Mostly used internally for SSL and related tasks. Choices are: [regression,classification,backbone].

required
head str
'LinearHead'
head_config Dict

The config for defining the Mixed Density Network Head

None
embedding_dims Optional[List]

The dimensions of the embedding for each categorical column as a list of tuples (cardinality, embedding_dim). If left empty, will infer using the cardinality of the categorical column using the rule min(50, (x + 1) // 2)

None
embedding_dropout float

Dropout to be applied to the Categorical Embedding. Defaults to 0.0

0.0
batch_norm_continuous_input bool

If True, we will normalize the continuous layer by passing it through a BatchNorm layer.

True
learning_rate float

The learning rate of the model. Defaults to 1e-3.

0.001
loss Optional[str]

The loss function to be applied. By Default, it is MSELoss for regression and CrossEntropyLoss for classification. Unless you are sure what you are doing, leave it at MSELoss or L1Loss for regression and CrossEntropyLoss for classification

None
metrics Optional[List[str]]

the list of metrics you need to track during training. The metrics should be one of the functional metrics implemented in torchmetrics. By default, it is accuracy if classification and mean_squared_error for regression

None
metrics_params Optional[List]

The parameters to be passed to the metrics function. task is forced to be multiclass because the multiclass version can handle binary as well and for simplicity we are only using multiclass.

None
metrics_prob_input Optional[List]

Is a mandatory parameter for classification metrics defined in the config. This defines whether the input to the metric function is the probability or the class. Length should be same as the number of metrics. Defaults to None.

None
target_range Optional[List]

The range in which we should limit the output variable. Currently ignored for multi-target regression. Typically used for Regression problems. If left empty, will not apply any restrictions

None
seed int

The seed for reproducibility. Defaults to 42

42
Source code in src/pytorch_tabular/models/mixture_density/config.py
@dataclass
class MDNConfig(ModelConfig):
    """MDN configuration.

    Args:
        backbone_config_class (str): The config class for defining the Backbone. The config class should be
                a valid module path from `models`. e.g. `FTTransformerConfig`

        backbone_config_params (Dict): The dict of config parameters for defining the Backbone.


        task (str): Specify whether the problem is regression or classification. `backbone` is a task which
                considers the model as a backbone to generate features. Mostly used internally for SSL and related
                tasks. Choices are: [`regression`,`classification`,`backbone`].

        head (str):

        head_config (Dict): The config for defining the Mixed Density Network Head

        embedding_dims (Optional[List]): The dimensions of the embedding for each categorical column as a
                list of tuples (cardinality, embedding_dim). If left empty, will infer using the cardinality of
                the categorical column using the rule min(50, (x + 1) // 2)

        embedding_dropout (float): Dropout to be applied to the Categorical Embedding. Defaults to 0.0

        batch_norm_continuous_input (bool): If True, we will normalize the continuous layer by passing it
                through a BatchNorm layer.

        learning_rate (float): The learning rate of the model. Defaults to 1e-3.

        loss (Optional[str]): The loss function to be applied. By Default, it is MSELoss for regression and
                CrossEntropyLoss for classification. Unless you are sure what you are doing, leave it at MSELoss
                or L1Loss for regression and CrossEntropyLoss for classification

        metrics (Optional[List[str]]): the list of metrics you need to track during training. The metrics
                should be one of the functional metrics implemented in ``torchmetrics``. By default, it is
                accuracy if classification and mean_squared_error for regression

        metrics_params (Optional[List]): The parameters to be passed to the metrics function. `task` is forced to
                be `multiclass` because the multiclass version can handle binary as well and for simplicity we are
                only using `multiclass`.

        metrics_prob_input (Optional[List]): Is a mandatory parameter for classification metrics defined in the config.
            This defines whether the input to the metric function is the probability or the class. Length should be
            same as the number of metrics. Defaults to None.

        target_range (Optional[List]): The range in which we should limit the output variable. Currently
                ignored for multi-target regression. Typically used for Regression problems. If left empty, will
                not apply any restrictions

        seed (int): The seed for reproducibility. Defaults to 42

    """

    backbone_config_class: str = field(
        default=None,
        metadata={
            "help": "The config class for defining the Backbone."
            " The config class should be a valid module path from `models`. e.g. `FTTransformerConfig`"
        },
    )
    backbone_config_params: Dict = field(
        default=None,
        metadata={"help": "The dict of config parameters for defining the Backbone."},
    )
    head: str = field(init=False, default="MixtureDensityHead")
    head_config: Dict = field(
        default=None,
        metadata={"help": "The config for defining the Mixed Density Network Head"},
    )
    _module_src: str = field(default="models.mixture_density")
    _model_name: str = field(default="MDNModel")
    _config_name: str = field(default="MDNConfig")
    _probabilistic: bool = field(default=True)

    def __post_init__(self):
        assert (
            self.backbone_config_class not in INCOMPATIBLE_BACKBONES
        ), f"{self.backbone_config_class} is not a supported backbone for MDN head"
        assert self.head == "MixtureDensityHead"
        return super().__post_init__()

Bases: ModelConfig

Neural Oblivious Decision Ensembles for Deep Learning on Tabular Data configuration.

Parameters:

Name Type Description Default
num_layers int

Number of Oblivious Decision Tree Layers in the Dense Architecture

1
num_trees int

Number of Oblivious Decision Trees in each layer

2048
additional_tree_output_dim int

The additional output dimensions which is only used to pass through different layers of the architectures. Only the first output_dim outputs will be used for prediction

3
depth int

The depth of the individual Oblivious Decision Trees

6
choice_function str

Generates a sparse probability distribution to be used as feature weights(aka, soft feature selection). Choices are: [entmax15,sparsemax].

'entmax15'
bin_function str

Generates a sparse probability distribution to be used as tree leaf weights. Choices are: [entmoid15,sparsemoid].

'entmoid15'
max_features Optional[int]

If not None, sets a max limit on the number of features to be carried forward from layer to layer in the Dense Architecture

None
input_dropout float

Dropout to be applied to the inputs between layers of the Dense Architecture

0.0
initialize_response str

Initializing the response variable in the Oblivious Decision Trees. By default, it is a standard normal distribution. Choices are: [normal,uniform].

'normal'
initialize_selection_logits str

Initializing the feature selector. By default, is a uniform distribution across the features. Choices are: [uniform,normal].

'uniform'
threshold_init_beta float

Used in the Data-aware initialization of thresholds where the threshold is initialized randomly (with a beta distribution) to feature values in the first batch. It initializes threshold to a q-th quantile of data points. where q ~ Beta(:threshold_init_beta:, :threshold_init_beta:) If this param is set to 1, initial thresholds will have the same distribution as data points If greater than 1 (e.g. 10), thresholds will be closer to median data value If less than 1 (e.g. 0.1), thresholds will approach min/max data values.

1.0
threshold_init_cutoff float

Used in the Data-aware initialization of scales(used in the scaling ODTs). It is initialized in such a way that all the samples in the first batch belong to the linear region of the entmoid/sparsemoid(bin-selectors) and thereby have non-zero gradients Threshold log-temperatures initializer, in (0, inf) By default(1.0), log-temperatures are initialized in such a way that all bin selectors end up in the linear region of sparse-sigmoid. The temperatures are then scaled by this parameter. Setting this value > 1.0 will result in some margin between data points and sparse-sigmoid cutoff value Setting this value < 1.0 will cause (1 - value) part of data points to end up in flat sparse- sigmoid region For instance, threshold_init_cutoff = 0.9 will set 10% points equal to 0.0 or 1.0 Setting this value > 1.0 will result in a margin between data points and sparse-sigmoid cutoff value All points will be between (0.5 - 0.5 / threshold_init_cutoff) and (0.5 + 0.5 / threshold_init_cutoff)

1.0
task str

Specify whether the problem is regression or classification. backbone is a task which considers the model as a backbone to generate features. Mostly used internally for SSL and related tasks. Choices are: [regression,classification,backbone].

required
head Optional[str]

The head to be used for the model. Should be one of the heads defined in pytorch_tabular.models.common.heads. Defaults to LinearHead. Choices are: [None,LinearHead,MixtureDensityHead].

None
head_config Optional[Dict]

The config as a dict which defines the head. If left empty, will be initialized as default linear head.

lambda: {'layers': ''}()
embedding_dims Optional[List]

The dimensions of the embedding for each categorical column as a list of tuples (cardinality, embedding_dim). If left empty, will infer using the cardinality of the categorical column using the rule min(50, (x + 1) // 2)

None
embedding_dropout float

Dropout to be applied to the Categorical Embedding. Defaults to 0.0

0.0
batch_norm_continuous_input bool

If True, we will normalize the continuous layer by passing it through a BatchNorm layer.

True
learning_rate float

The learning rate of the model. Defaults to 1e-3.

0.001
loss Optional[str]

The loss function to be applied. By Default, it is MSELoss for regression and CrossEntropyLoss for classification. Unless you are sure what you are doing, leave it at MSELoss or L1Loss for regression and CrossEntropyLoss for classification

None
metrics Optional[List[str]]

the list of metrics you need to track during training. The metrics should be one of the functional metrics implemented in torchmetrics. By default, it is accuracy if classification and mean_squared_error for regression

None
metrics_params Optional[List]

The parameters to be passed to the metrics function. task is forced to be multiclass because the multiclass version can handle binary as well and for simplicity we are only using multiclass.

None
metrics_prob_input Optional[List]

Is a mandatory parameter for classification metrics defined in the config. This defines whether the input to the metric function is the probability or the class. Length should be same as the number of metrics. Defaults to None.

None
target_range Optional[List]

The range in which we should limit the output variable. Currently ignored for multi-target regression. Typically used for Regression problems. If left empty, will not apply any restrictions

None
seed int

The seed for reproducibility. Defaults to 42

42
Source code in src/pytorch_tabular/models/node/config.py
@dataclass
class NodeConfig(ModelConfig):
    """Neural Oblivious Decision Ensembles for Deep Learning on Tabular Data configuration.

    Args:
        num_layers (int): Number of Oblivious Decision Tree Layers in the Dense Architecture

        num_trees (int): Number of Oblivious Decision Trees in each layer

        additional_tree_output_dim (int): The additional output dimensions which is only used to pass
                through different layers of the architectures. Only the first output_dim outputs will be used for
                prediction

        depth (int): The depth of the individual Oblivious Decision Trees

        choice_function (str): Generates a sparse probability distribution to be used as feature
                weights(aka, soft feature selection). Choices are: [`entmax15`,`sparsemax`].

        bin_function (str): Generates a sparse probability distribution to be used as tree leaf weights.
                Choices are: [`entmoid15`,`sparsemoid`].

        max_features (Optional[int]): If not None, sets a max limit on the number of features to be carried
                forward from layer to layer in the Dense Architecture

        input_dropout (float): Dropout to be applied to the inputs between layers of the Dense Architecture

        initialize_response (str): Initializing the response variable in the Oblivious Decision Trees. By
                default, it is a standard normal distribution. Choices are: [`normal`,`uniform`].

        initialize_selection_logits (str): Initializing the feature selector. By default, is a uniform
                distribution across the features. Choices are: [`uniform`,`normal`].

        threshold_init_beta (float):                  Used in the Data-aware initialization of thresholds
                where the threshold is initialized randomly                 (with a beta distribution) to feature
                values in the first batch.                 It initializes threshold to a q-th quantile of data
                points.                 where q ~ Beta(:threshold_init_beta:, :threshold_init_beta:)
                If this param is set to 1, initial thresholds will have the same distribution as data points
                If greater than 1 (e.g. 10), thresholds will be closer to median data value                 If
                less than 1 (e.g. 0.1), thresholds will approach min/max data values.

        threshold_init_cutoff (float):                  Used in the Data-aware initialization of
                scales(used in the scaling ODTs).                 It is initialized in such a way that all the
                samples in the first batch belong to the linear                 region of the
                entmoid/sparsemoid(bin-selectors) and thereby have non-zero gradients                 Threshold
                log-temperatures initializer, in (0, inf)                 By default(1.0), log-temperatures are
                initialized in such a way that all bin selectors                 end up in the linear region of
                sparse-sigmoid. The temperatures are then scaled by this parameter.                 Setting this
                value > 1.0 will result in some margin between data points and sparse-sigmoid cutoff value
                Setting this value < 1.0 will cause (1 - value) part of data points to end up in flat sparse-
                sigmoid region                 For instance, threshold_init_cutoff = 0.9 will set 10% points equal
                to 0.0 or 1.0                 Setting this value > 1.0 will result in a margin between data points
                and sparse-sigmoid cutoff value                 All points will be between (0.5 - 0.5 /
                threshold_init_cutoff) and (0.5 + 0.5 / threshold_init_cutoff)

        task (str): Specify whether the problem is regression or classification. `backbone` is a task which
                considers the model as a backbone to generate features. Mostly used internally for SSL and related
                tasks. Choices are: [`regression`,`classification`,`backbone`].

        head (Optional[str]): The head to be used for the model. Should be one of the heads defined in
                `pytorch_tabular.models.common.heads`. Defaults to  LinearHead. Choices are:
                [`None`,`LinearHead`,`MixtureDensityHead`].

        head_config (Optional[Dict]): The config as a dict which defines the head. If left empty, will be
                initialized as default linear head.

        embedding_dims (Optional[List]): The dimensions of the embedding for each categorical column as a
                list of tuples (cardinality, embedding_dim). If left empty, will infer using the cardinality of
                the categorical column using the rule min(50, (x + 1) // 2)

        embedding_dropout (float): Dropout to be applied to the Categorical Embedding. Defaults to 0.0

        batch_norm_continuous_input (bool): If True, we will normalize the continuous layer by passing it
                through a BatchNorm layer.

        learning_rate (float): The learning rate of the model. Defaults to 1e-3.

        loss (Optional[str]): The loss function to be applied. By Default, it is MSELoss for regression and
                CrossEntropyLoss for classification. Unless you are sure what you are doing, leave it at MSELoss
                or L1Loss for regression and CrossEntropyLoss for classification

        metrics (Optional[List[str]]): the list of metrics you need to track during training. The metrics
                should be one of the functional metrics implemented in ``torchmetrics``. By default, it is
                accuracy if classification and mean_squared_error for regression

        metrics_params (Optional[List]): The parameters to be passed to the metrics function. `task` is forced to
                be `multiclass` because the multiclass version can handle binary as well and for simplicity we are
                only using `multiclass`.

        metrics_prob_input (Optional[List]): Is a mandatory parameter for classification metrics defined in the config.
            This defines whether the input to the metric function is the probability or the class. Length should be
            same as the number of metrics. Defaults to None.

        target_range (Optional[List]): The range in which we should limit the output variable. Currently
                ignored for multi-target regression. Typically used for Regression problems. If left empty, will
                not apply any restrictions

        seed (int): The seed for reproducibility. Defaults to 42

    """

    num_layers: int = field(
        default=1,
        metadata={"help": "Number of Oblivious Decision Tree Layers in the Dense Architecture"},
    )
    num_trees: int = field(
        default=2048,
        metadata={"help": "Number of Oblivious Decision Trees in each layer"},
    )
    additional_tree_output_dim: int = field(
        default=3,
        metadata={
            "help": "The additional output dimensions which is only used to pass through different layers"
            " of the architectures. Only the first output_dim outputs will be used for prediction"
        },
    )
    depth: int = field(
        default=6,
        metadata={"help": "The depth of the individual Oblivious Decision Trees"},
    )
    choice_function: str = field(
        default="entmax15",
        metadata={
            "help": "Generates a sparse probability distribution to be used"
            " as feature weights(aka, soft feature selection)",
            "choices": ["entmax15", "sparsemax"],
        },
    )
    bin_function: str = field(
        default="entmoid15",
        metadata={
            "help": "Generates a sparse probability distribution to be used as tree leaf weights",
            "choices": ["entmoid15", "sparsemoid"],
        },
    )
    max_features: Optional[int] = field(
        default=None,
        metadata={
            "help": "If not None, sets a max limit on the number of features to be carried forward"
            " from layer to layer in the Dense Architecture"
        },
    )
    input_dropout: float = field(
        default=0.0,
        metadata={"help": "Dropout to be applied to the inputs between layers of the Dense Architecture"},
    )
    initialize_response: str = field(
        default="normal",
        metadata={
            "help": "Initializing the response variable in the Oblivious Decision Trees."
            " By default, it is a standard normal distribution",
            "choices": ["normal", "uniform"],
        },
    )
    initialize_selection_logits: str = field(
        default="uniform",
        metadata={
            "help": "Initializing the feature selector. By default is a uniform distribution across the features",
            "choices": ["uniform", "normal"],
        },
    )
    threshold_init_beta: float = field(
        default=1.0,
        metadata={
            "help": """
                Used in the Data-aware initialization of thresholds where the threshold is initialized randomly
                (with a beta distribution) to feature values in the first batch.
                It initializes threshold to a q-th quantile of data points.
                where q ~ Beta(:threshold_init_beta:, :threshold_init_beta:)
                If this param is set to 1, initial thresholds will have the same distribution as data points
                If greater than 1 (e.g. 10), thresholds will be closer to median data value
                If less than 1 (e.g. 0.1), thresholds will approach min/max data values.
            """
        },
    )
    threshold_init_cutoff: float = field(
        default=1.0,
        metadata={
            "help": """
                Used in the Data-aware initialization of scales(used in the scaling ODTs).
                It is initialized in such a way that all the samples in the first batch belong to the linear
                region of the entmoid/sparsemoid(bin-selectors) and thereby have non-zero gradients
                Threshold log-temperatures initializer, in (0, inf)
                By default(1.0), log-temperatures are initialized in such a way that all bin selectors
                end up in the linear region of sparse-sigmoid. The temperatures are then scaled by this parameter.
                Setting this value > 1.0 will result in some margin between data points and sparse-sigmoid cutoff value
                Setting this value < 1.0 will cause (1 - value) part of data points to end up in flat sparse-sigmoid
                region. For instance, threshold_init_cutoff = 0.9 will set 10% points equal to 0.0 or 1.0
                Setting this value > 1.0 will result in a margin between data points and sparse-sigmoid cutoff value
                All points will be between (0.5 - 0.5 / threshold_init_cutoff) and (0.5 + 0.5 / threshold_init_cutoff)
            """
        },
    )

    head: Optional[str] = field(
        default=None,
    )

    _module_src: str = field(default="models.node")
    _model_name: str = field(default="NODEModel")
    _backbone_name: str = field(default="NODEBackbone")
    _config_name: str = field(default="NodeConfig")

    def __post_init__(self):
        if self.head is not None:
            warnings.warn(
                "`head` and `head_config` is ignored as NODE has a specific"
                " head which subsets the tree outputs. Set `head=None`"
                " to turn off the warning"
            )
        else:
            # Setting Head to LinearHead for compatibility
            self.head = "LinearHead"
        return super().__post_init__()

Bases: ModelConfig

TabNet: Attentive Interpretable Tabular Learning configuration

Parameters:

Name Type Description Default
n_d int

Dimension of the prediction layer (usually between 4 and 64)

8
n_a int

Dimension of the attention layer (usually between 4 and 64)

8
n_steps int

Number of successive steps in the network (usually between 3 and 10)

3
gamma float

Float above 1, scaling factor for attention updates (usually between 1.0 to 2.0)

1.3
n_independent int

Number of independent GLU layer in each GLU block (default 2)

2
n_shared int

Number of independent GLU layer in each GLU block (default 2)

2
virtual_batch_size int

Batch size for Ghost Batch Normalization

128
mask_type str

Either 'sparsemax' or 'entmax' : this is the masking function to use. Choices are: [sparsemax,entmax].

'sparsemax'
task str

Specify whether the problem is regression or classification. backbone is a task which considers the model as a backbone to generate features. Mostly used internally for SSL and related tasks. Choices are: [regression,classification,backbone].

required
head Optional[str]

The head to be used for the model. Should be one of the heads defined in pytorch_tabular.models.common.heads. Defaults to LinearHead. Choices are: [None,LinearHead,MixtureDensityHead].

'LinearHead'
head_config Optional[Dict]

The config as a dict which defines the head. If left empty, will be initialized as default linear head.

lambda: {'layers': ''}()
embedding_dims Optional[List]

The dimensions of the embedding for each categorical column as a list of tuples (cardinality, embedding_dim). If left empty, will infer using the cardinality of the categorical column using the rule min(50, (x + 1) // 2)

None
embedding_dropout float

Dropout to be applied to the Categorical Embedding. Defaults to 0.0

0.0
batch_norm_continuous_input bool

If True, we will normalize the continuous layer by passing it through a BatchNorm layer.

True
learning_rate float

The learning rate of the model. Defaults to 1e-3.

0.001
loss Optional[str]

The loss function to be applied. By Default, it is MSELoss for regression and CrossEntropyLoss for classification. Unless you are sure what you are doing, leave it at MSELoss or L1Loss for regression and CrossEntropyLoss for classification

None
metrics Optional[List[str]]

the list of metrics you need to track during training. The metrics should be one of the functional metrics implemented in torchmetrics. By default, it is accuracy if classification and mean_squared_error for regression

None
metrics_params Optional[List]

The parameters to be passed to the metrics function. task is forced to be multiclass because the multiclass version can handle binary as well and for simplicity we are only using multiclass.

None
metrics_prob_input Optional[List]

Is a mandatory parameter for classification metrics defined in the config. This defines whether the input to the metric function is the probability or the class. Length should be same as the number of metrics. Defaults to None.

None
target_range Optional[List]

The range in which we should limit the output variable. Currently ignored for multi-target regression. Typically used for Regression problems. If left empty, will not apply any restrictions

None
seed int

The seed for reproducibility. Defaults to 42

42
Source code in src/pytorch_tabular/models/tabnet/config.py
@dataclass
class TabNetModelConfig(ModelConfig):
    """TabNet: Attentive Interpretable Tabular Learning configuration

    Args:
        n_d (int): Dimension of the prediction  layer (usually between 4 and 64)

        n_a (int): Dimension of the attention  layer (usually between 4 and 64)

        n_steps (int): Number of successive steps in the network (usually between 3 and 10)

        gamma (float): Float above 1, scaling factor for attention updates (usually between 1.0 to 2.0)

        n_independent (int): Number of independent GLU layer in each GLU block (default 2)

        n_shared (int): Number of independent GLU layer in each GLU block (default 2)

        virtual_batch_size (int): Batch size for Ghost Batch Normalization

        mask_type (str): Either 'sparsemax' or 'entmax' : this is the masking function to use. Choices are:
                [`sparsemax`,`entmax`].

        task (str): Specify whether the problem is regression or classification. `backbone` is a task which
                considers the model as a backbone to generate features. Mostly used internally for SSL and related
                tasks. Choices are: [`regression`,`classification`,`backbone`].

        head (Optional[str]): The head to be used for the model. Should be one of the heads defined in
                `pytorch_tabular.models.common.heads`. Defaults to  LinearHead. Choices are:
                [`None`,`LinearHead`,`MixtureDensityHead`].

        head_config (Optional[Dict]): The config as a dict which defines the head. If left empty, will be
                initialized as default linear head.

        embedding_dims (Optional[List]): The dimensions of the embedding for each categorical column as a
                list of tuples (cardinality, embedding_dim). If left empty, will infer using the cardinality of
                the categorical column using the rule min(50, (x + 1) // 2)

        embedding_dropout (float): Dropout to be applied to the Categorical Embedding. Defaults to 0.0

        batch_norm_continuous_input (bool): If True, we will normalize the continuous layer by passing it
                through a BatchNorm layer.

        learning_rate (float): The learning rate of the model. Defaults to 1e-3.

        loss (Optional[str]): The loss function to be applied. By Default, it is MSELoss for regression and
                CrossEntropyLoss for classification. Unless you are sure what you are doing, leave it at MSELoss
                or L1Loss for regression and CrossEntropyLoss for classification

        metrics (Optional[List[str]]): the list of metrics you need to track during training. The metrics
                should be one of the functional metrics implemented in ``torchmetrics``. By default, it is
                accuracy if classification and mean_squared_error for regression

        metrics_params (Optional[List]): The parameters to be passed to the metrics function. `task` is forced to
                be `multiclass` because the multiclass version can handle binary as well and for simplicity we are
                only using `multiclass`.

        metrics_prob_input (Optional[List]): Is a mandatory parameter for classification metrics defined in the config.
            This defines whether the input to the metric function is the probability or the class. Length should be
            same as the number of metrics. Defaults to None.

        target_range (Optional[List]): The range in which we should limit the output variable. Currently
                ignored for multi-target regression. Typically used for Regression problems. If left empty, will
                not apply any restrictions

        seed (int): The seed for reproducibility. Defaults to 42
    """

    n_d: int = field(
        default=8,
        metadata={"help": "Dimension of the prediction  layer (usually between 4 and 64)"},
    )
    n_a: int = field(
        default=8,
        metadata={"help": "Dimension of the attention  layer (usually between 4 and 64)"},
    )
    n_steps: int = field(
        default=3,
        metadata={"help": ("Number of successive steps in the network (usually between 3 and 10)")},
    )
    gamma: float = field(
        default=1.3,
        metadata={"help": ("Float above 1, scaling factor for attention updates (usually between" " 1.0 to 2.0)")},
    )
    n_independent: int = field(
        default=2,
        metadata={"help": "Number of independent GLU layer in each GLU block (default 2)"},
    )
    n_shared: int = field(
        default=2,
        metadata={"help": "Number of independent GLU layer in each GLU block (default 2)"},
    )
    virtual_batch_size: int = field(
        default=128,
        metadata={"help": "Batch size for Ghost Batch Normalization"},
    )
    mask_type: str = field(
        default="sparsemax",
        metadata={
            "help": ("Either 'sparsemax' or 'entmax' : this is the masking function to use"),
            "choices": ["sparsemax", "entmax"],
        },
    )
    grouped_features: Optional[List[List[str]]] = field(
        default=None,
        metadata={
            "help": (
                "List of list of feature names to be grouped together. This allows the"
                " model to share it's attention accross feature inside a same group."
                " This can be especially useful when your preprocessing generates"
                " correlated or dependant features: like if you use a TF-IDF or a PCA"
                " on a text column. Note that feature importance will be exactly the"
                " same between features on a same group. Please also note that"
                " embeddings generated for a categorical variable are always inside a"
                " same group."
            )
        },
    )
    _module_src: str = field(default="models.tabnet")
    _model_name: str = field(default="TabNetModel")
    _config_name: str = field(default="TabNetModelConfig")

Bases: ModelConfig

Tab Transformer configuration.

Parameters:

Name Type Description Default
input_embed_dim int

The embedding dimension for the input categorical features. Defaults to 32

32
embedding_initialization Optional[str]

Initialization scheme for the embedding layers. Defaults to kaiming. Choices are: [kaiming_uniform,kaiming_normal].

'kaiming_uniform'
embedding_bias bool

Flag to turn on Embedding Bias. Defaults to False

False
share_embedding bool

The flag turns on shared embeddings in the input embedding process. The key idea here is to have an embedding for the feature as a whole along with embeddings of each unique values of that column. For more details refer to Appendix A of the TabTransformer paper. Defaults to False

False
share_embedding_strategy Optional[str]

There are two strategies in adding shared embeddings. 1. add - A separate embedding for the feature is added to the embedding of the unique values of the feature. 2. fraction - A fraction of the input embedding is reserved for the shared embedding of the feature. Defaults to fraction. Choices are: [add,fraction].

'fraction'
shared_embedding_fraction float

Fraction of the input_embed_dim to be reserved by the shared embedding. Should be less than one. Defaults to 0.25

0.25
num_heads int

The number of heads in the Multi-Headed Attention layer. Defaults to 8

8
num_attn_blocks int

The number of layers of stacked Multi-Headed Attention layers. Defaults to 6

6
transformer_head_dim Optional[int]

The number of hidden units in the Multi-Headed Attention layers. Defaults to None and will be same as input_dim.

None
attn_dropout float

Dropout to be applied after Multi headed Attention. Defaults to 0.1

0.1
add_norm_dropout float

Dropout to be applied in the AddNorm Layer. Defaults to 0.1

0.1
ff_dropout float

Dropout to be applied in the Positionwise FeedForward Network. Defaults to 0.1

0.1
ff_hidden_multiplier int

Multiple by which the Positionwise FF layer scales the input. Defaults to 4

4
transformer_activation str

The activation type in the transformer feed forward layers. In addition to the default activation in PyTorch like ReLU, TanH, LeakyReLU, etc. https://pytorch.org/docs/stable/nn.html#non-linear-activations-weighted-sum-nonlinearity, GEGLU, ReGLU and SwiGLU are also implemented(https://arxiv.org/pdf/2002.05202.pdf). Defaults to GEGLU

'GEGLU'
task str

Specify whether the problem is regression or classification. backbone is a task which considers the model as a backbone to generate features. Mostly used internally for SSL and related tasks.. Choices are: [regression,classification,backbone].

required
head Optional[str]

The head to be used for the model. Should be one of the heads defined in pytorch_tabular.models.common.heads. Defaults to LinearHead. Choices are: [None,LinearHead,MixtureDensityHead].

'LinearHead'
head_config Optional[Dict]

The config as a dict which defines the head. If left empty, will be initialized as default linear head.

lambda: {'layers': ''}()
embedding_dims Optional[List]

The dimensions of the embedding for each categorical column as a list of tuples (cardinality, embedding_dim). If left empty, will infer using the cardinality of the categorical column using the rule min(50, (x + 1) // 2)

None
embedding_dropout float

Dropout to be applied to the Categorical Embedding. Defaults to 0.0

0.0
batch_norm_continuous_input bool

If True, we will normalize the continuous layer by passing it through a BatchNorm layer.

True
learning_rate float

The learning rate of the model. Defaults to 1e-3.

0.001
loss Optional[str]

The loss function to be applied. By Default, it is MSELoss for regression and CrossEntropyLoss for classification. Unless you are sure what you are doing, leave it at MSELoss or L1Loss for regression and CrossEntropyLoss for classification

None
metrics Optional[List[str]]

the list of metrics you need to track during training. The metrics should be one of the functional metrics implemented in torchmetrics. By default, it is accuracy if classification and mean_squared_error for regression

None
metrics_params Optional[List]

The parameters to be passed to the metrics function. task is forced to be multiclass because the multiclass version can handle binary as well and for simplicity we are only using multiclass.

None
metrics_prob_input Optional[List]

Is a mandatory parameter for classification metrics defined in the config. This defines whether the input to the metric function is the probability or the class. Length should be same as the number of metrics. Defaults to None.

None
target_range Optional[List]

The range in which we should limit the output variable. Currently ignored for multi-target regression. Typically used for Regression problems. If left empty, will not apply any restrictions

None
seed int

The seed for reproducibility. Defaults to 42

42
Source code in src/pytorch_tabular/models/tab_transformer/config.py
@dataclass
class TabTransformerConfig(ModelConfig):
    """Tab Transformer configuration.

    Args:
        input_embed_dim (int): The embedding dimension for the input categorical features. Defaults to 32

        embedding_initialization (Optional[str]): Initialization scheme for the embedding layers. Defaults
                to `kaiming`. Choices are: [`kaiming_uniform`,`kaiming_normal`].

        embedding_bias (bool): Flag to turn on Embedding Bias. Defaults to False

        share_embedding (bool): The flag turns on shared embeddings in the input embedding process. The key
                idea here is to have an embedding for the feature as a whole along with embeddings of each unique
                values of that column. For more details refer to Appendix A of the TabTransformer paper. Defaults
                to False

        share_embedding_strategy (Optional[str]): There are two strategies in adding shared embeddings. 1.
                `add` - A separate embedding for the feature is added to the embedding of the unique values of the
                feature. 2. `fraction` - A fraction of the input embedding is reserved for the shared embedding of
                the feature. Defaults to fraction. Choices are: [`add`,`fraction`].

        shared_embedding_fraction (float): Fraction of the input_embed_dim to be reserved by the shared
                embedding. Should be less than one. Defaults to 0.25

        num_heads (int): The number of heads in the Multi-Headed Attention layer. Defaults to 8

        num_attn_blocks (int): The number of layers of stacked Multi-Headed Attention layers. Defaults to 6

        transformer_head_dim (Optional[int]): The number of hidden units in the Multi-Headed Attention
                layers. Defaults to None and will be same as input_dim.

        attn_dropout (float): Dropout to be applied after Multi headed Attention. Defaults to 0.1

        add_norm_dropout (float): Dropout to be applied in the AddNorm Layer. Defaults to 0.1

        ff_dropout (float): Dropout to be applied in the Positionwise FeedForward Network. Defaults to 0.1

        ff_hidden_multiplier (int): Multiple by which the Positionwise FF layer scales the input. Defaults
                to 4

        transformer_activation (str): The activation type in the transformer feed forward layers. In
                addition to the default activation in PyTorch like ReLU, TanH, LeakyReLU, etc.
                https://pytorch.org/docs/stable/nn.html#non-linear-activations-weighted-sum-nonlinearity, GEGLU,
                ReGLU and SwiGLU are also implemented(https://arxiv.org/pdf/2002.05202.pdf). Defaults to GEGLU

        task (str): Specify whether the problem is regression or classification. `backbone` is a task which
                considers the model as a backbone to generate features. Mostly used internally for SSL and related
                tasks.. Choices are: [`regression`,`classification`,`backbone`].

        head (Optional[str]): The head to be used for the model. Should be one of the heads defined in
                `pytorch_tabular.models.common.heads`. Defaults to  LinearHead. Choices are:
                [`None`,`LinearHead`,`MixtureDensityHead`].

        head_config (Optional[Dict]): The config as a dict which defines the head. If left empty, will be
                initialized as default linear head.

        embedding_dims (Optional[List]): The dimensions of the embedding for each categorical column as a
                list of tuples (cardinality, embedding_dim). If left empty, will infer using the cardinality of
                the categorical column using the rule min(50, (x + 1) // 2)

        embedding_dropout (float): Dropout to be applied to the Categorical Embedding. Defaults to 0.0

        batch_norm_continuous_input (bool): If True, we will normalize the continuous layer by passing it
                through a BatchNorm layer.

        learning_rate (float): The learning rate of the model. Defaults to 1e-3.

        loss (Optional[str]): The loss function to be applied. By Default, it is MSELoss for regression and
                CrossEntropyLoss for classification. Unless you are sure what you are doing, leave it at MSELoss
                or L1Loss for regression and CrossEntropyLoss for classification

        metrics (Optional[List[str]]): the list of metrics you need to track during training. The metrics
                should be one of the functional metrics implemented in ``torchmetrics``. By default, it is
                accuracy if classification and mean_squared_error for regression

        metrics_params (Optional[List]): The parameters to be passed to the metrics function. `task` is forced to
                be `multiclass` because the multiclass version can handle binary as well and for simplicity we are
                only using `multiclass`.

        metrics_prob_input (Optional[List]): Is a mandatory parameter for classification metrics defined in the config.
            This defines whether the input to the metric function is the probability or the class. Length should be
            same as the number of metrics. Defaults to None.

        target_range (Optional[List]): The range in which we should limit the output variable. Currently
                ignored for multi-target regression. Typically used for Regression problems. If left empty, will
                not apply any restrictions

        seed (int): The seed for reproducibility. Defaults to 42

    """

    input_embed_dim: int = field(
        default=32,
        metadata={"help": "The embedding dimension for the input categorical features. Defaults to 32"},
    )
    embedding_initialization: Optional[str] = field(
        default="kaiming_uniform",
        metadata={
            "help": "Initialization scheme for the embedding layers. Defaults to `kaiming`",
            "choices": ["kaiming_uniform", "kaiming_normal"],
        },
    )
    embedding_bias: bool = field(
        default=False,
        metadata={"help": "Flag to turn on Embedding Bias. Defaults to False"},
    )
    share_embedding: bool = field(
        default=False,
        metadata={
            "help": "The flag turns on shared embeddings in the input embedding process."
            " The key idea here is to have an embedding for the feature as a whole along with embeddings"
            " of each unique values of that column. For more details refer"
            " to Appendix A of the TabTransformer paper. Defaults to False"
        },
    )
    share_embedding_strategy: Optional[str] = field(
        default="fraction",
        metadata={
            "help": "There are two strategies in adding shared embeddings."
            " 1. `add` - A separate embedding for the feature is added to the embedding"
            " of the unique values of the feature."
            " 2. `fraction` - A fraction of the input embedding is reserved"
            " for the shared embedding of the feature. Defaults to fraction.",
            "choices": ["add", "fraction"],
        },
    )
    shared_embedding_fraction: float = field(
        default=0.25,
        metadata={
            "help": "Fraction of the input_embed_dim to be reserved by the shared embedding."
            " Should be less than one. Defaults to 0.25"
        },
    )
    num_heads: int = field(
        default=8,
        metadata={"help": "The number of heads in the Multi-Headed Attention layer. Defaults to 8"},
    )
    num_attn_blocks: int = field(
        default=6,
        metadata={"help": "The number of layers of stacked Multi-Headed Attention layers. Defaults to 6"},
    )
    transformer_head_dim: Optional[int] = field(
        default=None,
        metadata={
            "help": "The number of hidden units in the Multi-Headed Attention layers."
            " Defaults to None and will be same as input_dim."
        },
    )
    attn_dropout: float = field(
        default=0.1,
        metadata={"help": "Dropout to be applied after Multi headed Attention. Defaults to 0.1"},
    )
    add_norm_dropout: float = field(
        default=0.1,
        metadata={"help": "Dropout to be applied in the AddNorm Layer. Defaults to 0.1"},
    )
    ff_dropout: float = field(
        default=0.1,
        metadata={"help": "Dropout to be applied in the Positionwise FeedForward Network. Defaults to 0.1"},
    )
    ff_hidden_multiplier: int = field(
        default=4,
        metadata={"help": "Multiple by which the Positionwise FF layer scales the input. Defaults to 4"},
    )
    transformer_activation: str = field(
        default="GEGLU",
        metadata={
            "help": "The activation type in the transformer feed forward layers."
            " In addition to the default activation in PyTorch like ReLU, TanH, LeakyReLU, etc."
            " https://pytorch.org/docs/stable/nn.html#non-linear-activations-weighted-sum-nonlinearity,"
            " GEGLU, ReGLU and SwiGLU are also implemented(https://arxiv.org/pdf/2002.05202.pdf)."
            " Defaults to GEGLU",
        },
    )
    _module_src: str = field(default="models.tab_transformer")
    _model_name: str = field(default="TabTransformerModel")
    _backbone_name: str = field(default="TabTransformerBackbone")
    _config_name: str = field(default="TabTransformerConfig")

Base Model configuration.

Parameters:

Name Type Description Default
task str

Specify whether the problem is regression or classification. backbone is a task which considers the model as a backbone to generate features. Mostly used internally for SSL and related tasks.. Choices are: [regression,classification,backbone].

required
head Optional[str]

The head to be used for the model. Should be one of the heads defined in pytorch_tabular.models.common.heads. Defaults to LinearHead. Choices are: [None,LinearHead,MixtureDensityHead].

'LinearHead'
head_config Optional[Dict]

The config as a dict which defines the head. If left empty, will be initialized as default linear head.

lambda: {'layers': ''}()
embedding_dims Optional[List]

The dimensions of the embedding for each categorical column as a list of tuples (cardinality, embedding_dim). If left empty, will infer using the cardinality of the categorical column using the rule min(50, (x + 1) // 2)

None
embedding_dropout float

Dropout to be applied to the Categorical Embedding. Defaults to 0.0

0.0
batch_norm_continuous_input bool

If True, we will normalize the continuous layer by passing it through a BatchNorm layer.

True
virtual_batch_size Optional[int]

If not None, all BatchNorms will be converted to GhostBatchNorm's with the specified virtual batch size. Defaults to None

None
learning_rate float

The learning rate of the model. Defaults to 1e-3.

0.001
loss Optional[str]

The loss function to be applied. By Default, it is MSELoss for regression and CrossEntropyLoss for classification. Unless you are sure what you are doing, leave it at MSELoss or L1Loss for regression and CrossEntropyLoss for classification

None
metrics Optional[List[str]]

the list of metrics you need to track during training. The metrics should be one of the functional metrics implemented in torchmetrics. By default, it is accuracy if classification and mean_squared_error for regression

None
metrics_prob_input Optional[bool]

Is a mandatory parameter for classification metrics defined in the config. This defines whether the input to the metric function is the probability or the class. Length should be same as the number of metrics. Defaults to None.

None
metrics_params Optional[List]

The parameters to be passed to the metrics function. task is forced to be multiclass because the multiclass version can handle binary as well and for simplicity we are only using multiclass.

None
target_range Optional[List]

The range in which we should limit the output variable. Currently ignored for multi-target regression. Typically used for Regression problems. If left empty, will not apply any restrictions

None
seed int

The seed for reproducibility. Defaults to 42

42
Source code in src/pytorch_tabular/config/config.py
@dataclass
class ModelConfig:
    """Base Model configuration.

    Args:
        task (str): Specify whether the problem is regression or classification. `backbone` is a task which
                considers the model as a backbone to generate features. Mostly used internally for SSL and related
                tasks.. Choices are: [`regression`,`classification`,`backbone`].

        head (Optional[str]): The head to be used for the model. Should be one of the heads defined in
                `pytorch_tabular.models.common.heads`. Defaults to  LinearHead. Choices are:
                [`None`,`LinearHead`,`MixtureDensityHead`].

        head_config (Optional[Dict]): The config as a dict which defines the head. If left empty, will be
                initialized as default linear head.

        embedding_dims (Optional[List]): The dimensions of the embedding for each categorical column as a
                list of tuples (cardinality, embedding_dim). If left empty, will infer using the cardinality of
                the categorical column using the rule min(50, (x + 1) // 2)

        embedding_dropout (float): Dropout to be applied to the Categorical Embedding. Defaults to 0.0

        batch_norm_continuous_input (bool): If True, we will normalize the continuous layer by passing it
                through a BatchNorm layer.

        virtual_batch_size (Optional[int]): If not None, all BatchNorms will be converted to GhostBatchNorm's
                with the specified virtual batch size. Defaults to None

        learning_rate (float): The learning rate of the model. Defaults to 1e-3.

        loss (Optional[str]): The loss function to be applied. By Default, it is MSELoss for regression and
                CrossEntropyLoss for classification. Unless you are sure what you are doing, leave it at MSELoss
                or L1Loss for regression and CrossEntropyLoss for classification

        metrics (Optional[List[str]]): the list of metrics you need to track during training. The metrics
                should be one of the functional metrics implemented in ``torchmetrics``. By default, it is
                accuracy if classification and mean_squared_error for regression

        metrics_prob_input (Optional[bool]): Is a mandatory parameter for classification metrics defined in
                the config. This defines whether the input to the metric function is the probability or the class.
                Length should be same as the number of metrics. Defaults to None.

        metrics_params (Optional[List]): The parameters to be passed to the metrics function. `task` is forced to
                be `multiclass` because the multiclass version can handle binary as well and for simplicity we are
                only using `multiclass`.

        target_range (Optional[List]): The range in which we should limit the output variable. Currently
                ignored for multi-target regression. Typically used for Regression problems. If left empty, will
                not apply any restrictions

        seed (int): The seed for reproducibility. Defaults to 42

    """

    task: str = field(
        metadata={
            "help": "Specify whether the problem is regression or classification."
            " `backbone` is a task which considers the model as a backbone to generate features."
            " Mostly used internally for SSL and related tasks.",
            "choices": ["regression", "classification", "backbone"],
        }
    )

    head: Optional[str] = field(
        default="LinearHead",
        metadata={
            "help": "The head to be used for the model. Should be one of the heads defined"
            " in `pytorch_tabular.models.common.heads`. Defaults to  LinearHead",
            "choices": [None, "LinearHead", "MixtureDensityHead"],
        },
    )

    head_config: Optional[Dict] = field(
        default_factory=lambda: {"layers": ""},
        metadata={
            "help": "The config as a dict which defines the head."
            " If left empty, will be initialized as default linear head."
        },
    )
    embedding_dims: Optional[List] = field(
        default=None,
        metadata={
            "help": "The dimensions of the embedding for each categorical column as a list of tuples "
            "(cardinality, embedding_dim). If left empty, will infer using the cardinality of the "
            "categorical column using the rule min(50, (x + 1) // 2)"
        },
    )
    embedding_dropout: float = field(
        default=0.0,
        metadata={"help": "Dropout to be applied to the Categorical Embedding. Defaults to 0.0"},
    )
    batch_norm_continuous_input: bool = field(
        default=True,
        metadata={"help": "If True, we will normalize the continuous layer by passing it through a BatchNorm layer."},
    )

    learning_rate: float = field(
        default=1e-3,
        metadata={"help": "The learning rate of the model. Defaults to 1e-3."},
    )
    loss: Optional[str] = field(
        default=None,
        metadata={
            "help": "The loss function to be applied. By Default it is MSELoss for regression "
            "and CrossEntropyLoss for classification. Unless you are sure what you are doing, "
            "leave it at MSELoss or L1Loss for regression and CrossEntropyLoss for classification"
        },
    )
    metrics: Optional[List[str]] = field(
        default=None,
        metadata={
            "help": "the list of metrics you need to track during training. The metrics should be one "
            "of the functional metrics implemented in ``torchmetrics``. To use your own metric, please "
            "use the `metric` param in the `fit` method By default, it is accuracy if classification "
            "and mean_squared_error for regression"
        },
    )
    metrics_prob_input: Optional[List[bool]] = field(
        default=None,
        metadata={
            "help": "Is a mandatory parameter for classification metrics defined in the config. This defines "
            "whether the input to the metric function is the probability or the class. Length should be same "
            "as the number of metrics. Defaults to None."
        },
    )
    metrics_params: Optional[List] = field(
        default=None,
        metadata={
            "help": "The parameters to be passed to the metrics function. `task` is forced to be `multiclass`` "
            "because the multiclass version can handle binary as well and for simplicity we are only using "
            "`multiclass`."
        },
    )
    target_range: Optional[List] = field(
        default=None,
        metadata={
            "help": "The range in which we should limit the output variable. "
            "Currently ignored for multi-target regression. Typically used for Regression problems. "
            "If left empty, will not apply any restrictions"
        },
    )

    virtual_batch_size: Optional[int] = field(
        default=None,
        metadata={
            "help": "If not None, all BatchNorms will be converted to GhostBatchNorm's "
            " with this virtual batch size. Defaults to None"
        },
    )

    seed: int = field(
        default=42,
        metadata={"help": "The seed for reproducibility. Defaults to 42"},
    )

    _module_src: str = field(default="models")
    _model_name: str = field(default="Model")
    _backbone_name: str = field(default="Backbone")
    _config_name: str = field(default="Config")

    def __post_init__(self):
        if self.task == "regression":
            self.loss = self.loss or "MSELoss"
            self.metrics = self.metrics or ["mean_squared_error"]
            self.metrics_params = [{} for _ in self.metrics] if self.metrics_params is None else self.metrics_params
            self.metrics_prob_input = [False for _ in self.metrics]  # not used in Regression. just for compatibility
        elif self.task == "classification":
            self.loss = self.loss or "CrossEntropyLoss"
            self.metrics = self.metrics or ["accuracy"]
            self.metrics_params = [{} for _ in self.metrics] if self.metrics_params is None else self.metrics_params
            self.metrics_prob_input = (
                [False for _ in self.metrics] if self.metrics_prob_input is None else self.metrics_prob_input
            )
        elif self.task == "backbone":
            self.loss = None
            self.metrics = None
            self.metrics_params = None
            if self.head is not None:
                logger.warning("`head` is not a valid parameter for backbone task. Making `head=None`")
                self.head = None
                self.head_config = None
        else:
            raise NotImplementedError(
                f"{self.task} is not a valid task. Should be one of "
                f"{self.__dataclass_fields__['task'].metadata['choices']}"
            )
        if self.metrics is not None:
            assert len(self.metrics) == len(self.metrics_params), "metrics and metric_params should have same length"

        if self.task != "backbone":
            assert self.head in dir(heads.blocks), f"{self.head} is not a valid head"
            if hasattr(self, "_config_name") and self._config_name != "MDNConfig":
                assert self.head != "MixtureDensityHead", "MixtureDensityHead is not supported as a head for regular "
                "models. Use `MDNConfig` instead. Please see Probabilistic Regression with MDN How-to-Guide in "
                "documentation for the right usage."
            _head_callable = getattr(heads.blocks, self.head)
            ideal_head_config = _head_callable._config_template
            invalid_keys = set(self.head_config.keys()) - set(ideal_head_config.__dict__.keys())
            assert len(invalid_keys) == 0, f"`head_config` has some invalid keys: {invalid_keys}"

        # For Custom models, setting these values for compatibility
        if not hasattr(self, "_config_name"):
            self._config_name = type(self).__name__
        if not hasattr(self, "_model_name"):
            self._model_name = re.sub("[Cc]onfig", "Model", self._config_name)
        if not hasattr(self, "_backbone_name"):
            self._backbone_name = re.sub("[Cc]onfig", "Backbone", self._config_name)
        _validate_choices(self)

Model Classes

Bases: BaseModel

Source code in src/pytorch_tabular/models/autoint/autoint.py
class AutoIntModel(BaseModel):
    def __init__(self, config: DictConfig, **kwargs):
        super().__init__(config, **kwargs)

    @property
    def backbone(self):
        return self._backbone

    @property
    def embedding_layer(self):
        return self._embedding_layer

    @property
    def head(self):
        return self._head

    def _build_network(self):
        # Backbone
        self._backbone = AutoIntBackbone(self.hparams)
        # Embedding Layer
        self._embedding_layer = self._backbone._build_embedding_layer()
        # Head
        self._head = self._get_head_from_config()

Bases: BaseModel

Source code in src/pytorch_tabular/models/category_embedding/category_embedding_model.py
class CategoryEmbeddingModel(BaseModel):
    def __init__(self, config: DictConfig, **kwargs):
        super().__init__(config, **kwargs)

    @property
    def backbone(self):
        return self._backbone

    @property
    def embedding_layer(self):
        return self._embedding_layer

    @property
    def head(self):
        return self._head

    def _build_network(self):
        # Backbone
        self._backbone = CategoryEmbeddingBackbone(self.hparams)
        # Embedding Layer
        self._embedding_layer = self._backbone._build_embedding_layer()
        # Head
        self.head = self._get_head_from_config()

Bases: BaseModel

Source code in src/pytorch_tabular/models/danet/danet.py
class DANetModel(BaseModel):
    def __init__(self, config: DictConfig, **kwargs):
        super().__init__(config, **kwargs)

    @property
    def backbone(self):
        return self._backbone

    @property
    def embedding_layer(self):
        return self._embedding_layer

    @property
    def head(self):
        return self._head

    def _build_network(self):
        if self.hparams.virtual_batch_size > self.hparams.batch_size:
            warnings.warn(
                f"virtual_batch_size({self.hparams.virtual_batch_size}) is greater "
                f"than batch_size ({self.hparams.batch_size}). Setting virtual_batch_size "
                f"to {self.hparams.batch_size}. DANet uses Ghost Batch Normalization, "
                f"which works best when virtual_batch_size is small. Consider setting "
                "virtual_batch_size to something like 256 or 512."
            )
            self.hparams.virtual_batch_size = self.hparams.batch_size
        # Backbone
        self._backbone = DANetBackbone(
            cat_embedding_dims=self.hparams.embedding_dims,
            n_continuous_features=self.hparams.continuous_dim,
            n_layers=self.hparams.n_layers,
            abstlay_dim_1=self.hparams.abstlay_dim_1,
            abstlay_dim_2=self.hparams.abstlay_dim_2,
            k=self.hparams.k,
            dropout_rate=self.hparams.dropout_rate,
            block_activation=getattr(nn, self.hparams.block_activation)(),
            virtual_batch_size=self.hparams.virtual_batch_size,
            embedding_dropout=self.hparams.embedding_dropout,
            batch_norm_continuous_input=self.hparams.batch_norm_continuous_input,
        )
        # Embedding Layer
        self._embedding_layer = self._backbone._build_embedding_layer()
        # Head
        self._head = self._get_head_from_config()

Bases: BaseModel

Source code in src/pytorch_tabular/models/ft_transformer/ft_transformer.py
class FTTransformerModel(BaseModel):
    def __init__(self, config: DictConfig, **kwargs):
        super().__init__(config, **kwargs)

    @property
    def backbone(self):
        return self._backbone

    @property
    def embedding_layer(self):
        return self._embedding_layer

    @property
    def head(self):
        return self._head

    def _build_network(self):
        # Backbone
        self._backbone = FTTransformerBackbone(self.hparams)
        # Embedding Layer
        self._embedding_layer = self._backbone._build_embedding_layer()
        # Head
        self._head = self._get_head_from_config()

    def feature_importance(self):
        if self.hparams.attn_feature_importance:
            return super().feature_importance()
        else:
            raise ValueError("If you want Feature Importance, `attn_feature_weights` should be `True`.")

Bases: BaseModel

Source code in src/pytorch_tabular/models/gandalf/gandalf.py
class GANDALFModel(BaseModel):
    def __init__(self, config: DictConfig, **kwargs):
        super().__init__(config, **kwargs)

    @property
    def backbone(self):
        return self._backbone

    @property
    def embedding_layer(self):
        return self._embedding_layer

    @property
    def head(self):
        return self._head

    def _build_network(self):
        # Backbone
        self._backbone = GANDALFBackbone(
            cat_embedding_dims=self.hparams.embedding_dims,
            n_continuous_features=self.hparams.continuous_dim,
            gflu_stages=self.hparams.gflu_stages,
            gflu_dropout=self.hparams.gflu_dropout,
            gflu_feature_init_sparsity=self.hparams.gflu_feature_init_sparsity,
            learnable_sparsity=self.hparams.learnable_sparsity,
            batch_norm_continuous_input=self.hparams.batch_norm_continuous_input,
            embedding_dropout=self.hparams.embedding_dropout,
            virtual_batch_size=self.hparams.virtual_batch_size,
        )
        # Embedding Layer
        self._embedding_layer = self._backbone._build_embedding_layer()
        # Head
        self.T0 = nn.Parameter(torch.rand(self.hparams.output_dim), requires_grad=True)
        self._head = nn.Sequential(self._get_head_from_config(), Add(self.T0))

    def data_aware_initialization(self, datamodule):
        if self.hparams.task == "regression":
            logger.info("Data Aware Initialization of T0")
            # Need a big batch to initialize properly
            alt_loader = datamodule.train_dataloader(batch_size=self.hparams.data_aware_init_batch_size)
            batch = next(iter(alt_loader))
            self.T0.data = torch.mean(batch["target"], dim=0)

Bases: BaseModel

Source code in src/pytorch_tabular/models/gate/gate_model.py
class GatedAdditiveTreeEnsembleModel(BaseModel):
    def __init__(self, config: DictConfig, **kwargs):
        super().__init__(config, **kwargs)

    @property
    def backbone(self):
        return self._backbone

    @property
    def embedding_layer(self):
        return self._embedding_layer

    @property
    def head(self):
        return self._head

    def _build_network(self):
        # Backbone
        self._backbone = GatedAdditiveTreesBackbone(
            n_continuous_features=self.hparams.continuous_dim,
            cat_embedding_dims=self.hparams.embedding_dims,
            gflu_stages=self.hparams.gflu_stages,
            gflu_dropout=self.hparams.gflu_dropout,
            num_trees=self.hparams.num_trees,
            tree_depth=self.hparams.tree_depth,
            tree_dropout=self.hparams.tree_dropout,
            binning_activation=self.hparams.binning_activation,
            feature_mask_function=self.hparams.feature_mask_function,
            batch_norm_continuous_input=self.hparams.batch_norm_continuous_input,
            chain_trees=self.hparams.chain_trees,
            tree_wise_attention=self.hparams.tree_wise_attention,
            tree_wise_attention_dropout=self.hparams.tree_wise_attention_dropout,
            gflu_feature_init_sparsity=self.hparams.gflu_feature_init_sparsity,
            tree_feature_init_sparsity=self.hparams.tree_feature_init_sparsity,
            virtual_batch_size=self.hparams.virtual_batch_size,
        )
        # Embedding Layer
        self._embedding_layer = self._backbone._build_embedding_layer()
        # Head
        if self.hparams.num_trees == 0:
            self.T0 = nn.Parameter(torch.rand(self.hparams.output_dim), requires_grad=True)
            self._head = nn.Sequential(self._get_head_from_config(), Add(self.T0))
        else:
            self._head = CustomHead(self.backbone.output_dim, self.hparams)

    def data_aware_initialization(self, datamodule):
        if self.hparams.task == "regression":
            logger.info("Data Aware Initialization of T0")
            # Need a big batch to initialize properly
            alt_loader = datamodule.train_dataloader(batch_size=self.hparams.data_aware_init_batch_size)
            batch = next(iter(alt_loader))
            t0 = torch.mean(batch["target"], dim=0)
            if self.hparams.num_trees != 0:
                self.head.T0.data = t0
            else:
                self.T0.data = t0

Bases: BaseModel

Source code in src/pytorch_tabular/models/mixture_density/mdn.py
class MDNModel(BaseModel):
    def __init__(self, config: DictConfig, **kwargs):
        assert "inferred_config" in kwargs, "inferred_config not found in initialization arguments"
        self.inferred_config = kwargs["inferred_config"]
        assert config.task == "regression", "MDN is only implemented for Regression"
        super().__init__(config, **kwargs)
        assert self.hparams.output_dim == 1, "MDN is not implemented for multi-targets"
        if config.target_range is not None:
            logger.warning("MDN does not use target range. Ignoring it.")
        self._val_output = []

    def _get_head_from_config(self):
        _head_callable = getattr(blocks, self.hparams.head)
        self.hparams.head_config.input_dim = self.backbone.output_dim
        return _head_callable(
            config=_head_callable._config_template(**self.hparams.head_config),
        )  # output_dim auto-calculated from other configs

    @property
    def backbone(self):
        return self._backbone

    @property
    def embedding_layer(self):
        return self._embedding_layer

    @property
    def head(self):
        return self._head

    def _build_network(self):
        # Backbone
        callable, config = (
            self.hparams.backbone_config_class,
            self.hparams.backbone_config_params,
        )
        try:
            callable = getattr(models, callable)
        except ModuleNotFoundError as e:
            logger.error(
                "`config class` in `backbone_config` is not valid."
                " The config class should be a valid module path from `models`."
                " e.g. `ft_transformer.FTTransformerConfig`."
            )
            raise e
        assert issubclass(callable, ModelConfig), "`config_class` should be a subclass of `ModelConfig`"
        backbone_config = callable(**config)
        backbone_callable = getattr_nested(backbone_config._module_src, backbone_config._backbone_name)
        # Merging the config and inferred config
        backbone_config = safe_merge_config(OmegaConf.structured(backbone_config), self.inferred_config)
        self._backbone = backbone_callable(backbone_config)
        # Embedding Layer
        self._embedding_layer = self._backbone._build_embedding_layer()
        # Head
        self._head = self._get_head_from_config()

    # Redefining forward because TabTransformer flow is slightly different
    def forward(self, x: Dict):
        if isinstance(self.backbone, TabTransformerBackbone):
            if self.hparams.categorical_dim > 0:
                x_cat = self.embed_input({"categorical": x["categorical"]})
            x = self.compute_backbone({"categorical": x_cat, "continuous": x["continuous"]})
        else:
            x = self.embedding_layer(x)
            x = self.compute_backbone(x)
        return self.compute_head(x)

        # Redefining compute_backbone because TabTransformer flow flow is slightly different

    def compute_backbone(self, x: Union[Dict, torch.Tensor]):
        # Returns output
        if isinstance(self.backbone, TabTransformerBackbone):
            x = self.backbone(x["categorical"], x["continuous"])
        else:
            x = self.backbone(x)
        return x

    def compute_head(self, x: Tensor):
        pi, sigma, mu = self.head(x)
        return {"pi": pi, "sigma": sigma, "mu": mu, "backbone_features": x}

    def predict(self, x: Dict):
        ret_value = self.forward(x)
        return self.head.generate_point_predictions(ret_value["pi"], ret_value["sigma"], ret_value["mu"])

    def sample(self, x: Dict, n_samples: Optional[int] = None, ret_model_output=False):
        ret_value = self.forward(x)
        samples = self.head.generate_samples(ret_value["pi"], ret_value["sigma"], ret_value["mu"], n_samples)
        if ret_model_output:
            return samples, ret_value
        else:
            return samples

    def calculate_loss(self, y, pi, sigma, mu, tag="train"):
        # NLL Loss
        log_prob = self.head.log_prob(pi, sigma, mu, y)
        loss = torch.mean(-log_prob)
        if self.head.hparams.weight_regularization is not None:
            sigma_l1_reg = 0
            pi_l1_reg = 0
            mu_l1_reg = 0
            if self.head.hparams.lambda_sigma > 0:
                # Weight Regularization Sigma
                sigma_params = torch.cat([x.view(-1) for x in self.head.sigma.parameters()])
                sigma_l1_reg = self.head.hparams.lambda_sigma * torch.norm(
                    sigma_params, self.head.hparams.weight_regularization
                )
            if self.head.hparams.lambda_pi > 0:
                pi_params = torch.cat([x.view(-1) for x in self.head.pi.parameters()])
                pi_l1_reg = self.head.hparams.lambda_pi * torch.norm(pi_params, self.head.hparams.weight_regularization)
            if self.head.hparams.lambda_mu > 0:
                mu_params = torch.cat([x.view(-1) for x in self.head.mu.parameters()])
                mu_l1_reg = self.head.hparams.lambda_mu * torch.norm(mu_params, self.head.hparams.weight_regularization)

            loss = loss + sigma_l1_reg + pi_l1_reg + mu_l1_reg
        self.log(
            f"{tag}_loss",
            loss,
            on_epoch=(tag == "valid") or (tag == "test"),
            on_step=(tag == "train"),
            # on_step=False,
            logger=True,
            prog_bar=True,
        )
        return loss

    def training_step(self, batch, batch_idx):
        y = batch["target"]
        ret_value = self(batch)
        loss = self.calculate_loss(y, ret_value["pi"], ret_value["sigma"], ret_value["mu"], tag="train")
        if self.head.hparams.speedup_training:
            pass
        else:
            y_hat = self.head.generate_point_predictions(ret_value["pi"], ret_value["sigma"], ret_value["mu"])
            self.calculate_metrics(y, y_hat, tag="train")
        return loss

    def validation_step(self, batch, batch_idx):
        y = batch["target"]
        ret_value = self(batch)
        self.calculate_loss(y, ret_value["pi"], ret_value["sigma"], ret_value["mu"], tag="valid")
        y_hat = self.head.generate_point_predictions(ret_value["pi"], ret_value["sigma"], ret_value["mu"])
        self.calculate_metrics(y, y_hat, tag="valid")
        return y_hat, y, ret_value

    def test_step(self, batch, batch_idx):
        y = batch["target"]
        ret_value = self(batch)
        self.calculate_loss(y, ret_value["pi"], ret_value["sigma"], ret_value["mu"], tag="test")
        y_hat = self.head.generate_point_predictions(ret_value["pi"], ret_value["sigma"], ret_value["mu"])
        self.calculate_metrics(y, y_hat, tag="test")
        return y_hat, y

    def on_validation_batch_end(self, outputs, batch, batch_idx: int) -> None:
        self._val_output.append(outputs)
        super().on_validation_batch_end(outputs, batch, batch_idx)

    def on_validation_epoch_end(self) -> None:
        pi = [
            nn.functional.gumbel_softmax(output[2]["pi"], tau=self.head.hparams.softmax_temperature, dim=-1)
            for output in self._val_output
        ]
        pi = torch.cat(pi).detach().cpu()
        for i in range(self.head.hparams.num_gaussian):
            self.log(
                f"mean_pi_{i}",
                pi[:, i].mean(),
                on_epoch=True,
                on_step=False,
                logger=True,
                prog_bar=False,
            )

        mu = [output[2]["mu"] for output in self._val_output]
        mu = torch.cat(mu).detach().cpu()
        for i in range(self.head.hparams.num_gaussian):
            self.log(
                f"mean_mu_{i}",
                mu[:, i].mean(),
                on_epoch=True,
                on_step=False,
                logger=True,
                prog_bar=False,
            )

        sigma = [output[2]["sigma"] for output in self._val_output]
        sigma = torch.cat(sigma).detach().cpu()
        for i in range(self.head.hparams.num_gaussian):
            self.log(
                f"mean_sigma_{i}",
                sigma[:, i].mean(),
                on_epoch=True,
                on_step=False,
                logger=True,
                prog_bar=False,
            )
        if self.do_log_logits:
            logits = [output[0] for output in self._val_output]
            logits = torch.cat(logits).detach().cpu()
            fig = self.create_plotly_histogram(logits.unsqueeze(1), "logits")
            wandb.log(
                {
                    "valid_logits": fig,
                    "global_step": self.global_step,
                },
                commit=False,
            )
            if self.head.hparams.log_debug_plot:
                fig = self.create_plotly_histogram(pi, "pi", bin_dict={"start": 0.0, "end": 1.0, "size": 0.1})
                wandb.log(
                    {
                        "valid_pi": fig,
                        "global_step": self.global_step,
                    },
                    commit=False,
                )

                fig = self.create_plotly_histogram(mu, "mu")
                wandb.log(
                    {
                        "valid_mu": fig,
                        "global_step": self.global_step,
                    },
                    commit=False,
                )

                fig = self.create_plotly_histogram(sigma, "sigma")
                wandb.log(
                    {
                        "valid_sigma": fig,
                        "global_step": self.global_step,
                    },
                    commit=False,
                )
        self._val_output = []

Bases: BaseModel

Source code in src/pytorch_tabular/models/node/node_model.py
class NODEModel(BaseModel):
    def __init__(self, config: DictConfig, **kwargs):
        super().__init__(config, **kwargs)

    def subset(self, x):
        return x[..., : self.hparams.output_dim].mean(dim=-2)

    def data_aware_initialization(self, datamodule):
        """Performs data-aware initialization for NODE."""
        logger.info(
            "Data Aware Initialization of NODE using a forward pass with "
            f"{self.hparams.data_aware_init_batch_size} batch size...."
        )
        # Need a big batch to initialize properly
        alt_loader = datamodule.train_dataloader(batch_size=self.hparams.data_aware_init_batch_size)
        batch = next(iter(alt_loader))
        for k, v in batch.items():
            if isinstance(v, list) and (len(v) == 0):
                # Skipping empty list
                continue
            # batch[k] = v.to("cpu" if self.config.gpu == 0 else "cuda")
            batch[k] = v.to(self.device)

        # single forward pass to initialize the ODST
        with torch.no_grad():
            self(batch)

    @property
    def backbone(self):
        return self._backbone

    @property
    def embedding_layer(self):
        return self._embedding_layer

    @property
    def head(self):
        return self._head

    def _build_network(self):
        self._backbone = NODEBackbone(self.hparams)
        # Embedding Layer
        self._embedding_layer = self._backbone._build_embedding_layer()
        # average first n channels of every tree, where n is the number of output targets for regression
        # and number of classes for classification
        # Not using config head because NODE has a specific head
        warnings.warn("Ignoring head config because NODE has a specific head which subsets the tree outputs")
        self._head = Lambda(self.subset)

data_aware_initialization(datamodule)

Performs data-aware initialization for NODE.

Source code in src/pytorch_tabular/models/node/node_model.py
def data_aware_initialization(self, datamodule):
    """Performs data-aware initialization for NODE."""
    logger.info(
        "Data Aware Initialization of NODE using a forward pass with "
        f"{self.hparams.data_aware_init_batch_size} batch size...."
    )
    # Need a big batch to initialize properly
    alt_loader = datamodule.train_dataloader(batch_size=self.hparams.data_aware_init_batch_size)
    batch = next(iter(alt_loader))
    for k, v in batch.items():
        if isinstance(v, list) and (len(v) == 0):
            # Skipping empty list
            continue
        # batch[k] = v.to("cpu" if self.config.gpu == 0 else "cuda")
        batch[k] = v.to(self.device)

    # single forward pass to initialize the ODST
    with torch.no_grad():
        self(batch)

Bases: BaseModel

Source code in src/pytorch_tabular/models/tabnet/tabnet_model.py
class TabNetModel(BaseModel):
    def __init__(self, config: DictConfig, **kwargs):
        assert config.task in [
            "regression",
            "classification",
        ], "TabNet is only implemented for Regression and Classification"
        super().__init__(config, **kwargs)

    @property
    def backbone(self):
        return self._backbone

    @property
    def embedding_layer(self):
        return self._embedding_layer

    @property
    def head(self):
        return self._head

    def _build_network(self):
        # TabNet has its own embedding layer.
        # So we are not using the embedding layer from BaseModel
        self._embedding_layer = nn.Identity()
        self._backbone = TabNetBackbone(self.hparams)
        setattr(self.backbone, "output_dim", self.hparams.output_dim)
        # TabNet has its own head
        self._head = nn.Identity()

    def extract_embedding(self):
        raise ValueError("Extracting Embeddings is not supported by Tabnet. Please use another" " compatible model")

Bases: BaseModel

Source code in src/pytorch_tabular/models/tab_transformer/tab_transformer.py
class TabTransformerModel(BaseModel):
    def __init__(self, config: DictConfig, **kwargs):
        super().__init__(config, **kwargs)

    @property
    def backbone(self):
        return self._backbone

    @property
    def embedding_layer(self):
        return self._embedding_layer

    @property
    def head(self):
        return self._head

    def _build_network(self):
        # Backbone
        self._backbone = TabTransformerBackbone(self.hparams)
        # Embedding Layer
        self._embedding_layer = self._backbone._build_embedding_layer()
        # Head
        self._head = self._get_head_from_config()

    # Redefining forward because this model flow is slightly different
    def forward(self, x: Dict):
        if self.hparams.categorical_dim > 0:
            x_cat = self.embed_input({"categorical": x["categorical"]})
        else:
            x_cat = None
        x = self.compute_backbone({"categorical": x_cat, "continuous": x["continuous"]})
        return self.compute_head(x)

    # Redefining compute_backbone because this model flow is slightly different
    def compute_backbone(self, x: Dict):
        # Returns output
        x = self.backbone(x["categorical"], x["continuous"])
        return x

Base Model Class

Bases: LightningModule

Source code in src/pytorch_tabular/models/base_model.py
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
class BaseModel(pl.LightningModule, metaclass=ABCMeta):
    def __init__(
        self,
        config: DictConfig,
        custom_loss: Optional[torch.nn.Module] = None,
        custom_metrics: Optional[List[Callable]] = None,
        custom_metrics_prob_inputs: Optional[List[bool]] = None,
        custom_optimizer: Optional[torch.optim.Optimizer] = None,
        custom_optimizer_params: Dict = {},
        **kwargs,
    ):
        """Base Model for PyTorch Tabular.

        Args:
            config (DictConfig): The configuration for the model.
            custom_loss (Optional[torch.nn.Module], optional): A custom loss function. Defaults to None.
            custom_metrics (Optional[List[Callable]], optional): A list of custom metrics. Defaults to None.
            custom_metrics_prob_inputs (Optional[List[bool]], optional): A list of boolean values indicating whether the
                metric requires probability inputs. Defaults to None.
            custom_optimizer (Optional[torch.optim.Optimizer], optional):
                A custom optimizer as callable or string to be imported. Defaults to None.
            custom_optimizer_params (Dict, optional): A dictionary of custom optimizer parameters. Defaults to {}.
            kwargs (Dict, optional): Additional keyword arguments.

        """
        super().__init__()
        assert "inferred_config" in kwargs, "inferred_config not found in initialization arguments"
        inferred_config = kwargs["inferred_config"]
        # Merging the config and inferred config
        config = safe_merge_config(config, inferred_config)
        self.custom_loss = custom_loss
        self.custom_metrics = custom_metrics
        self.custom_metrics_prob_inputs = custom_metrics_prob_inputs
        self.custom_optimizer = custom_optimizer
        self.custom_optimizer_params = custom_optimizer_params
        self.kwargs = kwargs
        # Updating config with custom parameters for experiment tracking
        if self.custom_loss is not None:
            config.loss = str(self.custom_loss)
        if self.custom_metrics is not None:
            # Adding metrics to config for hparams logging and tracking
            config.metrics = []
            config.metrics_params = []
            for metric in self.custom_metrics:
                if isinstance(metric, partial):
                    # extracting func names from partial functions
                    config.metrics.append(metric.func.__name__)
                    config.metrics_params.append(metric.keywords)
                else:
                    config.metrics.append(metric.__name__)
                    config.metrics_params.append(vars(metric))
            if config.task == "classification":
                config.metrics_prob_input = self.custom_metrics_prob_inputs
        # Updating default metrics in config
        elif config.task == "classification":
            # Adding metric_params to config for classification task
            for i, mp in enumerate(config.metrics_params):
                # For classification task, output_dim == number of classses
                config.metrics_params[i]["task"] = mp.get("task", "multiclass")
                config.metrics_params[i]["num_classes"] = mp.get("num_classes", inferred_config.output_dim)
                if config.metrics[i] in (
                    "accuracy",
                    "precision",
                    "recall",
                    "precision_recall",
                    "specificity",
                    "f1_score",
                    "fbeta_score",
                ):
                    config.metrics_params[i]["top_k"] = mp.get("top_k", 1)

        if self.custom_optimizer is not None:
            config.optimizer = str(self.custom_optimizer.__class__.__name__)
        if len(self.custom_optimizer_params) > 0:
            config.optimizer_params = self.custom_optimizer_params
        self.save_hyperparameters(config)
        # The concatenated output dim of the embedding layer
        self._build_network()
        self._setup_loss()
        self._setup_metrics()
        self._check_and_verify()
        self.do_log_logits = (
            hasattr(self.hparams, "log_logits") and self.hparams.log_logits and self.hparams.log_target == "wandb"
        )
        if self.do_log_logits:
            self._val_logits = []
        if not WANDB_INSTALLED and self.do_log_logits:
            self.do_log_logits = False
            warnings.warn(
                "Wandb is not installed. Please install wandb to log logits. "
                "You can install wandb using pip install wandb or install PyTorch Tabular"
                " using pip install pytorch-tabular[extra]"
            )
        if not PLOTLY_INSTALLED and self.do_log_logits:
            self.do_log_logits = False
            warnings.warn(
                "Plotly is not installed. Please install plotly to log logits. "
                "You can install plotly using pip install plotly or install PyTorch Tabular"
                " using pip install pytorch-tabular[extra]"
            )

    @abstractmethod
    def _build_network(self):
        pass

    @property
    def backbone(self):
        raise NotImplementedError("backbone property needs to be implemented by inheriting classes")

    @property
    def embedding_layer(self):
        raise NotImplementedError("embedding_layer property needs to be implemented by inheriting classes")

    @property
    def head(self):
        raise NotImplementedError("head property needs to be implemented by inheriting classes")

    def _check_and_verify(self):
        assert hasattr(self, "backbone"), "Model has no attribute called `backbone`"
        assert hasattr(self.backbone, "output_dim"), "Backbone needs to have attribute `output_dim`"
        assert hasattr(self, "head"), "Model has no attribute called `head`"

    def _get_head_from_config(self):
        _head_callable = getattr(blocks, self.hparams.head)
        return _head_callable(
            in_units=self.backbone.output_dim,
            output_dim=self.hparams.output_dim,
            config=_head_callable._config_template(**self.hparams.head_config),
        )  # output_dim auto-calculated from other configs

    def _setup_loss(self):
        if self.custom_loss is None:
            try:
                self.loss = getattr(nn, self.hparams.loss)()
            except AttributeError as e:
                logger.error(f"{self.hparams.loss} is not a valid loss defined in the torch.nn module")
                raise e
        else:
            self.loss = self.custom_loss

    def _setup_metrics(self):
        if self.custom_metrics is None:
            self.metrics = []
            task_module = torchmetrics.functional
            for metric in self.hparams.metrics:
                try:
                    self.metrics.append(getattr(task_module, metric))
                except AttributeError as e:
                    logger.error(
                        f"{metric} is not a valid functional metric defined in the torchmetrics.functional module"
                    )
                    raise e
        else:
            self.metrics = self.custom_metrics

    def calculate_loss(self, output: Dict, y: torch.Tensor, tag: str) -> torch.Tensor:
        """Calculates the loss for the model.

        Args:
            output (Dict): The output dictionary from the model
            y (torch.Tensor): The target tensor
            tag (str): The tag to use for logging

        Returns:
            torch.Tensor: The loss value

        """
        y_hat = output["logits"]
        reg_terms = [k for k, v in output.items() if "regularization" in k]
        reg_loss = 0
        for t in reg_terms:
            # Log only if non-zero
            if output[t] != 0:
                reg_loss += output[t]
                self.log(
                    f"{tag}_{t}_loss",
                    output[t],
                    on_epoch=True,
                    on_step=False,
                    logger=True,
                    prog_bar=False,
                )
        if self.hparams.task == "regression":
            computed_loss = reg_loss
            for i in range(self.hparams.output_dim):
                _loss = self.loss(y_hat[:, i], y[:, i])
                computed_loss += _loss
                if self.hparams.output_dim > 1:
                    self.log(
                        f"{tag}_loss_{i}",
                        _loss,
                        on_epoch=True,
                        on_step=False,
                        logger=True,
                        prog_bar=False,
                    )
        else:
            # TODO loss fails with batch size of 1?
            computed_loss = self.loss(y_hat.squeeze(), y.squeeze()) + reg_loss
        self.log(
            f"{tag}_loss",
            computed_loss,
            on_epoch=(tag in ["valid", "test"]),
            on_step=(tag == "train"),
            # on_step=False,
            logger=True,
            prog_bar=True,
        )
        return computed_loss

    def calculate_metrics(self, y: torch.Tensor, y_hat: torch.Tensor, tag: str) -> List[torch.Tensor]:
        """Calculates the metrics for the model.

        Args:
            y (torch.Tensor): The target tensor

            y_hat (torch.Tensor): The predicted tensor

            tag (str): The tag to use for logging

        Returns:
            List[torch.Tensor]: The list of metric values

        """
        metrics = []
        for metric, metric_str, prob_inp, metric_params in zip(
            self.metrics,
            self.hparams.metrics,
            self.hparams.metrics_prob_input,
            self.hparams.metrics_params,
        ):
            if self.hparams.task == "regression":
                _metrics = []
                for i in range(self.hparams.output_dim):
                    name = metric.func.__name__ if isinstance(metric, partial) else metric.__name__
                    if name == torchmetrics.functional.mean_squared_log_error.__name__:
                        # MSLE should only be used in strictly positive targets. It is undefined otherwise
                        _metric = metric(
                            torch.clamp(y_hat[:, i], min=0),
                            torch.clamp(y[:, i], min=0),
                            **metric_params,
                        )
                    else:
                        _metric = metric(y_hat[:, i], y[:, i], **metric_params)
                    if self.hparams.output_dim > 1:
                        self.log(
                            f"{tag}_{metric_str}_{i}",
                            _metric,
                            on_epoch=True,
                            on_step=False,
                            logger=True,
                            prog_bar=False,
                        )
                    _metrics.append(_metric)
                avg_metric = torch.stack(_metrics, dim=0).sum()
            else:
                y_hat = nn.Softmax(dim=-1)(y_hat.squeeze())
                if prob_inp:
                    avg_metric = metric(y_hat, y.squeeze(), **metric_params)
                else:
                    avg_metric = metric(torch.argmax(y_hat, dim=-1), y.squeeze(), **metric_params)
            metrics.append(avg_metric)
            self.log(
                f"{tag}_{metric_str}",
                avg_metric,
                on_epoch=True,
                on_step=False,
                logger=True,
                prog_bar=True,
            )
        return metrics

    def data_aware_initialization(self, datamodule):
        """Performs data-aware initialization of the model when defined."""
        pass

    def compute_backbone(self, x: Dict) -> torch.Tensor:
        # Returns output
        x = self.backbone(x)
        return x

    def embed_input(self, x: Dict) -> torch.Tensor:
        return self.embedding_layer(x)

    def apply_output_sigmoid_scaling(self, y_hat: torch.Tensor) -> torch.Tensor:
        """Applies sigmoid scaling to the output of the model if the task is regression and the target range is
        defined.

        Args:
            y_hat (torch.Tensor): The output of the model

        Returns:
            torch.Tensor: The output of the model with sigmoid scaling applied

        """
        if (self.hparams.task == "regression") and (self.hparams.target_range is not None):
            for i in range(self.hparams.output_dim):
                y_min, y_max = self.hparams.target_range[i]
                y_hat[:, i] = y_min + nn.Sigmoid()(y_hat[:, i]) * (y_max - y_min)
        return y_hat

    def pack_output(self, y_hat: torch.Tensor, backbone_features: torch.tensor) -> Dict[str, Any]:
        """Packs the output of the model.

        Args:
            y_hat (torch.Tensor): The output of the model

            backbone_features (torch.tensor): The backbone features

        Returns:
            The packed output of the model

        """
        # if self.head is the Identity function it means that we cannot extract backbone features,
        # because the model cannot be divide in backbone and head (i.e. TabNet)
        if type(self.head) is nn.Identity:
            return {"logits": y_hat}
        return {"logits": y_hat, "backbone_features": backbone_features}

    def compute_head(self, backbone_features: Tensor) -> Dict[str, Any]:
        """Computes the head of the model.

        Args:
            backbone_features (Tensor): The backbone features

        Returns:
            The output of the model

        """
        y_hat = self.head(backbone_features)
        y_hat = self.apply_output_sigmoid_scaling(y_hat)
        return self.pack_output(y_hat, backbone_features)

    def forward(self, x: Dict) -> Dict[str, Any]:
        """The forward pass of the model.

        Args:
            x (Dict): The input of the model with 'continuous' and 'categorical' keys

        """
        x = self.embed_input(x)
        x = self.compute_backbone(x)
        return self.compute_head(x)

    def predict(self, x: Dict, ret_model_output: bool = False) -> Union[torch.Tensor, Tuple[torch.Tensor, Dict]]:
        """Predicts the output of the model.

        Args:
            x (Dict): The input of the model with 'continuous' and 'categorical' keys

            ret_model_output (bool): If True, the method returns the output of the model

        Returns:
            The output of the model

        """
        assert self.hparams.task != "ssl", "It's not allowed to use the method predict in case of ssl task"
        ret_value = self.forward(x)
        if ret_model_output:
            return ret_value.get("logits"), ret_value
        return ret_value.get("logits")

    def forward_pass(self, batch):
        return self(batch), None

    def extract_embedding(self):
        """Extracts the embedding of the model.

        This is used in `CategoricalEmbeddingTransformer`

        """
        if self.hparams.categorical_dim > 0:
            if not isinstance(self.embedding_layer, PreEncoded1dLayer):
                return self.embedding_layer.cat_embedding_layers
            else:
                raise ValueError(
                    "Cannot extract embedding for PreEncoded1dLayer. Please use a different embedding layer."
                )
        else:
            raise ValueError(
                "Model has been trained with no categorical feature and therefore can't be used"
                " as a Categorical Encoder"
            )

    def training_step(self, batch, batch_idx):
        output, y = self.forward_pass(batch)
        # y is not None for SSL task.Rest of the tasks target is
        # fetched from the batch
        y = batch["target"] if y is None else y
        y_hat = output["logits"]
        loss = self.calculate_loss(output, y, tag="train")
        self.calculate_metrics(y, y_hat, tag="train")
        return loss

    def validation_step(self, batch, batch_idx):
        with torch.no_grad():
            output, y = self.forward_pass(batch)
            # y is not None for SSL task.Rest of the tasks target is
            # fetched from the batch
            y = batch["target"] if y is None else y
            y_hat = output["logits"]
            self.calculate_loss(output, y, tag="valid")
            self.calculate_metrics(y, y_hat, tag="valid")
        return y_hat, y

    def test_step(self, batch, batch_idx):
        with torch.no_grad():
            output, y = self.forward_pass(batch)
            # y is not None for SSL task.Rest of the tasks target is
            # fetched from the batch
            y = batch["target"] if y is None else y
            y_hat = output["logits"]
            self.calculate_loss(output, y, tag="test")
            self.calculate_metrics(y, y_hat, tag="test")
        return y_hat, y

    def configure_optimizers(self):
        if self.custom_optimizer is None:
            # Loading from the config
            try:
                self._optimizer = _create_optimizer(self.hparams.optimizer)
                opt = self._optimizer(
                    self.parameters(),
                    lr=self.hparams.learning_rate,
                    **self.hparams.optimizer_params,
                )
            except AttributeError as e:
                logger.error(f"{self.hparams.optimizer} is not a valid optimizer defined in the torch.optim module")
                raise e
        else:
            # Loading from custom fit arguments
            self._optimizer = _create_optimizer(self.custom_optimizer)

            opt = self._optimizer(
                self.parameters(),
                lr=self.hparams.learning_rate,
                **self.custom_optimizer_params,
            )
        if self.hparams.lr_scheduler is not None:
            try:
                self._lr_scheduler = getattr(torch.optim.lr_scheduler, self.hparams.lr_scheduler)
            except AttributeError as e:
                logger.error(
                    f"{self.hparams.lr_scheduler} is not a valid learning rate sheduler defined"
                    f" in the torch.optim.lr_scheduler module"
                )
                raise e
            if isinstance(self._lr_scheduler, torch.optim.lr_scheduler._LRScheduler):
                return {
                    "optimizer": opt,
                    "lr_scheduler": self._lr_scheduler(opt, **self.hparams.lr_scheduler_params),
                }
            return {
                "optimizer": opt,
                "lr_scheduler": self._lr_scheduler(opt, **self.hparams.lr_scheduler_params),
                "monitor": self.hparams.lr_scheduler_monitor_metric,
            }
        else:
            return opt

    def create_plotly_histogram(self, arr, name, bin_dict=None):
        fig = go.Figure()
        for i in range(arr.shape[-1]):
            fig.add_trace(
                go.Histogram(
                    x=arr[:, i],
                    histnorm="probability",
                    name=f"{name}_{i}",
                    xbins=bin_dict,  # dict(start=0.0, end=1.0, size=0.1),  # bins used for histogram
                )
            )
        # Overlay both histograms
        fig.update_layout(
            barmode="overlay",
            legend={"orientation": "h", "yanchor": "bottom", "y": 1.02, "xanchor": "right", "x": 1},
        )
        # Reduce opacity to see both histograms
        fig.update_traces(opacity=0.5)
        return fig

    def on_validation_batch_end(self, outputs, batch, batch_idx: int) -> None:
        if self.do_log_logits:
            self._val_logits.append(outputs[0][0])
        super().on_validation_batch_end(outputs, batch, batch_idx)

    def on_validation_epoch_end(self) -> None:
        if self.do_log_logits:
            logits = torch.cat(self._val_logits).detach().cpu()
            self._val_logits = []
            fig = self.create_plotly_histogram(logits, "logits")
            wandb.log(
                {"valid_logits": wandb.Plotly(fig), "global_step": self.global_step},
                commit=False,
            )
        super().on_validation_epoch_end()

    def reset_weights(self):
        reset_all_weights(self.backbone)
        reset_all_weights(self.head)
        reset_all_weights(self.embedding_layer)

    def feature_importance(self) -> DataFrame:
        """Returns a dataframe with feature importance for the model."""
        if hasattr(self.backbone, "feature_importance_"):
            imp = self.backbone.feature_importance_
            n_feat = len(self.hparams.categorical_cols + self.hparams.continuous_cols)
            if self.hparams.categorical_dim > 0:
                if imp.shape[0] != n_feat:
                    # Combining Cat Embedded Dimensions to a single one by averaging
                    wt = []
                    norm = []
                    ft_idx = 0
                    for _, embd_dim in self.hparams.embedding_dims:
                        wt.extend([ft_idx] * embd_dim)
                        norm.append(embd_dim)
                        ft_idx += 1
                    for _ in self.hparams.continuous_cols:
                        wt.extend([ft_idx])
                        norm.append(1)
                        ft_idx += 1
                    imp = np.bincount(wt, weights=imp) / np.array(norm)
                else:
                    # For models like FTTransformer, we dont need to do anything
                    # It takes categorical and continuous as individual 2-D features
                    pass
            importance_df = DataFrame(
                {
                    "Features": self.hparams.categorical_cols + self.hparams.continuous_cols,
                    "importance": imp,
                }
            )
            return importance_df
        else:
            raise ValueError("Feature Importance unavailable for this model.")

__init__(config, custom_loss=None, custom_metrics=None, custom_metrics_prob_inputs=None, custom_optimizer=None, custom_optimizer_params={}, **kwargs)

Base Model for PyTorch Tabular.

Parameters:

Name Type Description Default
config DictConfig

The configuration for the model.

required
custom_loss Optional[Module]

A custom loss function. Defaults to None.

None
custom_metrics Optional[List[Callable]]

A list of custom metrics. Defaults to None.

None
custom_metrics_prob_inputs Optional[List[bool]]

A list of boolean values indicating whether the metric requires probability inputs. Defaults to None.

None
custom_optimizer Optional[Optimizer]

A custom optimizer as callable or string to be imported. Defaults to None.

None
custom_optimizer_params Dict

A dictionary of custom optimizer parameters. Defaults to {}.

{}
kwargs Dict

Additional keyword arguments.

{}
Source code in src/pytorch_tabular/models/base_model.py
def __init__(
    self,
    config: DictConfig,
    custom_loss: Optional[torch.nn.Module] = None,
    custom_metrics: Optional[List[Callable]] = None,
    custom_metrics_prob_inputs: Optional[List[bool]] = None,
    custom_optimizer: Optional[torch.optim.Optimizer] = None,
    custom_optimizer_params: Dict = {},
    **kwargs,
):
    """Base Model for PyTorch Tabular.

    Args:
        config (DictConfig): The configuration for the model.
        custom_loss (Optional[torch.nn.Module], optional): A custom loss function. Defaults to None.
        custom_metrics (Optional[List[Callable]], optional): A list of custom metrics. Defaults to None.
        custom_metrics_prob_inputs (Optional[List[bool]], optional): A list of boolean values indicating whether the
            metric requires probability inputs. Defaults to None.
        custom_optimizer (Optional[torch.optim.Optimizer], optional):
            A custom optimizer as callable or string to be imported. Defaults to None.
        custom_optimizer_params (Dict, optional): A dictionary of custom optimizer parameters. Defaults to {}.
        kwargs (Dict, optional): Additional keyword arguments.

    """
    super().__init__()
    assert "inferred_config" in kwargs, "inferred_config not found in initialization arguments"
    inferred_config = kwargs["inferred_config"]
    # Merging the config and inferred config
    config = safe_merge_config(config, inferred_config)
    self.custom_loss = custom_loss
    self.custom_metrics = custom_metrics
    self.custom_metrics_prob_inputs = custom_metrics_prob_inputs
    self.custom_optimizer = custom_optimizer
    self.custom_optimizer_params = custom_optimizer_params
    self.kwargs = kwargs
    # Updating config with custom parameters for experiment tracking
    if self.custom_loss is not None:
        config.loss = str(self.custom_loss)
    if self.custom_metrics is not None:
        # Adding metrics to config for hparams logging and tracking
        config.metrics = []
        config.metrics_params = []
        for metric in self.custom_metrics:
            if isinstance(metric, partial):
                # extracting func names from partial functions
                config.metrics.append(metric.func.__name__)
                config.metrics_params.append(metric.keywords)
            else:
                config.metrics.append(metric.__name__)
                config.metrics_params.append(vars(metric))
        if config.task == "classification":
            config.metrics_prob_input = self.custom_metrics_prob_inputs
    # Updating default metrics in config
    elif config.task == "classification":
        # Adding metric_params to config for classification task
        for i, mp in enumerate(config.metrics_params):
            # For classification task, output_dim == number of classses
            config.metrics_params[i]["task"] = mp.get("task", "multiclass")
            config.metrics_params[i]["num_classes"] = mp.get("num_classes", inferred_config.output_dim)
            if config.metrics[i] in (
                "accuracy",
                "precision",
                "recall",
                "precision_recall",
                "specificity",
                "f1_score",
                "fbeta_score",
            ):
                config.metrics_params[i]["top_k"] = mp.get("top_k", 1)

    if self.custom_optimizer is not None:
        config.optimizer = str(self.custom_optimizer.__class__.__name__)
    if len(self.custom_optimizer_params) > 0:
        config.optimizer_params = self.custom_optimizer_params
    self.save_hyperparameters(config)
    # The concatenated output dim of the embedding layer
    self._build_network()
    self._setup_loss()
    self._setup_metrics()
    self._check_and_verify()
    self.do_log_logits = (
        hasattr(self.hparams, "log_logits") and self.hparams.log_logits and self.hparams.log_target == "wandb"
    )
    if self.do_log_logits:
        self._val_logits = []
    if not WANDB_INSTALLED and self.do_log_logits:
        self.do_log_logits = False
        warnings.warn(
            "Wandb is not installed. Please install wandb to log logits. "
            "You can install wandb using pip install wandb or install PyTorch Tabular"
            " using pip install pytorch-tabular[extra]"
        )
    if not PLOTLY_INSTALLED and self.do_log_logits:
        self.do_log_logits = False
        warnings.warn(
            "Plotly is not installed. Please install plotly to log logits. "
            "You can install plotly using pip install plotly or install PyTorch Tabular"
            " using pip install pytorch-tabular[extra]"
        )

apply_output_sigmoid_scaling(y_hat)

Applies sigmoid scaling to the output of the model if the task is regression and the target range is defined.

Parameters:

Name Type Description Default
y_hat Tensor

The output of the model

required

Returns:

Type Description
Tensor

torch.Tensor: The output of the model with sigmoid scaling applied

Source code in src/pytorch_tabular/models/base_model.py
def apply_output_sigmoid_scaling(self, y_hat: torch.Tensor) -> torch.Tensor:
    """Applies sigmoid scaling to the output of the model if the task is regression and the target range is
    defined.

    Args:
        y_hat (torch.Tensor): The output of the model

    Returns:
        torch.Tensor: The output of the model with sigmoid scaling applied

    """
    if (self.hparams.task == "regression") and (self.hparams.target_range is not None):
        for i in range(self.hparams.output_dim):
            y_min, y_max = self.hparams.target_range[i]
            y_hat[:, i] = y_min + nn.Sigmoid()(y_hat[:, i]) * (y_max - y_min)
    return y_hat

calculate_loss(output, y, tag)

Calculates the loss for the model.

Parameters:

Name Type Description Default
output Dict

The output dictionary from the model

required
y Tensor

The target tensor

required
tag str

The tag to use for logging

required

Returns:

Type Description
Tensor

torch.Tensor: The loss value

Source code in src/pytorch_tabular/models/base_model.py
def calculate_loss(self, output: Dict, y: torch.Tensor, tag: str) -> torch.Tensor:
    """Calculates the loss for the model.

    Args:
        output (Dict): The output dictionary from the model
        y (torch.Tensor): The target tensor
        tag (str): The tag to use for logging

    Returns:
        torch.Tensor: The loss value

    """
    y_hat = output["logits"]
    reg_terms = [k for k, v in output.items() if "regularization" in k]
    reg_loss = 0
    for t in reg_terms:
        # Log only if non-zero
        if output[t] != 0:
            reg_loss += output[t]
            self.log(
                f"{tag}_{t}_loss",
                output[t],
                on_epoch=True,
                on_step=False,
                logger=True,
                prog_bar=False,
            )
    if self.hparams.task == "regression":
        computed_loss = reg_loss
        for i in range(self.hparams.output_dim):
            _loss = self.loss(y_hat[:, i], y[:, i])
            computed_loss += _loss
            if self.hparams.output_dim > 1:
                self.log(
                    f"{tag}_loss_{i}",
                    _loss,
                    on_epoch=True,
                    on_step=False,
                    logger=True,
                    prog_bar=False,
                )
    else:
        # TODO loss fails with batch size of 1?
        computed_loss = self.loss(y_hat.squeeze(), y.squeeze()) + reg_loss
    self.log(
        f"{tag}_loss",
        computed_loss,
        on_epoch=(tag in ["valid", "test"]),
        on_step=(tag == "train"),
        # on_step=False,
        logger=True,
        prog_bar=True,
    )
    return computed_loss

calculate_metrics(y, y_hat, tag)

Calculates the metrics for the model.

Parameters:

Name Type Description Default
y Tensor

The target tensor

required
y_hat Tensor

The predicted tensor

required
tag str

The tag to use for logging

required

Returns:

Type Description
List[Tensor]

List[torch.Tensor]: The list of metric values

Source code in src/pytorch_tabular/models/base_model.py
def calculate_metrics(self, y: torch.Tensor, y_hat: torch.Tensor, tag: str) -> List[torch.Tensor]:
    """Calculates the metrics for the model.

    Args:
        y (torch.Tensor): The target tensor

        y_hat (torch.Tensor): The predicted tensor

        tag (str): The tag to use for logging

    Returns:
        List[torch.Tensor]: The list of metric values

    """
    metrics = []
    for metric, metric_str, prob_inp, metric_params in zip(
        self.metrics,
        self.hparams.metrics,
        self.hparams.metrics_prob_input,
        self.hparams.metrics_params,
    ):
        if self.hparams.task == "regression":
            _metrics = []
            for i in range(self.hparams.output_dim):
                name = metric.func.__name__ if isinstance(metric, partial) else metric.__name__
                if name == torchmetrics.functional.mean_squared_log_error.__name__:
                    # MSLE should only be used in strictly positive targets. It is undefined otherwise
                    _metric = metric(
                        torch.clamp(y_hat[:, i], min=0),
                        torch.clamp(y[:, i], min=0),
                        **metric_params,
                    )
                else:
                    _metric = metric(y_hat[:, i], y[:, i], **metric_params)
                if self.hparams.output_dim > 1:
                    self.log(
                        f"{tag}_{metric_str}_{i}",
                        _metric,
                        on_epoch=True,
                        on_step=False,
                        logger=True,
                        prog_bar=False,
                    )
                _metrics.append(_metric)
            avg_metric = torch.stack(_metrics, dim=0).sum()
        else:
            y_hat = nn.Softmax(dim=-1)(y_hat.squeeze())
            if prob_inp:
                avg_metric = metric(y_hat, y.squeeze(), **metric_params)
            else:
                avg_metric = metric(torch.argmax(y_hat, dim=-1), y.squeeze(), **metric_params)
        metrics.append(avg_metric)
        self.log(
            f"{tag}_{metric_str}",
            avg_metric,
            on_epoch=True,
            on_step=False,
            logger=True,
            prog_bar=True,
        )
    return metrics

compute_head(backbone_features)

Computes the head of the model.

Parameters:

Name Type Description Default
backbone_features Tensor

The backbone features

required

Returns:

Type Description
Dict[str, Any]

The output of the model

Source code in src/pytorch_tabular/models/base_model.py
def compute_head(self, backbone_features: Tensor) -> Dict[str, Any]:
    """Computes the head of the model.

    Args:
        backbone_features (Tensor): The backbone features

    Returns:
        The output of the model

    """
    y_hat = self.head(backbone_features)
    y_hat = self.apply_output_sigmoid_scaling(y_hat)
    return self.pack_output(y_hat, backbone_features)

data_aware_initialization(datamodule)

Performs data-aware initialization of the model when defined.

Source code in src/pytorch_tabular/models/base_model.py
def data_aware_initialization(self, datamodule):
    """Performs data-aware initialization of the model when defined."""
    pass

extract_embedding()

Extracts the embedding of the model.

This is used in CategoricalEmbeddingTransformer

Source code in src/pytorch_tabular/models/base_model.py
def extract_embedding(self):
    """Extracts the embedding of the model.

    This is used in `CategoricalEmbeddingTransformer`

    """
    if self.hparams.categorical_dim > 0:
        if not isinstance(self.embedding_layer, PreEncoded1dLayer):
            return self.embedding_layer.cat_embedding_layers
        else:
            raise ValueError(
                "Cannot extract embedding for PreEncoded1dLayer. Please use a different embedding layer."
            )
    else:
        raise ValueError(
            "Model has been trained with no categorical feature and therefore can't be used"
            " as a Categorical Encoder"
        )

feature_importance()

Returns a dataframe with feature importance for the model.

Source code in src/pytorch_tabular/models/base_model.py
def feature_importance(self) -> DataFrame:
    """Returns a dataframe with feature importance for the model."""
    if hasattr(self.backbone, "feature_importance_"):
        imp = self.backbone.feature_importance_
        n_feat = len(self.hparams.categorical_cols + self.hparams.continuous_cols)
        if self.hparams.categorical_dim > 0:
            if imp.shape[0] != n_feat:
                # Combining Cat Embedded Dimensions to a single one by averaging
                wt = []
                norm = []
                ft_idx = 0
                for _, embd_dim in self.hparams.embedding_dims:
                    wt.extend([ft_idx] * embd_dim)
                    norm.append(embd_dim)
                    ft_idx += 1
                for _ in self.hparams.continuous_cols:
                    wt.extend([ft_idx])
                    norm.append(1)
                    ft_idx += 1
                imp = np.bincount(wt, weights=imp) / np.array(norm)
            else:
                # For models like FTTransformer, we dont need to do anything
                # It takes categorical and continuous as individual 2-D features
                pass
        importance_df = DataFrame(
            {
                "Features": self.hparams.categorical_cols + self.hparams.continuous_cols,
                "importance": imp,
            }
        )
        return importance_df
    else:
        raise ValueError("Feature Importance unavailable for this model.")

forward(x)

The forward pass of the model.

Parameters:

Name Type Description Default
x Dict

The input of the model with 'continuous' and 'categorical' keys

required
Source code in src/pytorch_tabular/models/base_model.py
def forward(self, x: Dict) -> Dict[str, Any]:
    """The forward pass of the model.

    Args:
        x (Dict): The input of the model with 'continuous' and 'categorical' keys

    """
    x = self.embed_input(x)
    x = self.compute_backbone(x)
    return self.compute_head(x)

pack_output(y_hat, backbone_features)

Packs the output of the model.

Parameters:

Name Type Description Default
y_hat Tensor

The output of the model

required
backbone_features tensor

The backbone features

required

Returns:

Type Description
Dict[str, Any]

The packed output of the model

Source code in src/pytorch_tabular/models/base_model.py
def pack_output(self, y_hat: torch.Tensor, backbone_features: torch.tensor) -> Dict[str, Any]:
    """Packs the output of the model.

    Args:
        y_hat (torch.Tensor): The output of the model

        backbone_features (torch.tensor): The backbone features

    Returns:
        The packed output of the model

    """
    # if self.head is the Identity function it means that we cannot extract backbone features,
    # because the model cannot be divide in backbone and head (i.e. TabNet)
    if type(self.head) is nn.Identity:
        return {"logits": y_hat}
    return {"logits": y_hat, "backbone_features": backbone_features}

predict(x, ret_model_output=False)

Predicts the output of the model.

Parameters:

Name Type Description Default
x Dict

The input of the model with 'continuous' and 'categorical' keys

required
ret_model_output bool

If True, the method returns the output of the model

False

Returns:

Type Description
Union[Tensor, Tuple[Tensor, Dict]]

The output of the model

Source code in src/pytorch_tabular/models/base_model.py
def predict(self, x: Dict, ret_model_output: bool = False) -> Union[torch.Tensor, Tuple[torch.Tensor, Dict]]:
    """Predicts the output of the model.

    Args:
        x (Dict): The input of the model with 'continuous' and 'categorical' keys

        ret_model_output (bool): If True, the method returns the output of the model

    Returns:
        The output of the model

    """
    assert self.hparams.task != "ssl", "It's not allowed to use the method predict in case of ssl task"
    ret_value = self.forward(x)
    if ret_model_output:
        return ret_value.get("logits"), ret_value
    return ret_value.get("logits")