Common Modules
Embeddings
pytorch_tabular.models.common.layers.Embedding1dLayer(continuous_dim, categorical_embedding_dims, embedding_dropout=0.0, batch_norm_continuous_input=False)
Bases: nn.Module
Enables different values in a categorical feature to have different embeddings.
Source code in src/pytorch_tabular/models/common/layers/embeddings.py
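A minimal usage sketch, assuming (based on the constructor signature above, not guaranteed by this page) that `categorical_embedding_dims` is a list of `(cardinality, embedding_dim)` pairs and that the layer consumes a dict with `"continuous"` and `"categorical"` tensors:

```python
import torch

from pytorch_tabular.models.common.layers import Embedding1dLayer

# Hypothetical setup: 3 continuous features and 2 categorical features
# with cardinalities 5 and 10, embedded into 4 and 8 dimensions respectively.
embedding = Embedding1dLayer(
    continuous_dim=3,
    categorical_embedding_dims=[(5, 4), (10, 8)],
    embedding_dropout=0.1,
    batch_norm_continuous_input=True,
)

batch = {
    "continuous": torch.randn(32, 3),             # (batch, continuous_dim)
    "categorical": torch.randint(0, 5, (32, 2)),  # (batch, n_categorical)
}
out = embedding(batch)  # expected shape: (32, 3 + 4 + 8) = (32, 15)
```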
pytorch_tabular.models.common.layers.Embedding2dLayer(continuous_dim, categorical_cardinality, embedding_dim, shared_embedding_strategy=None, frac_shared_embed=0.25, embedding_bias=False, batch_norm_continuous_input=False, embedding_dropout=0.0, initialization=None)
Bases: nn.Module
Embeds categorical and continuous features into a 2D tensor.
| Parameter | Description |
|---|---|
| `continuous_dim` | Number of continuous features |
| `categorical_cardinality` | List of cardinalities of the categorical features |
| `embedding_dim` | Embedding dimension |
| `shared_embedding_strategy` | Strategy to use for shared embeddings |
| `frac_shared_embed` | Fraction of embeddings to share |
| `embedding_bias` | Whether to use bias in the embedding layers |
| `batch_norm_continuous_input` | Whether to use batch norm on the continuous features |
| `embedding_dropout` | Dropout to apply to the embeddings |
| `initialization` | Initialization strategy to use for the embedding layers |
Source code in src/pytorch_tabular/models/common/layers/embeddings.py
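A minimal usage sketch for the 2D variant, under the same assumption about the input dict; every feature (continuous or categorical) is expected to come out as one `embedding_dim`-sized token:

```python
import torch

from pytorch_tabular.models.common.layers import Embedding2dLayer

# Hypothetical setup: 3 continuous and 2 categorical features (cardinalities 5 and 10),
# every feature embedded into a 16-dimensional token.
embedding = Embedding2dLayer(
    continuous_dim=3,
    categorical_cardinality=[5, 10],
    embedding_dim=16,
    embedding_dropout=0.1,
)

batch = {
    "continuous": torch.randn(32, 3),
    "categorical": torch.randint(0, 5, (32, 2)),
}
tokens = embedding(batch)  # expected shape: (32, 5, 16), one token per feature
```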
pytorch_tabular.models.common.layers.PreEncoded1dLayer(continuous_dim, categorical_dim, embedding_dropout=0.0, batch_norm_continuous_input=False)
Bases: nn.Module
Takes in pre-encoded categorical variables and simply concatenates them with the continuous variables. There is no learnable component.
Source code in src/pytorch_tabular/models/common/layers/embeddings.py
pytorch_tabular.models.common.layers.SharedEmbeddings(num_embed, embed_dim, add_shared_embed=False, frac_shared_embed=0.25)
Bases: nn.Module
Enables different values in a categorical feature to share a portion of their embedding.
Source code in src/pytorch_tabular/models/common/layers/embeddings.py
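A conceptual sketch of the shared-embedding idea (not the library implementation): a `frac_shared_embed` fraction of each value's embedding is replaced by a single vector shared across all values of that feature, as in TabTransformer-style column embeddings.

```python
import torch
import torch.nn as nn

class SharedEmbeddingSketch(nn.Module):
    """Conceptual sketch: the first `frac_shared_embed` fraction of every value's
    embedding is replaced by a single vector shared across the whole feature."""

    def __init__(self, num_embed: int, embed_dim: int, frac_shared_embed: float = 0.25):
        super().__init__()
        shared_dim = int(embed_dim * frac_shared_embed)
        self.embed = nn.Embedding(num_embed, embed_dim)
        self.shared = nn.Parameter(torch.empty(1, shared_dim).uniform_(-0.1, 0.1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.embed(x)
        out[..., : self.shared.shape[1]] = self.shared  # overwrite the shared slice
        return out
```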
Gated Units
pytorch_tabular.models.common.layers.GatedFeatureLearningUnit(n_features_in, n_stages, feature_mask_function=entmax15, feature_sparsity=0.3, learnable_sparsity=True, dropout=0.0)
Bases: nn.Module
Source code in src/pytorch_tabular/models/common/layers/gated_units.py
pytorch_tabular.models.common.layers.GEGLU(d_model, d_ff, dropout=0.1)
Bases: nn.Module
GEGLU: a Gated Linear Unit (GLU) variant that uses GELU as the gating activation.
| Parameter | Description |
|---|---|
| `d_model` | Dimension of the model |
| `d_ff` | Dimension of the feed-forward layer |
| `dropout` | Dropout probability |
Source code in src/pytorch_tabular/models/common/layers/gated_units.py
pytorch_tabular.models.common.layers.ReGLU(d_model, d_ff, dropout=0.1)
Bases: nn.Module
ReGLU: a Gated Linear Unit (GLU) variant that uses ReLU as the gating activation.
| Parameter | Description |
|---|---|
| `d_model` | Dimension of the model |
| `d_ff` | Dimension of the feed-forward layer |
| `dropout` | Dropout probability |
Source code in src/pytorch_tabular/models/common/layers/gated_units.py
pytorch_tabular.models.common.layers.SwiGLU(d_model, d_ff, dropout=0.1)
Bases: nn.Module
| Parameter | Description |
|---|---|
| `d_model` | Dimension of the model |
| `d_ff` | Dimension of the feed-forward layer |
| `dropout` | Dropout probability |
Source code in src/pytorch_tabular/models/common/layers/gated_units.py
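GEGLU, ReGLU and SwiGLU differ only in the activation applied to the gating branch (GELU, ReLU and SiLU/Swish respectively). A minimal sketch of that shared pattern, not the library implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GLUVariantSketch(nn.Module):
    """Sketch of the GLU family: out = dropout(activation(x W) * (x V))."""

    def __init__(self, d_model: int, d_ff: int, activation=F.gelu, dropout: float = 0.1):
        super().__init__()
        self.gate_proj = nn.Linear(d_model, d_ff)   # gating branch
        self.value_proj = nn.Linear(d_model, d_ff)  # value branch
        self.activation = activation
        self.dropout = nn.Dropout(dropout)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.dropout(self.activation(self.gate_proj(x)) * self.value_proj(x))

# GELU gate ~ GEGLU, ReLU gate ~ ReGLU, SiLU/Swish gate ~ SwiGLU:
geglu_like = GLUVariantSketch(64, 256, F.gelu)
reglu_like = GLUVariantSketch(64, 256, F.relu)
swiglu_like = GLUVariantSketch(64, 256, F.silu)
```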
pytorch_tabular.models.common.layers.PositionWiseFeedForward(d_model, d_ff, dropout=0.1, activation=nn.ReLU(), is_gated=False, bias1=True, bias2=True, bias_gate=True)
Bases: nn.Module
Position-wise Feed-Forward Network (FFN)
This is a PyTorch implementation of the position-wise feed-forward network used in transformers. The FFN consists of two fully connected layers. The number of dimensions in the hidden layer, $d_{ff}$, is generally set to around four times that of the token embedding, $d_{model}$, so it is sometimes also called the expand-and-contract network.

There is an activation at the hidden layer, usually the ReLU (Rectified Linear Unit) activation, $$\max(0, x)$$

That is, the FFN function is $$FFN(x, W_1, W_2, b_1, b_2) = \max(0, x W_1 + b_1) W_2 + b_2$$ where $W_1$, $W_2$, $b_1$ and $b_2$ are learnable parameters. Sometimes the GELU (Gaussian Error Linear Unit) activation, $$x \Phi(x)$$ where $\Phi(x) = P(X \le x), X \sim \mathcal{N}(0,1)$, is used instead of ReLU.
Gated Linear Units
This is a generic implementation that supports different variants including Gated Linear Units (GLU).
- `d_model` is the number of features in a token embedding
- `d_ff` is the number of features in the hidden layer of the FFN
- `dropout` is the dropout probability for the hidden layer
- `is_gated` specifies whether the hidden layer is gated
- `bias1` specifies whether the first fully connected layer should have a learnable bias
- `bias2` specifies whether the second fully connected layer should have a learnable bias
- `bias_gate` specifies whether the fully connected layer for the gate should have a learnable bias
Source code in src/pytorch_tabular/models/common/layers/gated_units.py
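For reference, a minimal ungated FFN sketch matching the formula above (two linear layers with ReLU and dropout in between); the library class generalizes this with optional gating and configurable biases:

```python
import torch
import torch.nn as nn

class FFNSketch(nn.Module):
    """FFN(x) = max(0, x W1 + b1) W2 + b2, with dropout on the hidden layer."""

    def __init__(self, d_model: int, d_ff: int, dropout: float = 0.1):
        super().__init__()
        self.expand = nn.Linear(d_model, d_ff)    # d_ff is typically ~4 * d_model
        self.contract = nn.Linear(d_ff, d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.contract(self.dropout(torch.relu(self.expand(x))))
```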
Soft Trees
pytorch_tabular.models.common.layers.NeuralDecisionTree(depth, n_features, dropout=0, binning_activation=entmax15, feature_mask_function=entmax15, feature_sparsity=0.8, learnable_sparsity=True)
Bases: nn.Module
Source code in src/pytorch_tabular/models/common/layers/soft_trees.py
pytorch_tabular.models.common.layers.ODST(in_features, num_trees, depth=6, tree_output_dim=1, flatten_output=True, choice_function=sparsemax, bin_function=sparsemoid, initialize_response_=nn.init.normal_, initialize_selection_logits_=nn.init.uniform_, threshold_init_beta=1.0, threshold_init_cutoff=1.0)
Bases: ModuleWithInit
Oblivious Differentiable Sparsemax Trees (see http://tinyurl.com/odst-readmore). One can drop this module in anywhere instead of nn.Linear.
:param in_features: number of features in the input tensor
:param num_trees: number of trees in this layer
:param tree_output_dim: number of response channels in the response of an individual tree
:param depth: number of splits in every tree
:param flatten_output: if False, returns [..., num_trees, tree_output_dim]; by default returns [..., num_trees * tree_output_dim]
:param choice_function: f(tensor, dim) -> R_simplex; computes feature weights such that f(tensor, dim).sum(dim) == 1
:param bin_function: f(tensor) -> R[0, 1]; computes tree leaf weights
:param initialize_response_: in-place initializer for the tree output tensor
:param initialize_selection_logits_: in-place initializer for the logits that select features for the tree. Both thresholds and scales are initialized with data-aware init (or .load_state_dict)
:param threshold_init_beta: initializes thresholds to a q-th quantile of the data points, where q ~ Beta(threshold_init_beta, threshold_init_beta). If this parameter is set to 1, initial thresholds will have the same distribution as the data points. If greater than 1 (e.g. 10), thresholds will be closer to the median data value. If less than 1 (e.g. 0.1), thresholds will approach the min/max data values.
:param threshold_init_cutoff: threshold log-temperature initializer, in (0, inf). By default (1.0), log-temperatures are initialized so that all bin selectors end up in the linear region of the sparse-sigmoid; the temperatures are then scaled by this parameter. Setting this value > 1.0 will result in some margin between the data points and the sparse-sigmoid cutoff value. Setting it < 1.0 will cause a (1 - value) fraction of the data points to end up in the flat sparse-sigmoid region; for instance, threshold_init_cutoff = 0.9 will set 10% of the points to exactly 0.0 or 1.0. All points will lie between (0.5 - 0.5 / threshold_init_cutoff) and (0.5 + 0.5 / threshold_init_cutoff).
Source code in src/pytorch_tabular/models/common/layers/soft_trees.py
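A hedged usage sketch based only on the constructor signature and the output shape described above:

```python
import torch

from pytorch_tabular.models.common.layers import ODST

# Hypothetical setup: 16 input features, 8 oblivious trees of depth 6,
# each tree emitting a single response channel.
odst = ODST(in_features=16, num_trees=8, depth=6, tree_output_dim=1)

x = torch.randn(128, 16)
out = odst(x)  # data-aware init runs on this first batch;
               # expected shape: (128, 8 * 1) since flatten_output=True by default
```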
Transformers
pytorch_tabular.models.common.layers.AddNorm(input_dim, dropout)
Bases: nn.Module
Applies LayerNorm and Dropout and adds the result back to the input: the standard Add & Norm operation in Transformers.
Source code in src/pytorch_tabular/models/common/layers/transformers.py
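A conceptual sketch of the Add & Norm step (dropout on the sub-layer output, residual addition, then LayerNorm); the argument order of the library's forward method is an assumption here:

```python
import torch
import torch.nn as nn

class AddNormSketch(nn.Module):
    """out = LayerNorm(x + Dropout(sublayer_out))"""

    def __init__(self, input_dim: int, dropout: float):
        super().__init__()
        self.dropout = nn.Dropout(dropout)
        self.ln = nn.LayerNorm(input_dim)

    def forward(self, x: torch.Tensor, sublayer_out: torch.Tensor) -> torch.Tensor:
        return self.ln(x + self.dropout(sublayer_out))
```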
pytorch_tabular.models.common.layers.AppendCLSToken(d_token, initialization)
Bases: nn.Module
Appends the [CLS] token for BERT-like inference.
Source code in src/pytorch_tabular/models/common/layers/transformers.py
forward(x)
pytorch_tabular.models.common.layers.MultiHeadedAttention(input_dim, num_heads=8, head_dim=16, dropout=0.1, keep_attn=True)
Bases: nn.Module
Multi-Headed Attention block used in Transformers.
Source code in src/pytorch_tabular/models/common/layers/transformers.py
pytorch_tabular.models.common.layers.TransformerEncoderBlock(input_embed_dim, num_heads=8, ff_hidden_multiplier=4, ff_activation='GEGLU', attn_dropout=0.1, keep_attn=True, ff_dropout=0.1, add_norm_dropout=0.1, transformer_head_dim=None)
Bases: nn.Module
A single Transformer Encoder Block.
| Parameter | Description |
|---|---|
| `input_embed_dim` | The input embedding dimension |
| `num_heads` | The number of attention heads |
| `ff_hidden_multiplier` | The hidden dimension multiplier for the position-wise feed-forward layer |
| `ff_activation` | The activation function for the position-wise feed-forward layer |
| `attn_dropout` | The dropout probability for the attention layer |
| `keep_attn` | Whether to keep the attention weights |
| `ff_dropout` | The dropout probability for the position-wise feed-forward layer |
| `add_norm_dropout` | The dropout probability for the residual connections |
| `transformer_head_dim` | The dimension of the attention heads. If None, defaults to `input_embed_dim` |
Source code in src/pytorch_tabular/models/common/layers/transformers.py
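A minimal usage sketch, assuming the block consumes a token sequence of shape (batch, n_tokens, input_embed_dim), e.g. the output of Embedding2dLayer:

```python
import torch

from pytorch_tabular.models.common.layers import TransformerEncoderBlock

block = TransformerEncoderBlock(
    input_embed_dim=32,
    num_heads=8,
    ff_hidden_multiplier=4,
    ff_activation="GEGLU",
)

tokens = torch.randn(16, 5, 32)  # (batch, n_tokens, input_embed_dim)
out = block(tokens)              # expected shape: (16, 5, 32)
```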
Miscellaneous
pytorch_tabular.models.common.layers.Lambda(func)
Bases: nn.Module
A wrapper that exposes an arbitrary function or callable as a PyTorch module.
Initializes the Lambda module with the given callable.
| Parameter | Description |
|---|---|
| `func` | Any function or callable |
Source code in src/pytorch_tabular/models/common/layers/misc.py
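A minimal usage sketch (assuming forward simply applies the wrapped callable):

```python
import torch

from pytorch_tabular.models.common.layers import Lambda

square = Lambda(lambda x: x ** 2)            # wrap an arbitrary callable as a module
out = square(torch.tensor([1.0, 2.0, 3.0]))  # tensor([1., 4., 9.])
```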
pytorch_tabular.models.common.layers.ModuleWithInit()
Bases: nn.Module
Base class for a PyTorch module with a data-aware initializer that runs on the first batch.
Source code in src/pytorch_tabular/models/common/layers/misc.py
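A hedged sketch of the intended subclass pattern, assuming the base class calls initialize with the first batch before the first forward pass (as the description suggests); the subclass and its behaviour are hypothetical:

```python
import torch
import torch.nn as nn

from pytorch_tabular.models.common.layers import ModuleWithInit

class DataAwareScaler(ModuleWithInit):
    """Hypothetical subclass: scales inputs by statistics of the first batch."""

    def __init__(self, n_features: int):
        super().__init__()
        self.scale = nn.Parameter(torch.ones(n_features))

    def initialize(self, x: torch.Tensor):
        # Data-aware init: set the scale from the first batch's standard deviation.
        with torch.no_grad():
            self.scale.copy_(1.0 / (x.std(dim=0) + 1e-6))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * self.scale
```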
pytorch_tabular.models.common.layers.Residual(fn)
Activations
pytorch_tabular.models.common.layers.activations.Entmoid15
Bases: Function
A highly optimized equivalent of `lambda x: Entmax15([x, 0])`.