torch.nn.functional

Convolution functions

conv1d

Applies a 1D convolution over an input signal composed of several input planes.

conv2d

Applies a 2D convolution over an input image composed of several input planes.

conv3d

Applies a 3D convolution over an input image composed of several input planes.

conv_transpose1d

Applies a 1D transposed convolution operator over an input signal composed of several input planes, sometimes also called "deconvolution".

conv_transpose2d

Applies a 2D transposed convolution operator over an input image composed of several input planes, sometimes also called "deconvolution".

conv_transpose3d

Applies a 3D transposed convolution operator over an input image composed of several input planes, sometimes also called "deconvolution".

unfold

Extracts sliding local blocks from a batched input tensor.

fold

Combines an array of sliding local blocks into a large containing tensor.
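
As a rough illustration of how these functional forms are called, the sketch below runs conv2d with an explicitly constructed weight tensor and then round-trips patches through unfold and fold; the shapes and random weights are illustrative only, not taken from the entries above.

```python
import torch
import torch.nn.functional as F

# A batch of 4 single-channel 8x8 inputs: (N, C_in, H, W)
x = torch.randn(4, 1, 8, 8)

# conv2d weight layout is (C_out, C_in, kH, kW); bias is (C_out,)
weight = torch.randn(3, 1, 3, 3)
bias = torch.randn(3)

# 2D convolution with padding=1 so the spatial size is preserved
y = F.conv2d(x, weight, bias=bias, stride=1, padding=1)
print(y.shape)  # torch.Size([4, 3, 8, 8])

# unfold extracts sliding 3x3 blocks; fold combines them back into a tensor
# (not an exact inverse here, because overlapping values are summed)
patches = F.unfold(x, kernel_size=3, padding=1)                  # (N, C*9, L)
summed = F.fold(patches, output_size=(8, 8), kernel_size=3, padding=1)
print(patches.shape, summed.shape)
```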

Pooling functions

avg_pool1d

Applies a 1D average pooling over an input signal composed of several input planes.

avg_pool2d

Applies 2D average-pooling operation in $kH \times kW$ regions by step size $sH \times sW$ steps.

avg_pool3d

Applies 3D average-pooling operation in $kT \times kH \times kW$ regions by step size $sT \times sH \times sW$ steps.

max_pool1d

Applies a 1D max pooling over an input signal composed of several input planes.

max_pool2d

Applies a 2D max pooling over an input signal composed of several input planes.

max_pool3d

Applies a 3D max pooling over an input signal composed of several input planes.

max_unpool1d

Computes a partial inverse of MaxPool1d.

max_unpool2d

Computes a partial inverse of MaxPool2d.

max_unpool3d

Computes a partial inverse of MaxPool3d.

lp_pool1d

Applies a 1D power-average pooling over an input signal composed of several input planes.

lp_pool2d

Applies a 2D power-average pooling over an input signal composed of several input planes.

adaptive_max_pool1d

Applies a 1D adaptive max pooling over an input signal composed of several input planes.

adaptive_max_pool2d

Applies a 2D adaptive max pooling over an input signal composed of several input planes.

adaptive_max_pool3d

Applies a 3D adaptive max pooling over an input signal composed of several input planes.

adaptive_avg_pool1d

Applies a 1D adaptive average pooling over an input signal composed of several input planes.

adaptive_avg_pool2d

Applies a 2D adaptive average pooling over an input signal composed of several input planes.

adaptive_avg_pool3d

Applies a 3D adaptive average pooling over an input signal composed of several input planes.

fractional_max_pool2d

Applies 2D fractional max pooling over an input signal composed of several input planes.

fractional_max_pool3d

Applies 3D fractional max pooling over an input signal composed of several input planes.
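
A minimal sketch of the pooling entries above, using illustrative shapes: fixed-window average/max pooling, adaptive pooling to a target output size, and max_unpool2d driven by the indices returned from max_pool2d.

```python
import torch
import torch.nn.functional as F

x = torch.randn(2, 3, 16, 16)  # (N, C, H, W)

# Fixed 2x2 windows; stride defaults to the kernel size
avg = F.avg_pool2d(x, kernel_size=2)
mx, idx = F.max_pool2d(x, kernel_size=2, return_indices=True)

# Adaptive pooling specifies the output size instead of the window size
gap = F.adaptive_avg_pool2d(x, output_size=1)   # global average pool: (2, 3, 1, 1)

# max_unpool2d places each maximum back at its recorded index
unpooled = F.max_unpool2d(mx, idx, kernel_size=2)
print(avg.shape, mx.shape, gap.shape, unpooled.shape)
```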

Attention Mechanisms

scaled_dot_product_attention

Computes scaled dot product attention on query, key and value tensors, using an optional attention mask if passed, and applying dropout if a probability greater than 0.0 is specified.
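
The snippet below is a small sketch of calling scaled_dot_product_attention with a causal mask and checking it against a straightforward reference computation; the tensor sizes are made up for illustration.

```python
import torch
import torch.nn.functional as F

# (batch, heads, sequence length, head dimension)
q = torch.randn(2, 4, 16, 32)
k = torch.randn(2, 4, 16, 32)
v = torch.randn(2, 4, 16, 32)

# Fused attention; dropout is applied only if dropout_p > 0.0
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)

# Reference: softmax(q k^T / sqrt(d)) v with a lower-triangular mask
scale = q.size(-1) ** -0.5
mask = torch.tril(torch.ones(16, 16, dtype=torch.bool))
attn = (q @ k.transpose(-2, -1)) * scale
attn = attn.masked_fill(~mask, float("-inf")).softmax(dim=-1)
ref = attn @ v
print(out.shape, torch.allclose(out, ref, atol=1e-5))
```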

Non-linear activation functions

threshold

Thresholds each element of the input Tensor.

threshold_

In-place version of threshold().

relu

Applies the rectified linear unit function element-wise.

relu_

In-place version of relu().

hardtanh

Applies the HardTanh function element-wise.

hardtanh_

In-place version of hardtanh().

hardswish

Applies the hardswish function, element-wise, as described in the paper Searching for MobileNetV3.

relu6

Applies the element-wise function $\text{ReLU6}(x) = \min(\max(0, x), 6)$.

elu

Applies the Exponential Linear Unit (ELU) function element-wise.

elu_

In-place version of elu().

selu

Applies element-wise, $\text{SELU}(x) = scale * (\max(0, x) + \min(0, \alpha * (\exp(x) - 1)))$, with $\alpha = 1.6732632423543772848170429916717$ and $scale = 1.0507009873554804934193349852946$.

celu

Applies element-wise, $\text{CELU}(x) = \max(0, x) + \min(0, \alpha * (\exp(x/\alpha) - 1))$.

leaky_relu

Applies element-wise, $\text{LeakyReLU}(x) = \max(0, x) + \text{negative\_slope} * \min(0, x)$.

leaky_relu_

In-place version of leaky_relu().

prelu

Applies element-wise the function $\text{PReLU}(x) = \max(0, x) + \text{weight} * \min(0, x)$, where weight is a learnable parameter.

rrelu

Randomized leaky ReLU.

rrelu_

In-place version of rrelu().

glu

The gated linear unit.

gelu

When the approximate argument is 'none', it applies element-wise the function $\text{GELU}(x) = x * \Phi(x)$.

logsigmoid

Applies element-wise $\text{LogSigmoid}(x_i) = \log\left(\frac{1}{1 + \exp(-x_i)}\right)$.

hardshrink

Applies the hard shrinkage function element-wise.

tanhshrink

Applies element-wise, $\text{Tanhshrink}(x) = x - \text{Tanh}(x)$.

softsign

Applies element-wise, the function $\text{SoftSign}(x) = \frac{x}{1 + |x|}$.

softplus

Applies element-wise, the function $\text{Softplus}(x) = \frac{1}{\beta} * \log(1 + \exp(\beta * x))$.

softmin

Applies a softmin function.

softmax

Applies a softmax function.

softshrink

Applies the soft shrinkage function element-wise.

gumbel_softmax

Samples from the Gumbel-Softmax distribution and optionally discretizes.

log_softmax

Applies a softmax followed by a logarithm.

tanh

Applies element-wise, $\text{Tanh}(x) = \tanh(x) = \frac{\exp(x) - \exp(-x)}{\exp(x) + \exp(-x)}$.

sigmoid

Applies the element-wise function $\text{Sigmoid}(x) = \frac{1}{1 + \exp(-x)}$.

hardsigmoid

Applies the Hardsigmoid function, element-wise.

silu

Applies the Sigmoid Linear Unit (SiLU) function, element-wise.

mish

Applies the Mish function, element-wise.
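
A few of the activation entries above, called on a small illustrative tensor. Note that the underscore-suffixed variants (relu_, elu_, ...) modify their argument in place.

```python
import torch
import torch.nn.functional as F

x = torch.linspace(-3.0, 3.0, steps=7)

print(F.relu(x))                            # max(0, x)
print(F.leaky_relu(x, negative_slope=0.1))
print(F.gelu(x))                            # x * Phi(x)
print(F.softplus(x, beta=1.0))              # (1/beta) * log(1 + exp(beta * x))

# softmax / log_softmax need an explicit dim
logits = torch.randn(2, 5)
probs = F.softmax(logits, dim=-1)
print(probs.sum(dim=-1))                    # each row sums to 1

# In-place variant: mutates y rather than returning a new tensor
y = x.clone()
F.relu_(y)
print(y)
```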

batch_norm

Applies Batch Normalization for each channel across a batch of data.

group_norm

Applies Group Normalization to the input, dividing the channels into groups.

instance_norm

Applies Instance Normalization for each channel in each data sample in a batch.

layer_norm

Applies Layer Normalization over the trailing dimensions given by normalized_shape.

local_response_norm

Applies local response normalization over an input signal composed of several input planes, where channels occupy the second dimension.

normalize

Performs $L_p$ normalization of inputs over the specified dimension.
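
A sketch of the normalization entries with illustrative shapes: layer_norm over trailing dimensions, batch_norm in inference mode with explicit running statistics, group_norm over channel groups, and normalize for unit-norm rows.

```python
import torch
import torch.nn.functional as F

x = torch.randn(4, 8, 16)  # (N, C, L)

# layer_norm normalizes over the trailing dims given in normalized_shape
ln = F.layer_norm(x, normalized_shape=(16,))

# batch_norm at inference time uses per-channel running statistics
running_mean = torch.zeros(8)
running_var = torch.ones(8)
bn = F.batch_norm(x, running_mean, running_var, training=False)

# group_norm splits the 8 channels into 4 groups of 2
gn = F.group_norm(x, num_groups=4)

# normalize rescales vectors along dim to unit Lp norm
unit = F.normalize(torch.randn(3, 5), p=2.0, dim=1)
print(ln.shape, bn.shape, gn.shape, unit.norm(dim=1))
```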

Linear functions

linear

Applies a linear transformation to the incoming data: $y = xA^T + b$.

bilinear

Applies a bilinear transformation to the incoming data: $y = x_1^T A x_2 + b$.
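
A minimal sketch matching the two formulas above; the weight shapes follow the documented layouts, and the values are random placeholders.

```python
import torch
import torch.nn.functional as F

x = torch.randn(4, 10)        # (N, in_features)
A = torch.randn(3, 10)        # (out_features, in_features)
b = torch.randn(3)
y = F.linear(x, A, b)         # y = x A^T + b -> (4, 3)

x1 = torch.randn(4, 5)
x2 = torch.randn(4, 7)
W = torch.randn(3, 5, 7)      # (out_features, in1_features, in2_features)
z = F.bilinear(x1, x2, W, b)  # y = x1^T W x2 + b -> (4, 3)
print(y.shape, z.shape)
```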

Dropout functions

dropout

During training, randomly zeroes some of the elements of the input tensor with probability p using samples from a Bernoulli distribution.

alpha_dropout

Applies alpha dropout to the input.

feature_alpha_dropout

Randomly masks out entire channels (a channel is a feature map, e.g., the $j$-th channel of the $i$-th sample in the batched input is a tensor $\text{input}[i, j]$) of the input tensor.

dropout1d

Randomly zeroes out entire channels of the input tensor (a channel is a 1D feature map, e.g., the $j$-th channel of the $i$-th sample in the batched input is a 1D tensor $\text{input}[i, j]$).

dropout2d

Randomly zeroes out entire channels of the input tensor (a channel is a 2D feature map, e.g., the $j$-th channel of the $i$-th sample in the batched input is a 2D tensor $\text{input}[i, j]$).

dropout3d

Randomly zeroes out entire channels of the input tensor (a channel is a 3D feature map, e.g., the $j$-th channel of the $i$-th sample in the batched input is a 3D tensor $\text{input}[i, j]$).
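
The sketch below contrasts element-wise dropout with channel-wise dropout2d; the training flag controls whether anything is dropped at all. Shapes are illustrative.

```python
import torch
import torch.nn.functional as F

x = torch.randn(2, 4, 8, 8)

# Element-wise dropout; with training=False it is a no-op
y_train = F.dropout(x, p=0.5, training=True)
y_eval = F.dropout(x, p=0.5, training=False)
print(torch.equal(y_eval, x))   # True

# Channel-wise dropout: whole 2D feature maps are zeroed together
y2d = F.dropout2d(x, p=0.5, training=True)
print((y2d.abs().sum(dim=(2, 3)) == 0).any())  # some channels are likely dropped
```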

Sparse functions

embedding

A simple lookup table that looks up embeddings in a fixed-size dictionary.

embedding_bag

Computes sums, means or maxes of bags of embeddings, without instantiating the intermediate embeddings.

one_hot

Takes a LongTensor with index values of shape (*) and returns a tensor of shape (*, num_classes) that is zero everywhere except where the index of the last dimension matches the corresponding value of the input tensor, in which case it is 1.
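
A short sketch of the sparse lookup functions with a hypothetical 10-token, 4-dimensional embedding table; the indices and sizes are invented for illustration.

```python
import torch
import torch.nn.functional as F

weight = torch.randn(10, 4)                     # hypothetical embedding table
indices = torch.tensor([[1, 2, 4], [4, 3, 9]])

emb = F.embedding(indices, weight)              # (2, 3, 4)

# With a 2D input, each row is one "bag"; mode reduces within the bag
bagged = F.embedding_bag(indices, weight, mode="mean")   # (2, 4)

onehot = F.one_hot(torch.tensor([0, 2, 1]), num_classes=3)
print(emb.shape, bagged.shape, onehot)
```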

Distance functions

pairwise_distance

See torch.nn.PairwiseDistance for details.

cosine_similarity

Returns cosine similarity between x1 and x2, computed along dim.

pdist

Computes the p-norm distance between every pair of row vectors in the input.
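
A minimal sketch of the three distance functions on a small batch of row vectors; the shapes are illustrative.

```python
import torch
import torch.nn.functional as F

x1 = torch.randn(5, 8)
x2 = torch.randn(5, 8)

cos = F.cosine_similarity(x1, x2, dim=1)    # (5,)
dist = F.pairwise_distance(x1, x2, p=2)     # (5,)

# pdist: condensed distances over all row pairs of x1, shape (5*4/2,)
condensed = F.pdist(x1, p=2)
print(cos.shape, dist.shape, condensed.shape)
```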

Loss functions

binary_cross_entropy

Function that measures the Binary Cross Entropy between the target and input probabilities.

binary_cross_entropy_with_logits

Function that measures Binary Cross Entropy between target and input logits.

poisson_nll_loss

Poisson negative log likelihood loss.

cosine_embedding_loss

See CosineEmbeddingLoss for details.

cross_entropy

This criterion computes the cross entropy loss between input logits and target.

ctc_loss

The Connectionist Temporal Classification loss.

gaussian_nll_loss

Gaussian negative log likelihood loss.

hinge_embedding_loss

See HingeEmbeddingLoss for details.

kl_div

The Kullback-Leibler divergence loss.

l1_loss

Function that takes the mean element-wise absolute value difference.

mse_loss

Measures the element-wise mean squared error.

margin_ranking_loss

See MarginRankingLoss for details.

multilabel_margin_loss

See MultiLabelMarginLoss for details.

multilabel_soft_margin_loss

See MultiLabelSoftMarginLoss for details.

multi_margin_loss

See MultiMarginLoss for details.

nll_loss

The negative log likelihood loss.

huber_loss

Function that uses a squared term if the absolute element-wise error falls below delta and a delta-scaled L1 term otherwise.

smooth_l1_loss

Function that uses a squared term if the absolute element-wise error falls below beta and an L1 term otherwise.

soft_margin_loss

See SoftMarginLoss for details.

triplet_margin_loss

See TripletMarginLoss for details.

triplet_margin_with_distance_loss

See TripletMarginWithDistanceLoss for details.
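
The sketch below exercises a few of the losses above on random data: cross_entropy on raw logits (equivalent to log_softmax followed by nll_loss), the logits form of binary cross entropy, and two regression losses. All inputs are placeholders.

```python
import torch
import torch.nn.functional as F

logits = torch.randn(8, 5)                 # (N, num_classes)
target = torch.randint(0, 5, (8,))

# cross_entropy combines log_softmax and nll_loss
ce = F.cross_entropy(logits, target)
nll = F.nll_loss(F.log_softmax(logits, dim=1), target)
print(torch.allclose(ce, nll))             # True

# Binary targets: the logits version is more numerically stable
scores = torch.randn(8)
labels = torch.randint(0, 2, (8,)).float()
bce = F.binary_cross_entropy_with_logits(scores, labels)

# Regression-style losses
pred, y = torch.randn(8), torch.randn(8)
print(bce, F.mse_loss(pred, y), F.l1_loss(pred, y), F.huber_loss(pred, y, delta=1.0))
```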

Vision functions

pixel_shuffle

Rearranges elements in a tensor of shape $(*, C \times r^2, H, W)$ to a tensor of shape $(*, C, H \times r, W \times r)$, where $r$ is the upscale_factor.

pixel_unshuffle

Reverses the PixelShuffle operation by rearranging elements in a tensor of shape $(*, C, H \times r, W \times r)$ to a tensor of shape $(*, C \times r^2, H, W)$, where $r$ is the downscale_factor.

pad

Pads a tensor.

interpolate

Down/up samples the input to either the given size or the given scale_factor.

upsample

Upsamples the input to either the given size or the given scale_factor.

upsample_nearest

Upsamples the input, using nearest neighbours' pixel values.

upsample_bilinear

Upsamples the input, using bilinear upsampling.

grid_sample

Given an input and a flow-field grid, computes the output using input values and pixel locations from grid.

affine_grid

Generates a 2D or 3D flow field (sampling grid), given a batch of affine matrices theta.
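
A rough sketch of the vision helpers with illustrative shapes: resizing with interpolate, padding, pixel_shuffle, and an identity warp built from affine_grid plus grid_sample.

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 8, 8)

# Resize by scale factor or to an explicit size
up = F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=False)
down = F.interpolate(x, size=(4, 4), mode="nearest")

# Pad the last two dims by (left, right, top, bottom)
padded = F.pad(x, (1, 1, 2, 2), mode="constant", value=0.0)

# pixel_shuffle: (N, C*r^2, H, W) -> (N, C, H*r, W*r)
ps = F.pixel_shuffle(torch.randn(1, 12, 8, 8), upscale_factor=2)   # (1, 3, 16, 16)

# Identity affine transform: grid_sample should reproduce the input (approximately)
theta = torch.tensor([[[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]])          # (N, 2, 3)
grid = F.affine_grid(theta, size=(1, 3, 8, 8), align_corners=False)
warped = F.grid_sample(x, grid, align_corners=False)
print(up.shape, down.shape, padded.shape, ps.shape, torch.allclose(warped, x, atol=1e-5))
```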

DataParallel functions (multi-GPU, distributed)

data_parallel

Evaluates module(input) in parallel across the GPUs given in device_ids.
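
A guarded sketch of the functional data-parallel call; it only fans out across GPUs when CUDA devices are available and otherwise falls back to an ordinary forward pass. The module and sizes are placeholders.

```python
import torch
import torch.nn as nn
from torch.nn.parallel import data_parallel

model = nn.Linear(16, 4)
x = torch.randn(8, 16)

if torch.cuda.is_available():
    # Replicate the module and split the batch across all visible GPUs
    device_ids = list(range(torch.cuda.device_count()))
    out = data_parallel(model.cuda(), x.cuda(), device_ids=device_ids)
else:
    out = model(x)
print(out.shape)
```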
