torch.nn.functional
Convolution functions
conv1d 
Applies a 1D convolution over an input signal composed of several input planes. 
conv2d 
Applies a 2D convolution over an input image composed of several input planes. 
conv3d 
Applies a 3D convolution over an input image composed of several input planes. 
conv_transpose1d 
Applies a 1D transposed convolution operator over an input signal composed of several input planes, sometimes also called "deconvolution". 
conv_transpose2d 
Applies a 2D transposed convolution operator over an input image composed of several input planes, sometimes also called "deconvolution". 
conv_transpose3d 
Applies a 3D transposed convolution operator over an input image composed of several input planes, sometimes also called "deconvolution". 
unfold 
Extracts sliding local blocks from a batched input tensor. 
fold 
Combines an array of sliding local blocks into a large containing tensor. 
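As a minimal sketch (tensor shapes and variable names are my own), conv2d and unfold/fold are closely related: a convolution can be written as patch extraction followed by a single matrix multiply.

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 8, 8)          # one 8x8 RGB image, NCHW layout
w = torch.randn(4, 3, 3, 3)          # four 3x3 filters over 3 channels

# padding=1 with a 3x3 kernel keeps the spatial size: (1, 4, 8, 8)
y = F.conv2d(x, w, padding=1)

# The same result via unfold: extract 3x3 patches as columns,
# then apply all filters with one matmul.
patches = F.unfold(x, kernel_size=3, padding=1)      # (1, 27, 64)
y_unfold = (w.view(4, -1) @ patches).view(1, 4, 8, 8)

close = torch.allclose(y, y_unfold, atol=1e-4)
```

The unfold-based form is what many im2col-style convolution implementations do internally.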
Pooling functions
avg_pool1d 
Applies a 1D average pooling over an input signal composed of several input planes. 
avg_pool2d 
Applies 2D average-pooling operation in $kH \times kW$ regions by step size $sH \times sW$ steps. 
avg_pool3d 
Applies 3D average-pooling operation in $kT \times kH \times kW$ regions by step size $sT \times sH \times sW$ steps. 
max_pool1d 
Applies a 1D max pooling over an input signal composed of several input planes. 
max_pool2d 
Applies a 2D max pooling over an input signal composed of several input planes. 
max_pool3d 
Applies a 3D max pooling over an input signal composed of several input planes. 
max_unpool1d 
Computes a partial inverse of max_pool1d(). 
max_unpool2d 
Computes a partial inverse of max_pool2d(). 
max_unpool3d 
Computes a partial inverse of max_pool3d(). 
lp_pool1d 
Applies a 1D power-average pooling over an input signal composed of several input planes. 
lp_pool2d 
Applies a 2D power-average pooling over an input signal composed of several input planes. 
adaptive_max_pool1d 
Applies a 1D adaptive max pooling over an input signal composed of several input planes. 
adaptive_max_pool2d 
Applies a 2D adaptive max pooling over an input signal composed of several input planes. 
adaptive_max_pool3d 
Applies a 3D adaptive max pooling over an input signal composed of several input planes. 
adaptive_avg_pool1d 
Applies a 1D adaptive average pooling over an input signal composed of several input planes. 
adaptive_avg_pool2d 
Applies a 2D adaptive average pooling over an input signal composed of several input planes. 
adaptive_avg_pool3d 
Applies a 3D adaptive average pooling over an input signal composed of several input planes. 
fractional_max_pool2d 
Applies 2D fractional max pooling over an input signal composed of several input planes. 
fractional_max_pool3d 
Applies 3D fractional max pooling over an input signal composed of several input planes. 
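A short sketch of how these fit together (shapes are my own): max_pool2d can return the argmax indices, which is what makes max_unpool2d's "partial inverse" possible, and adaptive pooling targets an output size rather than a kernel size.

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 1, 4, 4)

# return_indices=True keeps the argmax locations for later unpooling
pooled, idx = F.max_pool2d(x, kernel_size=2, return_indices=True)  # (1, 1, 2, 2)

# Partial inverse: maxima go back to their original positions,
# every other entry becomes zero.
restored = F.max_unpool2d(pooled, idx, kernel_size=2)              # (1, 1, 4, 4)

# Adaptive pooling: you specify the output size, not the kernel.
# With output_size=(1, 1) this is just the per-channel mean.
adaptive = F.adaptive_avg_pool2d(x, output_size=(1, 1))
```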
Attention Mechanisms
scaled_dot_product_attention 
Computes scaled dot product attention on query, key and value tensors, using an optional attention mask if passed, and applying dropout if a probability greater than 0.0 is specified. 
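A minimal usage sketch (batch/head/sequence sizes are arbitrary): the query, key and value tensors use a (batch, heads, seq_len, head_dim) layout, and `is_causal=True` applies a causal mask so position $i$ attends only to positions $\le i$.

```python
import torch
import torch.nn.functional as F

# (batch, heads, sequence length, head dimension)
q = torch.randn(2, 4, 8, 16)
k = torch.randn(2, 4, 8, 16)
v = torch.randn(2, 4, 8, 16)

# Causal self-attention; dropout_p defaults to 0.0
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
```

The output has the same shape as the query tensor.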
Nonlinear activation functions
threshold 
Thresholds each element of the input Tensor. 
threshold_ 
In-place version of threshold(). 
relu 
Applies the rectified linear unit function elementwise. 
relu_ 
In-place version of relu(). 
hardtanh 
Applies the HardTanh function elementwise. 
hardtanh_ 
In-place version of hardtanh(). 
hardswish 
Applies the hardswish function, element-wise, as described in the paper Searching for MobileNetV3. 
relu6 
Applies the elementwise function $\text{ReLU6}(x) = \min(\max(0,x), 6)$. 
elu 
Applies the Exponential Linear Unit (ELU) function elementwise. 
elu_ 
In-place version of elu(). 
selu 
Applies elementwise, $\text{SELU}(x) = scale * (\max(0,x) + \min(0, \alpha * (\exp(x) - 1)))$, with $\alpha=1.6732632423543772848170429916717$ and $scale=1.0507009873554804934193349852946$. 
celu 
Applies elementwise, $\text{CELU}(x) = \max(0,x) + \min(0, \alpha * (\exp(x/\alpha) - 1))$. 
leaky_relu 
Applies elementwise, $\text{LeakyReLU}(x) = \max(0, x) + \text{negative\_slope} * \min(0, x)$ 
leaky_relu_ 
In-place version of leaky_relu(). 
prelu 
Applies elementwise the function $\text{PReLU}(x) = \max(0,x) + \text{weight} * \min(0,x)$ where weight is a learnable parameter. 
rrelu 
Randomized leaky ReLU. 
rrelu_ 
In-place version of rrelu(). 
glu 
The gated linear unit. 
gelu 
When the approximate argument is 'none', it applies elementwise the function $\text{GELU}(x) = x * \Phi(x)$, where $\Phi(x)$ is the cumulative distribution function of the standard Gaussian distribution. 
logsigmoid 
Applies elementwise $\text{LogSigmoid}(x_i) = \log \left(\frac{1}{1 + \exp(-x_i)}\right)$ 
hardshrink 
Applies the hard shrinkage function elementwise 
tanhshrink 
Applies elementwise, $\text{Tanhshrink}(x) = x - \text{Tanh}(x)$ 
softsign 
Applies elementwise, the function $\text{SoftSign}(x) = \frac{x}{1 + |x|}$ 
softplus 
Applies elementwise, the function $\text{Softplus}(x) = \frac{1}{\beta} * \log(1 + \exp(\beta * x))$. 
softmin 
Applies a softmin function. 
softmax 
Applies a softmax function. 
softshrink 
Applies the soft shrinkage function elementwise 
gumbel_softmax 
Samples from the Gumbel-Softmax distribution and optionally discretizes. 
log_softmax 
Applies a softmax followed by a logarithm. 
tanh 
Applies elementwise, $\text{Tanh}(x) = \tanh(x) = \frac{\exp(x) - \exp(-x)}{\exp(x) + \exp(-x)}$ 
sigmoid 
Applies the elementwise function $\text{Sigmoid}(x) = \frac{1}{1 + \exp(-x)}$ 
hardsigmoid 
Applies the elementwise function $\text{Hardsigmoid}(x) = \frac{\text{ReLU6}(x + 3)}{6}$. 
silu 
Applies the Sigmoid Linear Unit (SiLU) function, elementwise. 
mish 
Applies the Mish function, elementwise. 
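Two patterns worth noting from the entries above, sketched minimally (values are my own): the trailing-underscore variants mutate their argument in place, and softmax produces a distribution along the chosen dim.

```python
import torch
import torch.nn.functional as F

x = torch.tensor([-2.0, -0.5, 0.0, 1.5])

y = F.relu(x)      # out-of-place: x is left untouched
F.relu_(x)         # in-place: x itself is overwritten

same = torch.equal(x, y)   # both now hold [0.0, 0.0, 0.0, 1.5]

# softmax normalizes along dim: each row sums to 1
p = F.softmax(torch.randn(3, 5), dim=-1)
row_sums = p.sum(dim=-1)
```

In-place variants save memory but are incompatible with autograd when the input is needed for the backward pass.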
batch_norm 
Applies Batch Normalization for each channel across a batch of data. 
group_norm 
Applies Group Normalization for last certain number of dimensions. 
instance_norm 
Applies Instance Normalization for each channel in each data sample in a batch. 
layer_norm 
Applies Layer Normalization for last certain number of dimensions. 
local_response_norm 
Applies local response normalization over an input signal composed of several input planes, where channels occupy the second dimension. 
normalize 
Performs $L_p$ normalization of inputs over specified dimension. 
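A quick sketch of the difference between the two most easily confused entries here (shapes are my own): layer_norm standardizes to zero mean and unit variance over the trailing dimensions, while normalize only rescales to unit $L_p$ norm.

```python
import torch
import torch.nn.functional as F

x = torch.randn(4, 10)

# layer_norm: each row becomes zero-mean, unit-variance
y = F.layer_norm(x, normalized_shape=(10,))
row_means = y.mean(dim=-1)                       # ~0 per row
row_vars = y.var(dim=-1, unbiased=False)         # ~1 per row

# normalize: each row is rescaled to unit L2 norm (direction preserved)
u = F.normalize(x, p=2, dim=1)
row_norms = u.norm(dim=1)                        # ~1 per row
```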
Linear functions
linear 
Applies a linear transformation to the incoming data: $y = xA^T + b$. 
bilinear 
Applies a bilinear transformation to the incoming data: $y = x_1^T A x_2 + b$ 
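The linear formula $y = xA^T + b$ can be checked directly (dimensions below are my own):

```python
import torch
import torch.nn.functional as F

x = torch.randn(5, 3)     # batch of 5, in_features = 3
A = torch.randn(2, 3)     # weight: (out_features, in_features)
b = torch.randn(2)

y = F.linear(x, A, b)     # (5, 2)
manual = x @ A.T + b      # same computation spelled out
match = torch.allclose(y, manual, atol=1e-6)
```

Note the weight is stored as (out_features, in_features), hence the transpose.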
Dropout functions
dropout 
During training, randomly zeroes some of the elements of the input tensor with probability p using samples from a Bernoulli distribution. 
alpha_dropout 
Applies alpha dropout to the input. 
feature_alpha_dropout 
Randomly masks out entire channels (a channel is a feature map, e.g. the $j$th channel of the $i$th sample in the batched input is a tensor $\text{input}[i, j]$) of the input tensor. 
dropout1d 
Randomly zero out entire channels (a channel is a 1D feature map, e.g., the $j$th channel of the $i$th sample in the batched input is a 1D tensor $\text{input}[i, j]$) of the input tensor. 
dropout2d 
Randomly zero out entire channels (a channel is a 2D feature map, e.g., the $j$th channel of the $i$th sample in the batched input is a 2D tensor $\text{input}[i, j]$) of the input tensor. 
dropout3d 
Randomly zero out entire channels (a channel is a 3D feature map, e.g., the $j$th channel of the $i$th sample in the batched input is a 3D tensor $\text{input}[i, j]$) of the input tensor. 
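A minimal sketch of the training/inference distinction (seed and sizes are my own): with training=True, surviving elements are scaled by $1/(1-p)$ so the expected value is unchanged; with training=False, dropout is the identity.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.ones(1000)

# training=True: each element zeroed with probability p = 0.5,
# survivors scaled by 1 / (1 - p) = 2.0
y = F.dropout(x, p=0.5, training=True)
zeroed_frac = (y == 0).float().mean().item()   # roughly 0.5
survivor = y.max().item()                      # exactly 2.0

# training=False: no-op
same = torch.equal(F.dropout(x, p=0.5, training=False), x)
```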
Sparse functions
embedding 
A simple lookup table that looks up embeddings in a fixed dictionary and size. 
embedding_bag 
Computes sums, means or maxes of bags of embeddings. 
one_hot 
Takes LongTensor with index values of shape $(*)$ and returns a tensor of shape $(*, \text{num\_classes})$ that has zeros everywhere except where the index of the last dimension matches the corresponding value of the input tensor, in which case it will be 1. 
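A short sketch (vocabulary size and dimensions are my own): embedding is a row lookup into a weight table, and one_hot converts the same indices into 0/1 vectors.

```python
import torch
import torch.nn.functional as F

# a 4-word vocabulary with 3-dimensional embeddings
table = torch.randn(4, 3)
ids = torch.tensor([1, 3, 1])

vecs = F.embedding(ids, table)    # (3, 3): rows 1, 3, 1 of the table
row_match = torch.equal(vecs[0], table[1])

onehot = F.one_hot(ids, num_classes=4)   # (3, 4) LongTensor of 0s and 1s
```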
Distance functions
pairwise_distance 
See torch.nn.PairwiseDistance for details. 
cosine_similarity 
Returns cosine similarity between x1 and x2, computed along dim. 
pdist 
Computes the p-norm distance between every pair of row vectors in the input. 
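A minimal numeric sketch (the vectors are my own): parallel vectors get cosine similarity 1, anti-parallel vectors get -1, and pdist returns the pairwise distances between rows.

```python
import torch
import torch.nn.functional as F

a = torch.tensor([[1.0, 0.0], [1.0, 1.0]])
b = torch.tensor([[1.0, 0.0], [-1.0, -1.0]])

# row-wise cosine similarity: [1.0, -1.0]
sim = F.cosine_similarity(a, b, dim=1)

# pdist on a's two rows: ||(1,0) - (1,1)||_2 = 1.0
d = F.pdist(a)
```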
Loss functions
binary_cross_entropy 
Function that measures the Binary Cross Entropy between the target and input probabilities. 
binary_cross_entropy_with_logits 
Function that measures Binary Cross Entropy between target and input logits. 
poisson_nll_loss 
Poisson negative log likelihood loss. 
cosine_embedding_loss 
See CosineEmbeddingLoss for details. 
cross_entropy 
This criterion computes the cross entropy loss between input logits and target. 
ctc_loss 
The Connectionist Temporal Classification loss. 
gaussian_nll_loss 
Gaussian negative log likelihood loss. 
hinge_embedding_loss 
See HingeEmbeddingLoss for details. 
kl_div 
The Kullback-Leibler divergence loss. 
l1_loss 
Function that takes the mean elementwise absolute value difference. 
mse_loss 
Measures the elementwise mean squared error. 
margin_ranking_loss 
See MarginRankingLoss for details. 
multilabel_margin_loss 
See MultiLabelMarginLoss for details. 
multilabel_soft_margin_loss 
See MultiLabelSoftMarginLoss for details. 
multi_margin_loss 
See MultiMarginLoss for details. 
nll_loss 
The negative log likelihood loss. 
huber_loss 
Function that uses a squared term if the absolute elementwise error falls below delta and a delta-scaled L1 term otherwise. 
smooth_l1_loss 
Function that uses a squared term if the absolute elementwise error falls below beta and an L1 term otherwise. 
soft_margin_loss 
See SoftMarginLoss for details. 
triplet_margin_loss 
See TripletMarginLoss for details. 
triplet_margin_with_distance_loss 
See TripletMarginWithDistanceLoss for details. 
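One detail from the cross_entropy entry is worth a sketch (the logits below are my own): the function expects raw logits, not probabilities, because it applies log_softmax and nll_loss internally.

```python
import torch
import torch.nn.functional as F

# raw logits, NOT probabilities
logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 0.2, 3.0]])
target = torch.tensor([0, 2])   # class indices

loss = F.cross_entropy(logits, target)

# the equivalent decomposition
manual = F.nll_loss(F.log_softmax(logits, dim=1), target)
match = torch.allclose(loss, manual, atol=1e-6)
```

Passing already-softmaxed probabilities into cross_entropy is a common bug; it silently produces the wrong loss.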
Vision functions
pixel_shuffle 
Rearranges elements in a tensor of shape $(*, C \times r^2, H, W)$ to a tensor of shape $(*, C, H \times r, W \times r)$, where r is the upscale_factor. 
pixel_unshuffle 
Reverses the pixel_shuffle operation by rearranging elements in a tensor of shape $(*, C, H \times r, W \times r)$ to a tensor of shape $(*, C \times r^2, H, W)$, where r is the downscale_factor. 
pad 
Pads tensor. 
interpolate 
Down/up samples the input to either the given size or the given scale_factor. 
upsample 
Upsamples the input to either the given size or the given scale_factor. 
upsample_nearest 
Upsamples the input, using nearest neighbours' pixel values. 
upsample_bilinear 
Upsamples the input, using bilinear upsampling. 
grid_sample 
Given an input and a flow-field grid, computes the output using input values and pixel locations from grid. 
affine_grid 
Generates a 2D or 3D flow field (sampling grid), given a batch of affine matrices theta. 
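A quick sketch (shapes are my own): pixel_shuffle and pixel_unshuffle are exact inverses since both are pure permutations, and interpolate accepts either a size or a scale_factor.

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 8, 4, 4)    # C = 8 = 2 * r^2 with r = 2

up = F.pixel_shuffle(x, upscale_factor=2)          # (1, 2, 8, 8)
back = F.pixel_unshuffle(up, downscale_factor=2)   # (1, 8, 4, 4)
roundtrip = torch.equal(back, x)                   # exact, no rounding

# interpolate: resize by scale_factor (or pass size=... instead)
big = F.interpolate(x, scale_factor=2, mode="nearest")   # (1, 8, 8, 8)
```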
DataParallel functions (multiGPU, distributed)
data_parallel 
Evaluates module(input) in parallel across the GPUs given in device_ids. 
© 2024, PyTorch Contributors
PyTorch has a BSD-style license, as found in the LICENSE file.
https://pytorch.org/docs/2.1/nn.functional.html