I'm trying to understand PyTorch's nn.Linear. I get that it applies a linear transformation to the input, but the docs specify the equation being used as y = xA^T + b.

This reminds me of y = (x[1] * w[1]) + (x[2] * w[2]) + b. Is that at all what's happening here?
Also, here is my current understanding of the variables in this equation. Is this correct?

x = input
A = weight (I think)
T = not sure
b = bias
The T stands for the transpose operation. For reasons that aren't entirely clear, PyTorch stores the transpose of the weight matrix for linear layers: layer.weight has shape (out_features, in_features), and the forward pass computes x @ weight.T. You can find some discussion of this here, but it seems to be a legacy thing.
layer = nn.Linear(64, 128)
layer.weight.shape
> torch.Size([128, 64]) # we would expect (64, 128) but we get the transpose (128, 64)
x = torch.randn(8, 64) # random input
# nn.Linear computes `xA^T + b`
((x @ layer.weight.T) + layer.bias == layer(x)).all()
> tensor(True)
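To connect this back to the y = (x[1] * w[1]) + (x[2] * w[2]) + b intuition from the question: yes, each output unit is exactly that kind of weighted sum. A minimal sketch (the layer sizes here are arbitrary, just for illustration):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Small layer for illustration: 3 inputs -> 2 outputs
layer = nn.Linear(3, 2)
x = torch.randn(3)

# Output unit j is the dot product of the input with row j of the
# weight matrix, plus that unit's bias -- i.e. the
# y = x[0]*w[0] + x[1]*w[1] + ... + b form from the question.
manual = torch.stack([
    (x * layer.weight[j]).sum() + layer.bias[j]
    for j in range(2)
])

print(torch.allclose(manual, layer(x)))  # True (up to float rounding)
```

So xA^T + b is just that per-unit dot product done for all output units at once, which is why the weight rows (not columns) correspond to output units.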