# Where does the third dimension (as in 4x4x4) of tensor cores come from?

As I understand it, an Nvidia tensor core multiplies two 4x4 matrices and adds the result to a third matrix. Multiplying two 4x4 matrices produces a 4x4 matrix, and adding two 4x4 matrices produces a 4x4 matrix. Still, Nvidia states that "Each Tensor Core provides a 4x4x4 matrix processing array".

Each output element (row times column) needs 4 multiply-accumulate operations. I thought the last x4 might come from the intermediate results before the accumulation, but I don't think that quite fits the description on Nvidia's pages.

"The FP16 multiply results in a full precision result that is accumulated in FP32 operations with the other products in a given dot product for a 4x4x4 matrix multiply, as Figure 9 shows." https://developer.nvidia.com/blog/cuda-9-features-revealed/

4x4x4 matrix multiply? I thought matrices were two-dimensional by definition.

Can someone please explain where the last x4 comes from?

Solution

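4x4x4 is just notation for multiplying one 4x4 matrix by another 4x4 matrix. It describes the size of the operation, not a three-dimensional matrix.

If you were to multiply a 4x8 matrix by an 8x4 matrix, you would have 4x8x4. In general, if A is NxK and B is KxM, the product can be referred to as an NxKxM matrix multiply. The middle number K is the shared inner dimension: the length of each row-by-column dot product, i.e. the 4 multiply-accumulates per output element mentioned in the question.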
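To make the three numbers concrete, here is a minimal sketch in plain Python. It is not how a tensor core is implemented; it only shows that a multiply-accumulate D = A*B + C is naturally a three-level loop nest over N, M and K. The function name `matmul_accumulate` and the counter `fma_count` are made up for this illustration.

```python
def matmul_accumulate(A, B, C):
    """Return (D, fma_count) where D = A*B + C, computed with plain loops.

    A is N x K, B is K x M, C and D are N x M (lists of lists).
    """
    n, k = len(A), len(A[0])
    m = len(B[0])
    assert len(B) == k, "inner dimensions of A and B must match"

    D = [row[:] for row in C]        # start from the accumulator matrix C
    fma_count = 0
    for i in range(n):               # N: rows of A
        for j in range(m):           # M: columns of B
            for p in range(k):       # K: shared inner dimension
                D[i][j] += A[i][p] * B[p][j]
                fma_count += 1       # one multiply-accumulate per (i, j, p)
    return D, fma_count


# The 4x4x4 case: multiply a 4x4 matrix by the 4x4 identity, accumulate into zeros.
A = [[i * 4 + j for j in range(4)] for i in range(4)]
identity = [[1 if i == j else 0 for j in range(4)] for i in range(4)]
zeros = [[0] * 4 for _ in range(4)]

D, fma_count = matmul_accumulate(A, identity, zeros)
print(fma_count)  # number of multiply-accumulates: N * K * M
```

The result is still an ordinary 4x4 matrix; the "4x4x4" only counts the iteration space, which is 4 * 4 * 4 = 64 multiply-accumulates.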

After a quick search I found this paper, which uses exactly this notation (e.g. in Section 4.6 on page 36): https://www.research-collection.ethz.ch/bitstream/handle/20.500.11850/153863/eth-6705-01.pdf
