
Weird that two tensors originating from the same source have different mean values


Concatenating a list of 2-D tensors along two different axes (transposing where needed so the shapes line up) leads to tensors A and B that compare equal element-wise (A == B everywhere), yet their mean values along the same axis differ slightly (A.mean(dim=0) != B.mean(dim=0)). Why? Theoretically they should be identical.

Start from this nested list, whose shape is (10, 2, 1):

In [4]: tmp
Out[4]: 
[[[0.3471660912036896], [0.652833878993988]],
 [[0.5512792468070984], [0.4487205743789673]],
 [[0.5454527139663696], [0.4545471668243408]],
 [[0.3661797344684601], [0.6338202953338623]],
 [[0.2655346989631653], [0.7344651222229004]],
 [[0.28296780586242676], [0.717032253742218]],
 [[0.28441378474235535], [0.7155864238739014]],
 [[0.3660774230957031], [0.6339224576950073]],
 [[0.3515346944332123], [0.6484655141830444]],
 [[0.3660774230957031], [0.6339224576950073]]]
  • step 1: convert tmp into a list of tensors
In [7]: tensor_list = [torch.tensor(x) for x in tmp]
   ...: tensor_list
Out[7]: 
[tensor([[0.3472],
         [0.6528]]),
 tensor([[0.5513],
         [0.4487]]),
 tensor([[0.5455],
         [0.4545]]),
 tensor([[0.3662],
         [0.6338]]),
 tensor([[0.2655],
         [0.7345]]),
 tensor([[0.2830],
         [0.7170]]),
 tensor([[0.2844],
         [0.7156]]),
 tensor([[0.3661],
         [0.6339]]),
 tensor([[0.3515],
         [0.6485]]),
 tensor([[0.3661],
         [0.6339]])]
  • step 2
    • torch.cat() the transpose of each tensor in tensor_list along axis 0 to get A.
    • torch.cat() the tensors in tensor_list along axis 1, then transpose, to get B.

where A.shape == B.shape

In [11]: A = torch.cat([x.T for x in tensor_list], dim=0)
    ...: A.shape
Out[11]: torch.Size([10, 2])

In [12]: B = torch.cat(tensor_list, dim=1).T
    ...: B.shape
Out[12]: torch.Size([10, 2])

step 3. check consistency between A and B. We can see below that the sum of their absolute element-wise differences is zero, and the element-wise comparison shows that every element matches its counterpart in the other tensor.

In [13]: (A - B).abs().sum()
Out[13]: tensor(0.)

In [14]: A == B
Out[14]: 
tensor([[True, True],
        [True, True],
        [True, True],
        [True, True],
        [True, True],
        [True, True],
        [True, True],
        [True, True],
        [True, True],
        [True, True]])

step 4. check their mean values along axis 0. Strangely, there is a slight difference.

In [15]: A.mean(dim=0) - B.mean(dim=0)
Out[15]: tensor([5.9605e-08, 0.0000e+00])

Though the difference between their mean values is small enough to neglect, I wonder why it happens at all. How does torch.cat() work?

[Environment Info]

  • OS: Ubuntu 20.04.5 LTS
  • Python: Python 3.8.10
  • torch: 2.0.1
  • cuda: 11.7
  • NVIDIA-SMI 515.86.01

Solution

  • The tensors have the same shapes and values, but the way they are constructed leaves them with different memory layouts. You can see this with the stride() method:

    tmp = torch.randn(10, 2, 1)
    tensor_list = [x for x in tmp]    # ten (2, 1) tensors
    A = torch.cat([x.T for x in tensor_list], dim=0)
    B = torch.cat(tensor_list, dim=1).T
    print(A.stride())
    > (2, 1)
    print(B.stride())
    > (1, 10)
    

    The stride tells us how many elements (not bytes) we have to step over in the underlying storage to move by one position along each axis.
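
    As a quick illustration of how strides work (the tensor here is just a made-up example): for a 2-D tensor with stride (s0, s1), element [i, j] lives at offset i*s0 + j*s1 in the flat underlying storage, and transposing merely swaps the strides without moving any data.

    import torch

    t = torch.arange(6.).reshape(2, 3)    # contiguous, stride (3, 1)
    flat = t.flatten()                    # storage order of a contiguous tensor
    s0, s1 = t.stride()
    print(t[1, 2] == flat[1 * s0 + 2 * s1])
    > tensor(True)
    print(t.T.stride())                   # transpose swaps the strides
    > (1, 3)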

    Because the memory layouts differ, the mean reduction visits the values in a different order for each tensor, and floating-point addition is not associative: summing the same numbers in a different order can round differently in the last bit. That different accumulation order, combined with float32's limited precision, produces the tiny discrepancy.
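
    Non-associativity is easy to demonstrate; a classic example with Python's float64 (float32 tensors behave the same way, just with coarser spacing):

    print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))
    > False
    print((0.1 + 0.2) + 0.3, 0.1 + (0.2 + 0.3))
    > 0.6000000000000001 0.6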

    As a comparison, if you re-create A and B from Python lists (building entirely new tensors), you get two tensors with the same stride and no difference in their means:

    C = torch.tensor(A.tolist())    # fresh tensors built from Python lists
    D = torch.tensor(B.tolist())
    print(C.stride())
    > (2, 1)
    print(D.stride())
    > (2, 1)
    print(C.mean(dim=0) - D.mean(dim=0))
    > tensor([0., 0.])