This example will make it easier to understand. The following fails:
import torch

A = torch.tensor([[1, 2, 3], [4, 5, 6]])  # shape : (2, 3)
B = torch.tensor([[1, 2], [3, 4], [5, 6]])  # shape : (3, 2)
print((A - B).shape)
# RuntimeError: The size of tensor A (3) must match the size of tensor B (2) at non-singleton dimension 1
# ==================================================================
A = torch.tensor([[1, 2], [3, 4], [5, 6]])  # shape : (3, 2)
B = torch.tensor([[1, 2], [3, 4]])  # shape : (2, 2)
print((A - B).shape)
# RuntimeError: The size of tensor A (3) must match the size of tensor B (2) at non-singleton dimension 0
But the following works well:
a = torch.ones(8).unsqueeze(0).unsqueeze(-1).expand(4, 8, 7)
a_temp = a.unsqueeze(2) # shape : ( 4, 8, 1, 7 )
b_temp = torch.transpose(a_temp, 1, 2) # shape : ( 4, 1, 8, 7 )
print((a_temp - b_temp).shape) # torch.Size([4, 8, 8, 7])
Why does the latter work, but not the former?
How/why has the result shape been expanded?
This is explained by PyTorch's broadcasting semantics. The important part is:
Two tensors are “broadcastable” if the following rules hold:
- Each tensor has at least one dimension.
- When iterating over the dimension sizes, starting at the trailing dimension, the dimension sizes must either be equal, one of them is 1, or one of them does not exist.
In your case, (2, 3) and (3, 2) cannot be broadcast to a common shape (3 != 2, and neither is equal to 1), and neither can (3, 2) and (2, 2) (the leading dimensions 3 and 2 differ), but (4, 8, 1, 7) and (4, 1, 8, 7) are broadcast-compatible and broadcast to (4, 8, 8, 7).
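If you want to check this programmatically, here is a small sketch using torch.broadcast_shapes (available in recent PyTorch versions), which applies exactly these rules to a set of shapes:

import torch

# Incompatible: trailing dimensions 3 and 2 differ and neither is 1
try:
    torch.broadcast_shapes((2, 3), (3, 2))
except RuntimeError as e:
    print("(2, 3) vs (3, 2):", e)

# Incompatible: leading dimensions 3 and 2 differ and neither is 1
try:
    torch.broadcast_shapes((3, 2), (2, 2))
except RuntimeError as e:
    print("(3, 2) vs (2, 2):", e)

# Compatible: every aligned dimension pair is either equal or contains a 1
print(torch.broadcast_shapes((4, 8, 1, 7), (4, 1, 8, 7)))  # torch.Size([4, 8, 8, 7])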
This is essentially what the error message states: corresponding dimensions must either be equal ("match") or be singletons (i.e. equal to 1).
What happens when the shapes are broadcast is essentially a tensor expansion to make the shapes match (both operands are expanded to [4, 8, 8, 7]), after which the subtraction is performed as usual. Expansion repeats your data "in a smart way": no memory is copied, the same values are just read repeatedly along the expanded dimension.
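To see that the implicit broadcast is equivalent to expanding both operands first, and that the expansion does not copy any data, here is a minimal sketch (the tensors are stand-ins for your a_temp and b_temp):

import torch

a_temp = torch.ones(4, 8, 1, 7)
b_temp = torch.ones(4, 1, 8, 7)

# Explicitly expand both operands to the common broadcast shape...
a_exp, b_exp = torch.broadcast_tensors(a_temp, b_temp)
print(a_exp.shape, b_exp.shape)  # torch.Size([4, 8, 8, 7]) torch.Size([4, 8, 8, 7])

# ...and the result matches the implicit broadcast of the subtraction.
print(torch.equal(a_exp - b_exp, a_temp - b_temp))  # True

# The expansion is "smart": no data is copied, the expanded dimension
# simply gets stride 0, so the same memory is read repeatedly.
print(a_temp.stride())  # (56, 7, 7, 1)
print(a_exp.stride())   # (56, 7, 0, 1)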