
What are the differences between aggregation and concatenation in convolutional neural networks?


When I read some classical papers about CNNs, like the Inception family, ResNet, VGGNet and so on, I notice the terms concatenation, summation and aggregation, which confuse me (summation is easy for me to understand). Could someone tell me what the differences are among them? Maybe in a more specific way, like using examples to illustrate the differences in dimensionality and representational ability.


Solution

    • Concatenation generally consists of taking 2 or more output tensors from different network layers and concatenating them along the channel dimension
    • Aggregation consists of taking 2 or more output tensors from different network layers and applying a chosen multivariate function to them to combine the results
    • Summation is a special case of aggregation where the function is a sum

    This implies that aggregation generally loses information, since the combining function is usually not invertible (you cannot recover the inputs from their sum). Concatenation, on the other hand, retains all the information, at the cost of greater memory usage and a larger channel dimension.

    E.g. in PyTorch:

    import torch
    
    batch_size = 8
    num_channels = 3
    h, w = 512, 512
    t1 = torch.rand(batch_size, num_channels, h, w) # A tensor with shape [8, 3, 512, 512]
    t2 = torch.rand(batch_size, num_channels, h, w) # A tensor with shape [8, 3, 512, 512]
    
    torch.cat((t1, t2), dim=1) # A tensor with shape [8, 6, 512, 512]
    t1 + t2 # A tensor with shape [8, 3, 512, 512]
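    To make the contrast concrete, here is a small sketch (using the same tensors as above) showing two other common aggregation functions, element-wise maximum and mean, and demonstrating that concatenation is lossless because the original tensors can be recovered by slicing:

    ```python
    import torch

    batch_size, num_channels, h, w = 8, 3, 512, 512
    t1 = torch.rand(batch_size, num_channels, h, w)
    t2 = torch.rand(batch_size, num_channels, h, w)

    # Other aggregation functions: shape stays [8, 3, 512, 512]
    agg_max = torch.maximum(t1, t2)  # element-wise maximum
    agg_mean = (t1 + t2) / 2         # element-wise mean

    # Concatenation: channel dimension grows to [8, 6, 512, 512]
    cat = torch.cat((t1, t2), dim=1)

    # Concatenation is lossless: both inputs can be recovered by slicing
    assert torch.equal(cat[:, :num_channels], t1)
    assert torch.equal(cat[:, num_channels:], t2)
    ```

    Note that after concatenation, a subsequent layer (e.g. a 1x1 convolution) must accept the larger channel count, whereas aggregation keeps the channel count unchanged, which is why residual connections in ResNet use summation.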