
What are the differences between aggregation and concatenation in convolutional neural networks?


When I read some classical papers about CNNs, like the Inception family, ResNet, VGGNet and so on, I notice the terms concatenation, summation and aggregation, which confuse me (summation is easy for me to understand). Could someone tell me what the differences are among them? Maybe in a more specific way, like using examples to illustrate the differences in dimensionality and representational ability.


Solution

    • Concatenation generally consists of taking 2 or more output tensors from different network layers and concatenating them along the channel dimension
    • Aggregation consists of taking 2 or more output tensors from different network layers and applying a chosen multivariate function to them to combine the results
    • Summation is a special case of aggregation where the function is a sum

    This implies that aggregation generally loses information, since the combining function is usually not invertible (you cannot recover the inputs from their sum). Concatenation, on the other hand, retains all the information, at the cost of greater memory usage and a larger channel dimension.

    E.g. in PyTorch:

    import torch
    
    batch_size = 8
    num_channels = 3
    h, w = 512, 512
    t1 = torch.rand(batch_size, num_channels, h, w) # A tensor with shape [8, 3, 512, 512]
    t2 = torch.rand(batch_size, num_channels, h, w) # A tensor with shape [8, 3, 512, 512]
    
    torch.cat((t1, t2), dim=1) # A tensor with shape [8, 6, 512, 512]
    t1 + t2 # A tensor with shape [8, 3, 512, 512]
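    To make the contrast concrete, here is a small sketch (using the same tensors as above) showing two other common aggregation functions, element-wise maximum and mean, and demonstrating that concatenation is lossless because the original tensors can be recovered by slicing:

    ```python
    import torch

    batch_size, num_channels, h, w = 8, 3, 512, 512
    t1 = torch.rand(batch_size, num_channels, h, w)
    t2 = torch.rand(batch_size, num_channels, h, w)

    # Other aggregation functions: shape stays [8, 3, 512, 512]
    agg_max = torch.maximum(t1, t2)  # element-wise maximum
    agg_mean = (t1 + t2) / 2         # element-wise mean

    # Concatenation: channel dimension grows to [8, 6, 512, 512]
    cat = torch.cat((t1, t2), dim=1)

    # Concatenation is lossless: both inputs can be recovered by slicing
    assert torch.equal(cat[:, :num_channels], t1)
    assert torch.equal(cat[:, num_channels:], t2)
    ```

    Note that after concatenation, a subsequent layer (e.g. a 1x1 convolution) must accept the larger channel count, whereas aggregation keeps the channel count unchanged, which is why residual connections in ResNet use summation.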