python-3.x, pytorch, pytorch-geometric

Why do we sum the outputs of different layers in a neural network?


I am reading a NN implementation from here and I don't completely understand the forward pass. Why, at some point, do we compute x = F.relu(x1) + F.relu(x2) + F.relu(x3)? It seems that the input of the linear layer lin1 is the sum(!!!) of the previous 3 layers. This seems quite strange to me, as I expected a layer to be fed only the output of the previous layer.

def forward(self, data):
    x, edge_index, batch = data.x, data.edge_index, data.batch
    edge_attr = None
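    # gmp / gap are presumably torch_geometric.nn's global_max_pool / global_mean_pool readouts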

    x = F.relu(self.conv1(x, edge_index, edge_attr))
    x, edge_index, edge_attr, batch = self.pool1(x, edge_index, edge_attr, batch)
    x1 = torch.cat([gmp(x, batch), gap(x, batch)], dim=1)

    x = F.relu(self.conv2(x, edge_index, edge_attr))
    x, edge_index, edge_attr, batch = self.pool2(x, edge_index, edge_attr, batch)
    x2 = torch.cat([gmp(x, batch), gap(x, batch)], dim=1)

    x = F.relu(self.conv3(x, edge_index, edge_attr))
    x3 = torch.cat([gmp(x, batch), gap(x, batch)], dim=1)

    x = F.relu(x1) + F.relu(x2) + F.relu(x3)

    x = F.relu(self.lin1(x))
    x = F.dropout(x, p=self.dropout_ratio, training=self.training)
    x = F.relu(self.lin2(x))
    x = F.dropout(x, p=self.dropout_ratio, training=self.training)
    x = F.log_softmax(self.lin3(x), dim=-1)

    return x

Solution

  • The sum of the outputs from different layers builds a combined representation of the input before it is passed through the subsequent linear layers. Here x1, x2 and x3 are graph-level readouts (each the concatenation of a global max-pool and a global mean-pool) taken after a different conv/pool stage, so they all have the same width and can be added element-wise. Summing them acts like a skip connection: it fuses information from multiple depths of the network, which can enrich the model's understanding of the data. This is particularly useful in hierarchical or graph-based models, where different layers may capture different structural or contextual information.
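
    As a rough sketch of why the shapes work out, here is a minimal stand-alone example of the same fusion pattern (the sizes and layer names are made up for illustration, not taken from the original model):

    import torch
    import torch.nn.functional as F
    from torch import nn

    hidden = 128                      # hypothetical hidden width
    lin1 = nn.Linear(2 * hidden, 64)  # 2 * hidden because each readout is [max-pool || mean-pool]

    # Pretend these are the graph-level readouts after three different pooling stages.
    x1 = torch.randn(32, 2 * hidden)
    x2 = torch.randn(32, 2 * hidden)
    x3 = torch.randn(32, 2 * hidden)

    # Element-wise sum: all three tensors share the same shape, so the fused
    # tensor keeps that shape and lin1 only has to accept a single readout's width.
    x = F.relu(x1) + F.relu(x2) + F.relu(x3)
    x = F.relu(lin1(x))
    print(x.shape)  # torch.Size([32, 64])

    Note that because the fusion is a sum rather than a concatenation, lin1's input size does not grow with the number of stages being combined.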