Tags: python, pytorch, networkx, pytorch-geometric

How to load a graph from networkx into PyTorch Geometric and set node features and labels?


Goal: I am trying to import a graph from networkx into PyTorch Geometric and set its labels and node features.

(This is in Python)

Question(s):

  1. How do I do this [the conversion from networkx to PyTorch geometric]? (presumably by using the from_networkx function)
  2. How do I transfer over node features and labels? (more important question)

I have seen some other/previous posts with this question but they weren't answered (correct me if I am wrong).

Attempt: (I have just used an unrealistic example below, as I cannot post anything real on here)

Let us imagine we are trying to do a graph learning task (e.g. node classification) on a group of cars (not very realistic as I said). That is, we have a group of cars, an adjacency matrix, and some features (e.g. price at the end of the year). We want to predict the node label (i.e. brand of the car).

I will be using the following adjacency matrix: (apologies, cannot use latex to format this)

A = [(0, 1, 0, 1, 1), (1, 0, 1, 1, 0), (0, 1, 0, 0, 1), (1, 1, 0, 0, 0), (1, 0, 1, 0, 0)]
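As a minimal sketch, the adjacency matrix above can be turned into an undirected edge list. It assumes row and column i correspond to node i + 1, matching the 1-based node ids used in the code further down; the name `edges` is only illustrative.

```python
# Adjacency matrix from the question (row/column i corresponds to node i + 1).
A = [
    (0, 1, 0, 1, 1),
    (1, 0, 1, 1, 0),
    (0, 1, 0, 0, 1),
    (1, 1, 0, 0, 0),
    (1, 0, 1, 0, 0),
]

# Walk the upper triangle so each undirected edge is listed once,
# shifting indices by one to get 1-based node ids.
edges = [
    (i + 1, j + 1)
    for i in range(len(A))
    for j in range(i + 1, len(A))
    if A[i][j] == 1
]

print(edges)
# [(1, 2), (1, 4), (1, 5), (2, 3), (2, 4), (3, 5)]
```

A list like this can be passed straight to `G.add_edges_from(...)`. For an undirected `nx.Graph`, listing each edge once is enough.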

Here is the code (for Google Colab environment):

# Install PyTorch Geometric and its dependencies first, so the imports below succeed
!pip install torch-scatter torch-sparse torch-cluster torch-spline-conv torch-geometric -f https://data.pyg.org/whl/torch-1.10.0+cpu.html

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import networkx as nx
from torch_geometric.utils.convert import to_networkx, from_networkx
import torch

# Make the networkx graph
G = nx.Graph()

# Add some cars (just do 5 for now)
G.add_nodes_from([
      (1, {'Brand': 'Ford'}),
      (2, {'Brand': 'Audi'}),
      (3, {'Brand': 'BMW'}),
      (4, {'Brand': 'Peugeot'}),
      (5, {'Brand': 'Lexus'}),
])

# Add some edges
G.add_edges_from([
                  (1, 2), (1, 4), (1, 5),
                  (2, 3), (2, 4),
                  (3, 2), (3, 5), 
                  (4, 1), (4, 2),
                  (5, 1), (5, 3)
])

# Convert the graph into PyTorch geometric
pyg_graph = from_networkx(G)

So this correctly converts the networkx graph to PyTorch Geometric. However, I still don't know how to properly set the labels.

The brand values for each node have been converted and are stored within:

pyg_graph.Brand
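One way to turn those brand strings into integer class labels is sketched below. It assumes `pyg_graph.Brand` comes back as a plain Python list of strings in node order, which is worth confirming for your version of PyTorch Geometric. The list is written out by hand here so the snippet runs on its own, and `brand_to_id` and `labels` are illustrative names.

```python
# Stand-in for pyg_graph.Brand (assumed to be a list of strings in node order).
brands = ['Ford', 'Audi', 'BMW', 'Peugeot', 'Lexus']

# Give each distinct brand an integer id; sorting keeps the mapping reproducible.
brand_to_id = {brand: idx for idx, brand in enumerate(sorted(set(brands)))}

# One integer label per node, in the same order as the nodes.
labels = [brand_to_id[brand] for brand in brands]

print(brand_to_id)
print(labels)
```

The resulting list could then be stored on the graph with something like `pyg_graph.y = torch.tensor(labels)`.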

Below, I have just made some random numpy arrays of length 5 for each node (just pretend that these are realistic).

ford_prices = np.random.randint(100, size = 5)
lexus_prices = np.random.randint(100, size = 5)
audi_prices = np.random.randint(100, size = 5)
bmw_prices = np.random.randint(100, size = 5)
peugeot_prices = np.random.randint(100, size = 5)

This brings me to the main question:

  • How do I set the prices to be the node features of this graph?
  • How do I set the labels of the nodes? (and will I need to remove the labels from pyg_graph.Brand when training the network?)

Thanks in advance and happy holidays.


Solution

  • The easiest way is to add all the information to the networkx graph and build it directly in the form you need. Assuming you want to train a graph neural network, you will want something like the code below.

    1. Instead of text labels, you probably want a categorical representation, e.g. 1 stands for Ford.
    2. To match the usual convention, name your input features x and your labels/ground truth y.
    3. The split into train and test sets is done via masks, so the graph still contains all the information but only part of it is used for training. See the PyTorch Geometric introduction for an example that uses the Cora dataset.
    import networkx as nx
    import numpy as np
    import torch
    from torch_geometric.utils.convert import from_networkx
    
    
    # Make the networkx graph
    G = nx.Graph()
    
    # Add some cars (just do 5 for now)
    G.add_nodes_from([
          (1, {'y': 1, 'x': 0.5}),
          (2, {'y': 2, 'x': 0.2}),
          (3, {'y': 3, 'x': 0.3}),
          (4, {'y': 4, 'x': 0.1}),
          (5, {'y': 5, 'x': 0.2}),
    ])
    
    # Add some edges
    G.add_edges_from([
                      (1, 2), (1, 4), (1, 5),
                      (2, 3), (2, 4),
                      (3, 2), (3, 5),
                      (4, 1), (4, 2),
                      (5, 1), (5, 3)
    ])
    
    # Convert the graph into PyTorch geometric
    pyg_graph = from_networkx(G)
    
    print(pyg_graph)
    # Data(edge_index=[2, 12], x=[5], y=[5])
    print(pyg_graph.x)
    # tensor([0.5000, 0.2000, 0.3000, 0.1000, 0.2000])
    print(pyg_graph.y)
    # tensor([1, 2, 3, 4, 5])
    print(pyg_graph.edge_index)
    # tensor([[0, 0, 0, 1, 1, 1, 2, 2, 3, 3, 4, 4],
    #         [1, 3, 4, 0, 2, 3, 1, 4, 0, 1, 0, 2]])
    
    
    # Split the nodes into train and test sets using boolean masks
    train_ratio = 0.2
    num_nodes = pyg_graph.x.shape[0]
    num_train = int(num_nodes * train_ratio)
    idx = [i for i in range(num_nodes)]
    
    np.random.shuffle(idx)
    train_mask = torch.full_like(pyg_graph.y, False, dtype=bool)
    train_mask[idx[:num_train]] = True
    test_mask = torch.full_like(pyg_graph.y, False, dtype=bool)
    test_mask[idx[num_train:]] = True
    
    print(train_mask)
    # e.g. tensor([ True, False, False, False, False])  (depends on the shuffle)
    print(test_mask)
    # e.g. tensor([False,  True,  True,  True,  True])  (depends on the shuffle)