Tags: time-series, autoencoder, pytorch-geometric, gnn

pytorch geometric data - split into positive and negative train/test edges using a timestamp


I'm using PyTorch Geometric. My data is of the class torch_geometric.data.Data. Most tutorials I see use torch_geometric.utils.train_test_split_edges (deprecated now; the recommended replacement is torch_geometric.transforms.RandomLinkSplit). Anyway, both of these work to split my data. However, my data has a time component, and I'd like to do the train/test split using a date as the threshold. How can I accomplish this?

My data object looks like:

Data(x=[17815, 13], edge_index=[2, 62393], edge_attr=[62393], edge_time=[62393], edge_label=[62393], input_id=[1], batch_size=1)

I can get my own train_mask and test_mask by doing something like:

train_mask = (data.edge_time < time_threshold)
test_mask = (data.edge_time >= time_threshold)
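To use a calendar date as `time_threshold`, one option is to convert the date into the same unit as `edge_time`. The sketch below assumes `edge_time` holds integer Unix timestamps in seconds (UTC). That unit, the hypothetical helper `date_to_timestamp`, and the toy timestamps are illustrative assumptions, so adapt them to however your timestamps are actually encoded.

```python
from datetime import datetime, timezone

import torch


def date_to_timestamp(year: int, month: int, day: int) -> int:
    """Hypothetical helper: UTC calendar date -> integer Unix seconds."""
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp())


# Toy stand-in for data.edge_time (assumed to be Unix seconds, UTC).
edge_time = torch.tensor([
    date_to_timestamp(2020, 6, 1),
    date_to_timestamp(2020, 12, 31),
    date_to_timestamp(2021, 1, 1),
    date_to_timestamp(2021, 3, 15),
])

# Edges strictly before 2021-01-01 go to train, everything else to test.
time_threshold = date_to_timestamp(2021, 1, 1)
train_mask = edge_time < time_threshold
test_mask = ~train_mask  # identical to edge_time >= time_threshold
```

Using `~train_mask` instead of a second comparison guarantees that every edge lands in exactly one split.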

But filtering every component of Data this way takes some work, and it does not give me negative edge indices. My model needs positive and negative edge indices, just like the ones torch_geometric.utils.train_test_split_edges returns.

Does anyone know how to accomplish this? Thanks so much!!


Solution

  • You can simply use the edge masks (train_mask has one entry per edge, not per node) to generate train and test edge tensors:

    # Boolean edge mask: selects columns of edge_index and entries of edge_attr
    edge_index_train = data.edge_index[:, train_mask]
    edge_attr_train = data.edge_attr[train_mask]
    

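    and replace train_mask with ~train_mask (or test_mask) for the test split. Index edge_time and edge_label with the same mask so that every edge-level attribute stays aligned with edge_index.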