Tags: python, validation, pytorch, pytorch-geometric

Split DBP15K from PyTorch Geometric into train, test, and validation sets


My code loads the DBP15K dataset like this:

from torch_geometric.datasets import DBP15K

data = DBP15K(path, args.category, transform=SumEmbedding())[0].to(device)

According to the PyTorch Geometric documentation, however, this dataset is split only into train and test sets.

I tried to split it myself with the function "train_test_split_edges".

Nothing I tried worked, so I would like to know whether anyone has already split this dataset into train, test, and validation sets.


Solution

  • In the end, I only needed to split either the train set or the test set to obtain a validation set.

    I just did it like this:

    import torch
    from torch_geometric.datasets import DBP15K

    data = DBP15K(path, args.category, transform=SumEmbedding())[0].to(device)
    # train_y has shape [2, num_pairs]; split it 80/20 along the pair dimension (dim=1)
    split_index = int(0.8 * data.train_y.shape[1])
    train_y, val_y = torch.split(data.train_y, [split_index, data.train_y.shape[1] - split_index], dim=1)

    # Display tensor shapes
    print(train_y.shape)  # torch.Size([2, 3296])
    print(val_y.shape)    # torch.Size([2, 825])
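
The split above takes the first 80% of the pairs in their stored order. If that order is not random, it may be safer to shuffle before splitting. Below is a minimal sketch of a shuffled split, assuming the pairs are stored as a [2, N] tensor like data.train_y. The helper name split_pairs, the val_ratio and seed parameters, and the synthetic stand-in tensor are all hypothetical and not part of PyTorch Geometric.

```python
import torch


def split_pairs(pairs: torch.Tensor, val_ratio: float = 0.2, seed: int = 0):
    """Randomly split a [2, N] tensor of aligned pairs into train and validation parts.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    num_pairs = pairs.size(1)
    # A seeded generator keeps the split reproducible across runs.
    generator = torch.Generator().manual_seed(seed)
    perm = torch.randperm(num_pairs, generator=generator).to(pairs.device)
    num_val = int(val_ratio * num_pairs)
    val_idx, train_idx = perm[:num_val], perm[num_val:]
    # Index along dim=1 so each column (one aligned pair) stays intact.
    return pairs[:, train_idx], pairs[:, val_idx]


# Synthetic stand-in for data.train_y (assumption: same [2, N] layout).
pairs = torch.stack([torch.arange(4121), torch.arange(4121) + 10000])
train_y, val_y = split_pairs(pairs)
print(train_y.shape, val_y.shape)
```

With the real dataset, the call would be split_pairs(data.train_y). Because the validation size is rounded down here, the exact sizes can differ by one from the unshuffled split shown earlier.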