
How do I convert a Subset returned by torch.utils.data.random_split() into a tensor or matrix in PyTorch?


I'm trying to divide my data into training and testing datasets in PyTorch using torch.utils.data.random_split().

train, test = torch.utils.data.random_split(iris, [112, 38], generator=torch.Generator().manual_seed(42))

The output for both the train and test sets from the code above is: <torch.utils.data.dataset.Subset at 0x141a7379b50>

I'm trying to pull the feature and label columns out of my training and test sets with the code below, but I get the error "TypeError: list indices must be integers or slices, not str".

train_X = train[['Sepal.Length', 'Sepal.Width', 'Petal.Length',
                 'Petal.Width']]
train_y = train.Species

test_X = test[['Sepal.Length', 'Sepal.Width', 'Petal.Length',
                 'Petal.Width']]
test_y = test.Species

I looked through multiple posts on Stack Overflow but couldn't find a proper solution.


Solution

  • I am using the following extremely hacky solution:

    test = torch.stack([t for t in test])
    

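    After this, test is a tensor with 38 rows (this assumes every item in the Subset is itself a tensor, since torch.stack only accepts tensors). When you call: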

    train, test = torch.utils.data.random_split(iris, [112, 38], generator=torch.Generator().manual_seed(42))
    

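    Then train and test are Subsets (as you noted). A Subset is not a generator: it is a thin wrapper that holds a reference to the original dataset plus the list of indices selected for that split, and it can be indexed with integers and iterated over. That is also why indexing it with column names raises the TypeError. The list comprehension pulls the items out one by one so that torch.stack can combine them. This is not good coding, but it works in a pinch.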
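A less hacky route is to use the indices that each Subset already stores. The sketch below is only an illustration under stated assumptions: `data` is a made-up stand-in tensor with 150 rows (four feature columns and one label column), not the real iris data, and the column layout is hypothetical. It relies on the `indices` attribute of `Subset` to select the rows directly from the original tensor, then compares the result with the stacking approach.

```python
import torch
from torch.utils.data import random_split

# Stand-in for the iris data (an assumption): 150 rows, 4 feature columns + 1 label column.
data = torch.arange(150 * 5, dtype=torch.float32).reshape(150, 5)

train, test = random_split(
    data, [112, 38], generator=torch.Generator().manual_seed(42)
)

# Each Subset keeps the selected row positions in .indices,
# so the rows can be taken straight from the original tensor.
train_rows = data[train.indices]
test_rows = data[test.indices]

# Split features and labels by column position (hypothetical layout).
train_X, train_y = train_rows[:, :4], train_rows[:, 4]
test_X, test_y = test_rows[:, :4], test_rows[:, 4]

# The stacking approach yields the same rows in the same order.
stacked = torch.stack([t for t in test])
print(train_X.shape, test_X.shape, torch.equal(stacked, test_rows))
```

If `iris` is a pandas DataFrame rather than a tensor, the same indices should work with positional selection, for example `iris.iloc[train.indices]`, after which the column names from the question can be used as usual.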