
How to split a folder with multiple datasets into train and test sets using PyTorch


I have a folder with 48 ECG signal files. The files include .dat ECG signal records and .atr annotations. I want to split them into train and test sets to train an AI model. I will be using PyTorch, and I want to know a simple way to do this in Python. I prefer a custom split, with a certain number of files in train and the rest in test.

E.g.: Train: ['101', '104', '107'], Test: ['102', '105', '106']

Thanks


Solution

  • First, store the input and annotation file locations in a Python dictionary,
    with the input (.dat) file name as the key and the annotation (.atr) file name as the value.

    Then you can split the dictionary's keys into train and test lists and use them to look up the matching annotation files.

    import os
    import random
    from glob import glob
    
    MainFolder = "<Your Folder Name>"
    
    # Map each input (.dat) file to its annotation (.atr) file
    Data = {}
    for file in glob(os.path.join(MainFolder, "*.dat")):
        At_file = os.path.splitext(file)[0] + ".atr"
        Data[file] = At_file
    
    # Data now holds input and annotation file names as key/value pairs
    
    # To split the data, shuffle the keys first
    Key_data = list(Data)
    random.shuffle(Key_data)
    
    # Specify the train/test split ratio here (0.8 = 80% train)
    split = int(len(Key_data) * 0.8)
    
    Train_in = Key_data[:split]
    Test_in = Key_data[split:]
    Train_at = [Data[i] for i in Train_in]
    Test_at = [Data[i] for i in Test_in]
    
    print(Train_in, Train_at, Test_in, Test_at)
    

    Here Train_in holds the training input files and Train_at the corresponding annotation files; Test_in and Test_at hold the same for the test set.

    This should solve your problem. Comment if you get any errors implementing the code above.
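The code above splits at random by ratio, while the question asks for specific records in train (such as '101', '104', '107') and the rest in test. A minimal sketch of that variant follows. It assumes each record ID is the file name without its extension; the function name `split_by_record` and the parameter `train_ids` are made up for illustration and are not part of PyTorch or any ECG library.

```python
from pathlib import Path


def split_by_record(folder, train_ids):
    """Split .dat/.atr pairs into train and test lists by record ID.

    Hypothetical helper: records whose file stem (e.g. "101") is in
    train_ids go to train; every other record in the folder goes to test.
    """
    train_ids = set(train_ids)
    train, test = [], []
    # sorted() keeps the order the same from run to run
    for dat_path in sorted(Path(folder).glob("*.dat")):
        atr_path = dat_path.with_suffix(".atr")
        if not atr_path.exists():
            continue  # skip records that have no annotation file
        pair = (str(dat_path), str(atr_path))
        if dat_path.stem in train_ids:
            train.append(pair)
        else:
            test.append(pair)
    return train, test
```

Calling `split_by_record("<Your Folder Name>", ["101", "104", "107"])` returns two lists of (input file, annotation file) pairs. Those lists can then be passed to whatever code loads the records, for example a custom `torch.utils.data.Dataset`.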