Tags: python, lstm, tf.keras

How to deal with dataset containing multiple csv files?


I'm implementing an LSTM, but I have a problem with my dataset. It consists of multiple CSV files (different problem instances): there are more than 100 CSV files in a directory that I want to read and load in Python.

My question is how I should proceed to build a dataset for training and testing. Is there a way to split each CSV file into two parts (80% training and 20% testing), then group the 80% portions as training data and the 20% portions as test data? Or is there a more efficient way of doing things? How do I take these multiple CSVs as input to train and test the LSTM?

Here is part of my CSV file structure [screenshot], and here is a view of my CSV files, i.e. the problem instances [screenshot].


Solution

  • You can use pandas' pd.concat() to combine multiple dataframes that share the same columns (pandas docs).

    You can iterate through that directory to build a list of CSV file paths, read each file with pd.read_csv(), and then concatenate them into a final dataframe with something like this:

    import glob
    import pandas as pd

    csv_files_list = glob.glob("path/to/your/csvs/*.csv")  # adjust to your directory
    dfs = [pd.read_csv(csv_path) for csv_path in csv_files_list]  # one dataframe per file
    final_df = pd.concat(dfs, ignore_index=True)  # stack them into a single dataframe
    

    From here, you can split the data into training and test sets using sklearn or any other method you prefer, as shown in the sketch below.
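
    If you specifically want the per-file 80/20 split described in the question (so that every problem instance contributes rows to both sets), here is a minimal sketch using sklearn's train_test_split. It assumes the csv_files_list from above, and it passes shuffle=False on the assumption that row order within each instance matters for the LSTM:

    import pandas as pd
    from sklearn.model_selection import train_test_split

    train_parts, test_parts = [], []
    for csv_path in csv_files_list:
        df = pd.read_csv(csv_path)
        # split this problem instance 80/20, keeping row order intact
        train_df, test_df = train_test_split(df, test_size=0.2, shuffle=False)
        train_parts.append(train_df)
        test_parts.append(test_df)

    # group the 80% portions as training data and the 20% portions as test data
    train_data = pd.concat(train_parts, ignore_index=True)
    test_data = pd.concat(test_parts, ignore_index=True)

    If row order does not matter for your problem, drop shuffle=False to get a random split within each file instead.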