I'm implementing an LSTM, but I have a problem with my dataset. The dataset is spread across multiple CSV files (different problem instances): there are more than 100 CSV files in a directory that I want to read and load in Python.

My question is how I should proceed to build a dataset for training and testing. Is there a way to split each CSV file into two parts (80% training and 20% testing), then group the 80% portions together as training data and the 20% portions together as testing data? Or is there a more efficient way of doing this? How do I take these multiple CSVs as input to train and test the LSTM? I have attached a part of my CSV file structure and a screenshot of my CSV files (problem instances).
You can use pandas' pd.concat() to combine multiple dataframes that share the same columns (see the pandas docs). Iterate through the directory to build a list of CSV file paths (for example with glob.glob()), read each one with pd.read_csv(), and then concatenate them into a final dataframe with something like this:
import glob
import pandas as pd

# build a list of all CSV file paths in the directory
csv_files_list = glob.glob("<YOUR CSV DIRECTORY>/*.csv")
# read each CSV and concatenate everything into a single dataframe
dfs = [pd.read_csv(csv_path) for csv_path in csv_files_list]
final_df = pd.concat(dfs, ignore_index=True)
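Collecting the dataframes in a list and calling pd.concat once at the end is also faster than calling pd.concat inside the loop, since each concat call copies all of the data accumulated so far.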
From here, you can split your training and test data using sklearn or whatever other method you like.
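For example, here is a minimal sketch using scikit-learn's train_test_split; the 80/20 ratio follows your question, and shuffle=False is an assumption that keeps the rows in their original order, which usually matters for sequence data fed to an LSTM:

from sklearn.model_selection import train_test_split

# 80/20 split of the combined dataframe; shuffle=False keeps rows in their
# original order (an assumption that suits time-ordered sequence data)
train_df, test_df = train_test_split(final_df, test_size=0.2, shuffle=False)

If you instead want the per-file split you describe (the first 80% of each instance for training, the last 20% for testing), you can slice each dataframe with df.iloc at the 80% mark before concatenating the training parts and the testing parts separately.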