I have .csv files in a folder for 10 different people. The files are named "data_m1", "data_m2", etc. I want to write a for loop that reads all the files, processes each one by applying certain functions and creating new feature columns, and then efficiently combines them into one DataFrame. Along the way, I want to read each file name and add a "name" column that labels the rows with the file's suffix: "m1", "m2", etc.
Let's say I want to apply this simple step, which creates a new column, to each of the 10 files in the folder:
df['new_column1'] = df['value'].apply(lambda x: 1 if 0 < x <= 10 else 2 if 10 < x < 20 else np.nan)
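The row-wise lambda works, but `Series.apply` calls back into Python once per row. A vectorized alternative (a swapped-in technique, not the asker's code) builds the same column with `np.select`. The column name `value` comes from the question; the helper name `add_bucket_column` is hypothetical:

```python
import numpy as np
import pandas as pd


def add_bucket_column(df: pd.DataFrame) -> pd.DataFrame:
    """Vectorized equivalent of the lambda in the question:
    1 for 0 < x <= 10, 2 for 10 < x < 20, NaN otherwise."""
    x = df["value"]
    conditions = [(x > 0) & (x <= 10), (x > 10) & (x < 20)]
    # Rows matching no condition (including NaN values) get the default.
    df["new_column1"] = np.select(conditions, [1, 2], default=np.nan)
    return df


df = pd.DataFrame({"value": [-1, 0, 5, 10, 15, 20, 25]})
add_bucket_column(df)
print(df["new_column1"].tolist())
```

Values such as 0, 20, or NaN satisfy neither condition, so they fall through to `default=np.nan`, just as they do in the lambda.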
Then, at the end, I want to combine all the files into one DataFrame, with a "name" column labeling each row by its file suffix: "m1", "m2", etc.
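For the labeling step on its own, `pd.concat` can attach the label itself when it is given a dict of frames: the dict keys become an outer index level, which `reset_index` then turns into a regular "name" column. A minimal sketch, using two hypothetical in-memory frames in place of the CSV files:

```python
import pandas as pd

# Hypothetical stand-ins for data_m1.csv and data_m2.csv, keyed by their label.
frames = {
    "m1": pd.DataFrame({"value": [3, 12]}),
    "m2": pd.DataFrame({"value": [7]}),
}

# The dict keys form an outer index level named "name"; reset_index moves it
# into a column, and the second reset_index drops the per-file row numbers.
combined = (
    pd.concat(frames, names=["name", "row"])
    .reset_index(level="name")
    .reset_index(drop=True)
)
print(combined)
```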
You can try the following:
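import glob
import os
import numpy as np
import pandas as pd
# create an empty list to collect the DataFrames
dfs = []
# use glob to get the matching file names (sorted for a stable order) and iterate
for file in sorted(glob.glob('/path/to/folder/data_m*.csv')):
    # read the file
    df = pd.read_csv(file)
    # label the rows with the part of the file name after "data_", e.g. "m1";
    # basename() strips the folder first, so underscores in the path can't interfere
    df['name'] = os.path.splitext(os.path.basename(file))[0].split('_', 1)[1]
    # apply your per-file processing here, e.g. the feature from the question
    df['new_column1'] = df['value'].apply(lambda x: 1 if 0 < x <= 10 else 2 if 10 < x < 20 else np.nan)
    # append df to the list
    dfs.append(df)
# concatenate all frames once at the end; ignore_index gives a clean 0..n-1 index
final_df = pd.concat(dfs, ignore_index=True)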
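Here is a self-contained variant of the same loop, written as a sketch. It assumes the files sit directly in one folder and are named `data_<label>.csv`. The helper names `process` and `load_all` are hypothetical, and `pathlib` is used here in place of `glob`/`os.path`. The demo writes two small CSV files to a temporary folder:

```python
import tempfile
from pathlib import Path

import numpy as np
import pandas as pd


def process(df: pd.DataFrame) -> pd.DataFrame:
    """Per-file feature engineering; here only the column from the question."""
    df["new_column1"] = df["value"].apply(
        lambda x: 1 if 0 < x <= 10 else 2 if 10 < x < 20 else np.nan
    )
    return df


def load_all(folder) -> pd.DataFrame:
    frames = []
    for path in sorted(Path(folder).glob("data_m*.csv")):
        df = process(pd.read_csv(path))
        # Path.stem drops the folder and the ".csv" suffix: "data_m1" -> "m1"
        df["name"] = path.stem.split("_", 1)[1]
        frames.append(df)
    # A single concat at the end avoids re-copying the data on every iteration.
    return pd.concat(frames, ignore_index=True)


# Demo on two hypothetical files written to a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    pd.DataFrame({"value": [5, 15]}).to_csv(Path(tmp) / "data_m1.csv", index=False)
    pd.DataFrame({"value": [25]}).to_csv(Path(tmp) / "data_m2.csv", index=False)
    combined = load_all(tmp)

print(combined)
```

`sorted` compares names as strings, so `data_m10.csv` comes right after `data_m1.csv`. Each row is identified by its `name` column, so this ordering only affects how the result is displayed.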