python pandas iteration glob

How do I use Python to iterate through a directory and delete specific columns from all CSVs?


I have a directory with several CSV files.

files = glob('C:/Users/jj/Desktop/Bulk_Wav/*.csv')

Each CSV has the same columns. A minimal example:

yes no maybe ofcourse
1   2  3     4

I want my script to iterate through all csvs in the folder and delete the columns maybe and ofcourse.


Solution

  • If glob provides you with file paths, you can do the following with pandas:

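    from glob import glob

    import pandas as pd
    
    files = glob('C:/Users/jj/Desktop/Bulk_Wav/*.csv')
    # Names must match the CSV headers exactly (no stray spaces)
    drop = ['maybe', 'ofcourse']
    
    for file in files:
        df = pd.read_csv(file)
        for col in drop:
            # Only drop columns that exist, so a missing one does not raise KeyError
            if col in df:
                df = df.drop(col, axis=1)
        # index=False stops pandas from writing the row index as an extra column
        df.to_csv(file, index=False)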
    

    Alternatively, for a more compact way to avoid a KeyError from drop, filter the list down to the columns that are actually present and drop them in one call:

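    from glob import glob

    import pandas as pd
    
    files = glob('C:/Users/jj/Desktop/Bulk_Wav/*.csv')
    # Names must match the CSV headers exactly (no stray spaces)
    drop = ['maybe', 'ofcourse']
    
    for file in files:
        df = pd.read_csv(file)
        # Keep only the names that are present, then drop them all at once
        df = df.drop([c for c in drop if c in df], axis=1)
        # index=False stops pandas from writing the row index as an extra column
        df.to_csv(file, index=False)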
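
    A further variation, as a minimal sketch: DataFrame.drop accepts errors="ignore", which skips names that are not in the frame, so the membership check is not needed. The function name drop_columns_in_csvs and its return value (the list of files it rewrote) are illustrative choices, not part of the original answer. It assumes comma-separated files with a header row and overwrites each file in place, so try it on a copy of the folder first.

```python
from glob import glob

import pandas as pd


def drop_columns_in_csvs(pattern, columns):
    """Drop `columns` from every CSV matching `pattern`, rewriting files in place.

    Returns the paths of the files that were actually changed.
    """
    changed = []
    for path in glob(pattern):
        df = pd.read_csv(path)
        # errors="ignore" skips names that are missing instead of raising KeyError
        trimmed = df.drop(columns=columns, errors="ignore")
        if trimmed.shape[1] != df.shape[1]:
            # index=False keeps pandas from writing the row index as a new column
            trimmed.to_csv(path, index=False)
            changed.append(path)
    return changed
```

    For the folder in the question, the call would be drop_columns_in_csvs('C:/Users/jj/Desktop/Bulk_Wav/*.csv', ['maybe', 'ofcourse']). Files that contain neither column are left untouched.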