Tags: python, pandas, for-loop, keyerror

Pandas dataframe KeyError while working with CSV files


I have the following code:

import glob
import pandas as pd

for filename in glob.glob('/Users/jacob/Desktop/MERS/new/NOT COAL/gensets/statistics_per_lgu/per_lgu_files/*.csv'):
    # Load the current CSV file
    df_csv = pd.read_csv(filename)

    # For fuel consumption
    count = df_csv['Fuel Type_Jundy'].count()
    aa = df_csv['Fuel Type_Jundy']
    d = aa.value_counts()

    # Each lookup raises KeyError if that fuel type is absent from this file
    ADO = d['ADO']
    Bunker = d['Bunker']
    LSFO = d['LSFO']
    IFO = d['IFO']
    LPG = d['LPG']

    fuel_type = pd.DataFrame({'count': count, 'ADO': ADO, 'Bunker': Bunker, 'LSFO': LSFO, 'IFO': IFO, 'LPG': LPG},
                             index=['fuel_type'])

A KeyError occurs because not every CSV file contains all of 'ADO', 'Bunker', 'LSFO', etc.
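To see where the error comes from: `value_counts()` returns a Series whose index holds only the labels that actually occur in the column, so indexing it with an absent label raises KeyError. A minimal illustration with made-up data standing in for one file that has no LPG rows:

```python
import pandas as pd

# Made-up column standing in for one CSV file that has no LPG rows
fuel = pd.Series(['ADO', 'ADO', 'Bunker', 'LSFO', 'IFO'])
d = fuel.value_counts()

# The index of value_counts() contains only labels present in the data
print(sorted(d.index))
print('LPG' in d)  # `in` on a Series checks the index, not the values

try:
    d['LPG']
except KeyError as err:
    print('KeyError:', err)
```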

How can I still build this DataFrame

    fuel_type = pd.DataFrame({'count': count, 'ADO': ADO, 'Bunker': Bunker, 'LSFO': LSFO, 'IFO': IFO, 'LPG': LPG},
                             index=['fuel_type'])

so that every fuel type that does appear in a given CSV file (LSFO, ADO, Bunker, etc.) gets its count in the DataFrame, without the missing ones causing an error?
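Concretely, the target is one row per file with a column for every fuel label. Assuming ADO, Bunker, LSFO, IFO and LPG are the complete set of labels, one way to get that shape is to `reindex` the value counts onto that fixed list, so absent labels become 0 instead of raising. The helper `fuel_type_row` and the `FUEL_TYPES` list are hypothetical names; only the column name comes from the question.

```python
import pandas as pd

# Assumed complete list of fuel labels (taken from the question's code)
FUEL_TYPES = ['ADO', 'Bunker', 'LSFO', 'IFO', 'LPG']

def fuel_type_row(df_csv, column='Fuel Type_Jundy'):
    """Hypothetical helper: one-row DataFrame of fuel counts for one file."""
    col = df_csv[column]
    # reindex() follows the order of FUEL_TYPES; labels absent here get 0
    counts = col.value_counts().reindex(FUEL_TYPES, fill_value=0)
    row = {'count': col.count(), **counts.to_dict()}
    return pd.DataFrame(row, index=['fuel_type'])

# Made-up data for a file that has no IFO or LPG rows
sample = pd.DataFrame({'Fuel Type_Jundy': ['ADO', 'Bunker', 'ADO', 'LSFO']})
print(fuel_type_row(sample))
```

Note that any label outside `FUEL_TYPES` is silently dropped by `reindex`, so this only fits if the list really is complete. Passing `fill_value=np.nan` (or omitting `fill_value`) gives NaN instead of 0 for the missing labels.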

Thanks! :D


Solution

  • There might be a cleaner or shorter way to do this, but you could 'try' to assign each value count to its variable individually and, if that label doesn't exist, store NaN instead:

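    import numpy as np

    try:
        ADO = d['ADO']
    except KeyError:  # 'ADO' does not occur in this file
        ADO = np.nan
    try:
        Bunker = d['Bunker']
    except KeyError:  # 'Bunker' does not occur in this file
        Bunker = np.nan

    # ...and likewise for LSFO, IFO and LPG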

    That way the code runs even when a fuel type has no entry in the CSV file: when you build the DataFrame, the missing values are simply NaN and the values that are present are stored correctly.
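As a shorter equivalent of the try/except chain, `Series.get(key, default)` returns the default instead of raising KeyError when the key is absent. A minimal sketch, assuming `d` is the `value_counts()` result from the question (made-up data here) and that the five labels are the full set:

```python
import numpy as np
import pandas as pd

# Stand-in for d = df_csv['Fuel Type_Jundy'].value_counts() (made-up data, no IFO/LPG)
d = pd.Series(['ADO', 'ADO', 'Bunker', 'LSFO']).value_counts()
count = d.sum()  # same as the column's non-null count, since value_counts drops NaN

# Series.get returns np.nan for missing labels instead of raising KeyError
values = {fuel: d.get(fuel, np.nan) for fuel in ['ADO', 'Bunker', 'LSFO', 'IFO', 'LPG']}
fuel_type = pd.DataFrame({'count': count, **values}, index=['fuel_type'])
print(fuel_type)
```

This keeps the NaN-for-missing behaviour of the answer while replacing five try/except blocks with one dictionary comprehension.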