Search code examples
pythonpandasdataframeopenpyxlpandas.excelwriter

KeyError: "Not all names specified in 'columns' are found"


I have a dataframe like as shown below

Date,cust,region,Abr,Number,,,dept
12/01/2010,Company_Name,Somecity,Chi,36,136,NaN,sales
12/02/2010,Company_Name,Someothercity,Nyc,156,NaN,41,mfg

tf = pd.read_clipboard(sep=',')

I am trying to do some manipulation to the excel file

writer = pd.ExcelWriter('duck_data.xlsx',engine='xlsxwriter')
for (cust,reg), v in df.groupby(['cust','region']):
    v.to_excel(writer, sheet_name=f"DATA_{cust}_{reg}",index=False, columns = modified_col_list)
writer.save()

but the problem is when the file is written, it uses unnamed:5, unnamed:6 to represent empty column names. So, I created a modified_col_list as shown below and passed it as input to to_excel function

ordiginal_col_list = ['Date','cust','region','Abr','Number',nan,nan,'dept']

modified_col_list = ['Date','cust','region','Abr','Number',' ',' ','dept']

But my objective is to have empty column names as is during excel write itself. But this resulted in below error

KeyError: "Not all names specified in 'columns' are found"

I expect my output to be like as shown below (you can see the column names are empty)

enter image description here


Solution

  • You can use:

    writer = pd.ExcelWriter('duck_data.xlsx',engine='xlsxwriter')
    for (cust,reg), v in df.groupby(['cust','region']):
        #if columns name has `Unnamed` replace by empty string
        #v.columns = ['' if 'Unnamed' in x else x for x in v.columns]
        #if columnshas missing values replace them to empty string
        v.columns = v.columns.to_series().fillna('')
        #removed columns parameter
        v.to_excel(writer, sheet_name=f"DATA_{cust}_{reg}",index=False)
    writer.save()
    

    Another idea is change columns names before groupby:

    #if columns name has `Unnamed` replace by empty string
    #df.columns = ['' if 'Unnamed' in x else x for x in df.columns]
    #if columnshas missing values replace them to empty string
    df.columns = df.columns.to_series().fillna('')
    
    writer = pd.ExcelWriter('duck_data.xlsx',engine='xlsxwriter')
    for (cust,reg), v in df.groupby(['cust','region']):
        #removed columns parameter
        v.to_excel(writer, sheet_name=f"DATA_{cust}_{reg}",index=False)
    writer.save()