Search code examples
pythonexcelpandasdataframenaming

Naming excel files using column values + aliases for those values in python


The DF looks something like this and extends for thousands of rows (i.e every combination of 'Type' & 'Name' possible)

| total |  big  |  med  | small|   Type   |   Name   |
|:-----:|:-----:|:-----:|:----:|:--------:|:--------:| 
|   5   |   4   |   0   |   1  |   Pig    |   John   |
|   6   |   0   |   3   |   3  |  Horse   |   Mike   | 
|   5   |   2   |   3   |   0  |   Cow    |   Rick   |
|   5   |   2   |   3   |   0  |   Horse  |   Rick   |
|   5   |   2   |   3   |   0  |   Cow    |   John   |
|   5   |   2   |   3   |   0  |   Pig    |   Mike   |

I have grouped the dataframe by 'Type' and 'Name'.

| total |  big  |  med  | small|   Type   |   Name   |
|:-----:|:-----:|:-----:|:----:|:--------:|:--------:| 
|   5   |   4   |   0   |   1  |   Pig    |   John   |
|   6   |   0   |   3   |   3  |   Pig    |   John   | 
|   5   |   2   |   3   |   0  |   Pig    |   John   |
|   5   |   2   |   3   |   0  |   Pig    |   John   |

Then run functions on each grouped dataframe respectively.

for idx, df in data.groupby(['Type', 'Name']):
     function_1(df)
     function_2(df)

    with pd.ExcelWriter(f"{'_'.join(idx)}.xlsx") as writer:
        table_1.to_excel(writer, sheet_name='Table 1', index=False)
        table_2.to_excel(writer, sheet_name='Table 2', index=False)

The resulting file name comes out:

"Pig_John.xlsx"

I'd like to add aliases to replace each 'Type' and 'Name' respectively as seen below.

Aliases: 

Pig = Type1
Horse = Type2
Cow = Type3
John = Name1
Mike = Name2
Rick = Name3

Example Result:

Pig_John.xlsx = Type1_Name1.xlsx
Horse_Rick.xlsx = Type2_Name3.xlsx

Solution

  • You can create a dictionary, and then call the keys and values of the dictionary, creating a new idx with each loop with idx = (dct[idx[0]], dct[idx[1]]):

    dct = {'Pig' : 'Type1',
    'Horse' : 'Type2',
    'Cow' : 'Type3',
    'John' : 'Name1',
    'Mike' : 'Name2',
    'Rick' : 'Name3'}
    
    df=d.copy()
    for idx, d in df.groupby(['Type', 'Name']):
        idx = (dct[idx[0]], dct[idx[1]])
        print(f"{'_'.join(idx)}.xlsx")
    
    Out[1]:
    Type3_Name1.xlsx
    Type3_Name3.xlsx
    Type2_Name2.xlsx
    Type2_Name3.xlsx
    Type1_Name1.xlsx
    Type1_Name2.xlsx