Search code examples
pythonpandasfor-loopfilenamesexport-to-excel

How to write to_excel dynamic filename in for loop based on groupby field in PANDAS?


I have a dataset of schools in each state. I want to group the schools by state, run some calculations to create a ranking, and then export each ranking to separate .xlsx files named "state.xlsx". For example, AK school data into ranking_alaska.xlsx, TX schools into ranking_texas.xlsx, etc.

Example data here: https://docs.google.com/spreadsheets/d/1-wdmIz_-AILcBqzvpwAFGZfXqhq8oDRrYFVVdkjZ10o/edit?usp=sharing

I've tried concatenating the state field onto the filename, but I get TypeError: can only concatenate str (not "tuple") to str. When I run the code with a static file name, it creates a file just for the state in the first row of data.


df = pd.read_excel("ranker_test.xlsx", sheet_name='DATA')

grouped = df.groupby('state')

for x in grouped:
    df.to_excel('test files/ranking_' + x + '.xlsx', index=False)

I'd like to add the rest of the code into the for loop and make the file names dynamic so that 50 state ranking excel files are produced when ran, each named after the respective state.

EDIT: I've used Reedinator's input to create the file names, but having trouble exporting each group to excel. It either returns the whole dataframe of all groups to each file or fails to execute. This is my code:

for group in grouped:
    group.to_frame().to_excel('test files/ranking_{}.xlsx'.format(group[0]), index=False)

But I get AttributeError: 'tuple' object has no attribute 'to_frame'

How can I convert the tuple into a DataFrame to send to excel?


Solution

  • The error is telling you a very straightforward explanation of why it is not working -> x is a tuple and not a string! Personally I would try printing it and verifying that it is indeed what I want:

    for x in grouped:
        print(x)
    

    This will show you something like (item0, item1, item2, ..., itemN)

    Now you can either select which item is the string you are looking for like:

    for x in grouped:
        df.to_excel('test files/ranking_{}.xlsx'.format(x[0]), index=False)
    

    Where the x[0] will tell it to get item0 (ideally a string, but you can also cast it to one as shown below). Alternatively you can create a filename like test files/ranking_(item0, item1, item2, ..., itemN).xlsx (using the string representation of the tuple) with something such as:

    for x in grouped:
        df.to_excel('test files/ranking_{}.xlsx'.format(x), index=False)
    

    It should be noted that the .format() method is implicitly calling the str() function when it is needed though (since .format() is different than concatenation and is usually a lot cleaner to read, faster to run, and just generally better than concatenation). If you realllllllly want to use concatenation you would just have to make it a string yourself:

    for x in grouped:
        df.to_excel('test files/ranking_' + str(x) + '.xlsx', index=False)
    

    But please, transition to .format()!! If you have a newer version of python you can even skip to the latest and greatest f strings

    It also seems you should review pandas groupby function

    Their example leads me to believe you need:

    for name, group in grouped:
        group.to_excel('test files/ranking_{}.xlsx'.format(name), index=False)
    

    However, I am not an pandas expert and have not tested this to be accurate.