Tags: python, excel, pandas, dataframe, concatenation

How can I simplify my pandas script using a loop?


I have the following code:

import pandas as pd

df22=pd.read_excel(r"C:\Users\H\Desktop\Files\Table22.xlsx")

#Select the sheets that are to be transformed
df3=pd.read_excel(r"C:\Users\H\Desktop\Files\Table3.xlsx")
df4=pd.read_excel(r"C:\Users\H\Desktop\Files\Table4.xlsx")
df5=pd.read_excel(r"C:\Users\H\Desktop\Files\Table5.xlsx")
df6=pd.read_excel(r"C:\Users\H\Desktop\Files\Table6.xlsx")
df7=pd.read_excel(r"C:\Users\H\Desktop\Files\Table7.xlsx")
df8=pd.read_excel(r"C:\Users\H\Desktop\Files\Table8.xlsx")
df9=pd.read_excel(r"C:\Users\H\Desktop\Files\Table9.xlsx")
df10=pd.read_excel(r"C:\Users\H\Desktop\Files\Table10.xlsx")
df11=pd.read_excel(r"C:\Users\H\Desktop\Files\Table11.xlsx")
df12=pd.read_excel(r"C:\Users\H\Desktop\Files\Table12.xlsx")
df13=pd.read_excel(r"C:\Users\H\Desktop\Files\Table13.xlsx")
df14=pd.read_excel(r"C:\Users\H\Desktop\Files\Table14.xlsx")
df15=pd.read_excel(r"C:\Users\H\Desktop\Files\Table15.xlsx")
df16=pd.read_excel(r"C:\Users\H\Desktop\Files\Table16.xlsx")
df17=pd.read_excel(r"C:\Users\H\Desktop\Files\Table17.xlsx")
df18=pd.read_excel(r"C:\Users\H\Desktop\Files\Table18.xlsx")
df19=pd.read_excel(r"C:\Users\H\Desktop\Files\Table19.xlsx")
df20=pd.read_excel(r"C:\Users\H\Desktop\Files\Table20.xlsx")
df21=pd.read_excel(r"C:\Users\H\Desktop\Files\Table21.xlsx")

df=pd.concat([df22,df3,df4,df5,df6,df7,df8,df9,df10,df11,df12,df13,df14,df15,df16,df17,df18,df19,df20,df21], join='inner')

df.to_excel(r'C:\Users\H\Desktop\Files\Allweeks.xlsx', sheet_name='sheet1', index = False) 

It reads Table22.xlsx first and then appends the tables for weeks 3 through 21 below it. Does anyone know how this script can be shortened? I tried to use a loop, but I couldn't get it to work.


Solution

  • Use a list comprehension:

    df22=pd.read_excel(r"C:\Users\H\Desktop\Files\Table22.xlsx")
    dfs = [pd.read_excel(rf"C:\Users\H\Desktop\Files\Table{x}.xlsx") for x in range(3, 22)]
    df=pd.concat([df22] + dfs, join='inner')
    
    df.to_excel(r'C:\Users\H\Desktop\Files\Allweeks.xlsx', sheet_name='sheet1', index = False)
    

    Or read all the DataFrames (3 to 22) into one list, then move the last DataFrame (Table22) to the front before concatenating:

    dfs = [pd.read_excel(rf"C:\Users\H\Desktop\Files\Table{x}.xlsx") for x in range(3, 23)]
    df=pd.concat(dfs[-1:] + dfs[:-1], join='inner')
    #another idea is to reverse the order - 22, 21, 20 ... 3
    #(note: this also reverses weeks 3 to 21, unlike the original script)
    #df=pd.concat(dfs[::-1], join='inner')
    
    df.to_excel(r'C:\Users\H\Desktop\Files\Allweeks.xlsx', sheet_name='sheet1', index = False)
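    Since the question asked about a loop, the same idea can also be written as an explicit for loop. This is only a minimal sketch: the function name combine_tables and its parameters (first, rest, reader) are hypothetical and not part of pandas, and it assumes the files are named Table3.xlsx to Table22.xlsx as in the question.

```python
from pathlib import Path

import pandas as pd


def combine_tables(folder, first=22, rest=range(3, 22), reader=pd.read_excel):
    """Read Table<first>.xlsx, then Table<n>.xlsx for each n in rest, and stack them.

    All names here are illustrative; `reader` is only a parameter so that
    another loader can be swapped in (it defaults to pd.read_excel).
    """
    folder = Path(folder)
    frames = []
    # Explicit loop: week 22 first, then weeks 3..21, the same order as the original script.
    for week in [first, *rest]:
        frames.append(reader(folder / f"Table{week}.xlsx"))
    # join='inner' keeps only the columns that every table has in common.
    return pd.concat(frames, join="inner")
```

    It would be called with the folder from the question, e.g. df = combine_tables(r"C:\Users\H\Desktop\Files"), followed by the same df.to_excel(...) line as above. The list comprehension is shorter; the loop is just easier to extend if each file needs extra processing before it is appended.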