Search code examples
pythonpandaspandas-groupbyopenpyxlglob

Save the existing files without deleting existing data by splitting the DataFrame by column


I have three excel files- A1/A2/A3 with some existing data in 'Sheet1'.

import glob
from openpyxl import load_workbook
from openpyxl.utils.dataframe import dataframe_to_rows
import pandas as pd

df=pd.DataFrame()
for f in glob.glob(r'...\Excel\A*.xlsx'):
    info=pd.read_excel(f)
    df=df.append(info)   

With the above code I have got the below DataFrame:

Sample     Description
----------------------
A1         Auto
A2         Manual
A3         Fully-Automated

I want data of A1 to get pasted in A1 file, A2 data to get pasted in A2 file and A3 data to get pasted in A3 file in 'Sheet2' without deleting the existing 'Sheet1'.

Sample     Description     
----------------------       ````` this data should go in A1 File
A1         Auto

Sample     Description     
----------------------       ````` this data should go in A2 File
A2         Manual

Sample     Description     
----------------------       ````` this data should go in A3 File
A3         Fully-Automated

I tried writing below code but only last row is getting pasted in all three excel files.


 - File A1

*Sheet2*
Sample     Description     
----------------------       
    A3         Fully-Automated

 - File A2

*Sheet2*
Sample     Description     
----------------------       
A3         Fully-Automated

 - File A3

*Sheet2*
Sample     Description     
----------------------       
A3         Fully-Automated



for filename in glob.glob(r'...\Excel\A*.xlsx'):
    for name,data in group_df:
        book=load_workbook(filename)
        writer=pd.ExcelWriter(filename,engine='openpyxl')
        writer.book=book
        data.to_excel(writer,sheet_name='Sheet2')
    writer.save()
    writer.close()

I need to groupby the DataFrame by column 'Sample' and paste the splitted data back to respective files with New sheet as 'Sheet2' without deleting the existing 'Sheet1'.


Solution

  • It's hard to be sure without a minimum reproducible example but this should lead you to the solution;

    import glob
    from openpyxl import load_workbook
    from openpyxl.utils.dataframe import dataframe_to_rows
    import pandas as pd
    
    # this makes things a bit easier 
    folder_path = r'...\Excel\\'
    
    # that's ok 
    df=pd.DataFrame()
    for f in glob.glob(folder_path + 'A*.xlsx'):
        info=pd.read_excel(f)
        df=df.append(info)   
    
    # iterate over the groupby
    # name is the label in the column 'Sample'
    # group is a dataframe 
    for name, group in df.groupby('Sample'):
    
        # file path for the name of the group (A1, A2, ...)
        filename = folder_path + name + '.xls'
    
        # do your thing
        book=load_workbook(filename)
        writer=pd.ExcelWriter(filename,engine='openpyxl')
        writer.book=book
    
        # save the group to excel 
        group.to_excel(writer,sheet_name='Sheet2')
    
        writer.save()
        writer.close()