I have opened my dictionary and I would like to write a loop so to obtain as an output only those rows for which the favorite color
column equals to NaN
.
My code so far:
# Importing modules
import openpyxl as op
import pandas as pd
import numpy as np
import xlsxwriter
from openpyxl import Workbook, load_workbook
# Defining the file path
my_file_path = r'C:\Users\machukovich\Desktop\stack.xlsx'
# Loading the file into a dictionary of Dataframes
my_dict = pd.read_excel(my_file_path, sheet_name=None, skiprows=2)
My_dict output:
my_dict = {'Sheet_1': Name Surname Concatenation ID_ Grade_ favourite color
1 Delilah Gonzalez Delilah Gonzalez NaN NaN NaN
2 Christina Rodwell Christina Rodwell 100.0 3.0 black
3 Ziggy Stardust Ziggy Stardust 40.0 7.0 red ,
'Sheet_2': Name Surname Concatenation ID_ Grade_ favourite color \
0 Lucy Diamonds Lucy Diamonds 22.0 9.0 brown
1 Grace Kelly Grace Kelly 50.0 7.0 white
2 Uma Thurman Uma Thurman 105.0 7.0 purple
3 Lola King Lola King NaN NaN NaN ,
'Sheet_3': Name Surname Concatenation ID_ Grade_ favourite color \
0 Eleanor Rigby Eleanor Rigby 104.0 6.0 blue
1 Barbara Ann Barbara Ann 168.0 8.0 pink
2 Polly Cracker Polly Cracker 450.0 7.0 black
3 Little Joe Little Joe NaN NaN NaN }
My desired output:
my_dict = {'Sheet_1': Name Surname Concatenation ID_ Grade_ favourite color
1 Delilah Gonzalez Delilah Gonzalez NaN NaN NaN
'Sheet_2': Name Surname Concatenation ID_ Grade_ favourite color \
3 Lola King Lola King NaN NaN NaN
'Sheet_3': Name Surname Concatenation ID_ Grade_ favourite color \
3 Little Joe Little Joe NaN NaN NaN
And, finally I would like to write the desired output
to a new excel file (in separate sheets).
Please, enlighten me. I am new to python.
I would do it this way :
with pd.ExcelWriter("output.xlsx", engine="xlsxwriter") as writer:
for sn, df in my_dict.items():
(df.loc[df["favourite color"].isnull()] # we use boolean indexing
.to_excel(writer, sheet_name=sn, index=False)) # with startrow, starcol ?
#this is optional
for ws in writer.sheets:
writer.sheets[ws].autofit() # xlsxwriter 3.0.6+
Output (only Sheet_1
):
Update :
If you want to update my_dict
first, you can use this :
for sn, df in my_dict.items():
my_dict[sn] = df.loc[df["favourite color"].isnull()]
Output :
print(my_dict)
{'Sheet_1': Name Surname Concatenation ID_ Grade_ favourite color
0 Delilah Gonzalez Delilah Gonzalez NaN NaN NaN,
'Sheet_2': Name Surname Concatenation ID_ Grade_ favourite color
0 Lola King Lola King NaN NaN NaN,
'Sheet_3': Name Surname Concatenation ID_ Grade_ favourite color
0 Little Joe Little Joe NaN NaN NaN}
Then (if needed) you can loop through each filtered df
to store it in a spreadsheet :
with pd.ExcelWriter("output.xlsx", engine="xlsxwriter") as writer:
for sn, df in my_dict.items():
df.to_excel(writer, sheet_name=sn, index=False)