I have this big data base that I need to separate into diferent segments (diferent "valid until dates") and then get each o the segments into a excel workbook. Each of the segments also should have its own excelsheet.
Df Exemple:
A valid until C D
Date
01/08/2023 Tesouro Educa+ 15/12/2030 5,06 5,18
02/08/2023 Tesouro Educa+ 15/12/2030 5,05 5,17
03/08/2023 Tesouro Educa+ 15/12/2030 5,02 5,14
04/08/2023 Tesouro IPCA+ 15/12/2030 5,02 5,14
07/08/2023 Tesouro IPCA+ 15/12/2030 4,98 5,10
...
01/08/2023 Tesouro Selic 15/12/2031 5,05 5,17
02/08/2023 Tesouro Selic 15/12/2031 5,03 5,15
03/08/2023 Tesouro Educa+ 15/12/2031 5,00 5,12
04/08/2023 Tesouro Educa+ 15/12/2031 5,02 5,14
07/08/2023 Tesouro Educa+ 15/12/2031 4,99 5,11
What i tried:
df = pd.read_csv(r'PATH', sep=';')
df.sort_values(by=['valid until','Date'], key=lambda x: np.argsort(natsorted(dados['Date'])))
valid_until_list=dados['valid until'].unique()
for i in valid_until_list:
df2=df[(df['Date']==f'{i}')]
df2=df2.set_index(['Data Base'])
with pd.ExcelWriter(r'NEW_PATH',engine='openpyxl', if_sheet_exists='replace', mode='a') as writer:
i=i.replace('/','_')
# Here i tried changing the name of the worksheets:
sheet_title=''
if df2['A']=='Tesouro IPCA+ com Juros Semestrais':
sheet_title='IPCA+ Juros'
elif nova_df2['A']=='Tesouro IPCA+':
sheet_title='IPCA+'
else:
sheet_title=''
df2.to_excel(writer, sheet_name=f'{sheet_title}_{i}')`
The code did work until i tried to change de worksheet title, now is showing this: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
I'm here to ask 2 things. My df is kinda big (145712 rows), so de code is taking way to long to run. Is there a faster way to do this, maybe without using for and if loops? The second would be: how do i fix this title Error? It is not necessary, but would make my life soooo much easier :))
You can use groupby
to form the DFs and write them directly to sheets; the code below defines groups according to 'valid until' and then sorts each group (ie sub-DF) according to the Date
column.
groupings = df.groupby('valid until')
count = 1
with pd.ExcelWriter("test.xlsx", mode = 'w') as writer:
for g in groupings:
sheetname = 'sheet'+str(count)
g[1].sort_values(by = 'Date').to_excel(writer, sheet_name=sheetname, index=False)
count += 1
This gives sheet1, sheet2 etc but you can change to sheet name formation as required. The 'valid until' of each DF is g[0]
in the above code if that is useful.
If it is required to form the groups (and so sub-DFs) using both 'valid until' and 'Date' then the grouping statement would be changed to groupings = df.groupby(['valid until', 'Date'])
and the sort expression then omitted from the statement in the for loop.
The same general principle could be use to form sub-DFs using any combination of the columns.