Hoping someone knowledgeable with pandas might be able to steer me in the right direction. I have a dictionary that contains a number of time series data frames for its values. The data frames look like this for each month, respectively:
I am trying to get an output like this:
I attempted a concatenation with the following code:
header = next(iter(dict.values())).iloc[0]
for v in dict.values():
out=pd.concat([v[1:]],axis=0,ignore_index=True)
out.columns=header
print(out)
Unfortunately, my end result is not the full concatenation. Rather, I just get something like this:
I'm missing my header, and I seem to be only getting the last values in the dictionary concatenated to the result. Is pd.concat a good method to use? Or should I instead use append or merge?
Any guidance would be very much appreciated here, thanks.
I think you might be overthinking this. pd.concat
is the correct tool but one of the reasons for that is you don't need to iterate through your dict.
#set up fake data
data = {'September': pd.DataFrame({'ReportingDate': ['9/30/2023', '9/30/2023', '9/30/2023'],
'Category': ['A', 'B', 'C'],
'Volume': [1, 2, 3]}),
'October': pd.DataFrame({'ReportingDate': ['10/31/2023', '10/31/2023', '10/31/2023'],
'Category': ['A', 'B', 'C'],
'Volume': [1, 2, 3]})
}
df = pd.concat(data.values(), ignore_index = True)
print(df)
and that gives
ReportingDate Category Volume
0 9/30/2023 A 1
1 9/30/2023 B 2
2 9/30/2023 C 3
3 10/31/2023 A 1
4 10/31/2023 B 2
5 10/31/2023 C 3
If you give pd.concat
a list of dataframes and axis = 0
(which I omitted just because it's the default), it will connect them end-to-end, aligning the columns according to their label. data.values()
is not technically a list (it's a dict_values
object) but it's close enough to behave like one in this context.
Edit: to give some further context to why you're seeing the results are with your first attempt. In your iteration you are rewriting the variable out
each time, so your end result is only what you overwrote it with in the last iteration. In your first line where you define header
, you are extracting the first row of your first dataframe, but pandas knows that your column labels aren't rows. So the reason your end result looks like it doesn't have column labels is that your column labels are the values from the first row of your first dataframe.