Search code examples
pythonpandasdataframeconcatenation

Merge the all elements of multiple columns into one column in series while keeping NaNs


Context: I have 5 years of weight data. The first column is the date (month and day), the succeeding columns are the years with corresponding weight for each day of the month. I want to have a full plot of all of my data among other things and so I want to combine all into just two columns. First column is the dates from 2018 to 2022, then the second column is the corresponding weight to each date. I have managed the date part, but can't combine the weight data. In essence, I want to turn ...

   0    1
0  1  4.0
1  2  NaN
2  3  6.0

Into ...

   0
0  1
1  2
2  3
3  4
4  NaN
5  6.0

pd.concat only puts the year columns next to each other. .join, .merge, melt, stack. agg don't work either. How do I do this?

sample code:

import pandas as pd
import numpy as np

df1 = pd.DataFrame({'2018': [1, 2, 3]})
df2 = pd.DataFrame({'2019': [4, np.NaN, 6]})

merged_df = pd.concat([df1,df2],axis=1, ignore_index=True, levels = 0)
print(merged_df)

P.S. I particularly don't want to input any index names (like id_vars="2018") because I want this process to be automated as the years go by with more data.

concat, merge, melt, join, stack, agg. i want to combine all column data into just one series


Solution

  • I think np.ravel(merged_df,order='F') will do the job for you.

    If you want it in the form of a dataframe then pd.DataFrame(np.ravel(merged_df,order='F')).