Search code examples
pythonpandasdataframepandas-melt

Add label-column to DataFrame


I have two DataFrames for example

df1:

0    1    2    3
a    1    2    3    4
b   10   20   30   40
c  100  200  300  400
------------------

df2:

   0
0  x
1  y
2  z

Now I want to combine both like:

df_new:

   value label
0      1     x
1      2     x
2      3     x
3      4     x
0     10     y
1     20     y
2     30     y
3     40     y
0    100     z
1    200     z
2    300     z
3    400     z

I wrote a really awkward code like:

df_new=pd.DataFrame()

for i,j in zip(df1.index, df2.index):
    x=df1.loc[i]
    y=df2.loc[j]
    

    label=np.full(x.shape[0],y)
    
    df=pd.DataFrame({'value':x,'label':label})
    df_new=pd.concat([df_new,df],axis=0)
    
print(df_new)

But I can imagine that there is a pandas-function like pd.melt or something which can do that better for bigger scale.


Solution

  • If there is same length of both DataFrames is possible create index in df1 by column 0 in df2 and reshape by DataFrame.stack, last encessary some data processing:

    df = (df1.set_index(df2[0])
             .stack()
             .reset_index(level=1, drop=True)
             .rename_axis('lab')
             .reset_index(name='val')[['val','lab']])
    
    
    print (df)
        val lab
    0     1   x
    1     2   x
    2     3   x
    3     4   x
    4    10   y
    5    20   y
    6    30   y
    7    40   y
    8   100   z
    9   200   z
    10  300   z
    11  400   z
    

    Solution with DataFrame.melt and append second df to first by DataFrame.join:

    df = (df1.reset_index(drop=True)
             .join(df2.add_prefix('label'))
             .melt(['label0', 'label1'], ignore_index=False)
             .sort_index(ignore_index=True)
             .drop('variable', axis=1)[['value','label0','label1']]
             )
    print (df)
        value label0 label1
    0       1      x     xx
    1       2      x     xx
    2       3      x     xx
    3       4      x     xx
    4      10      y     yy
    5      20      y     yy
    6      30      y     yy
    7      40      y     yy
    8     100      z     zz
    9     200      z     zz
    10    300      z     zz
    11    400      z     zz