Search code examples
pythonpandasdataframeduplicatesconcatenation

Any way to create a new data frame from two data frames where each row of one df has to be repeated?


I have two data frames like below. I want to create a new data frame where df1 column has to be duplicated based on the length of row of df2 and then each value of a row has to be appended as new rows like below how the excepted output is present. I tried to concatenate two data frames and then use to achieve the excepted output but that's returning empty data frame. Is there any best way to get the excepted output?

data frames:-

import pandas as pd

data = [10,20,30,40,50,60]
df1 = pd.DataFrame(data, columns=['Numbers'])
df1
  Numbers
0   10
1   20
2   30
3   40
4   50
5   60
data = {'name':['pro','lop1','lo','p','ll','qi'],'name2':['pro2','lop2','lol','po','llop','qiij'],
        'names':['pro1','lop3','lol1','ppp','lop','qis']}
df2 = pd.DataFrame(data)
df2
    name    name2   names
0   pro     pro2    pro1
1   lop1    lop2    lop3
2   lo      lol     lol1
3   p       po      ppp
4   ll      llop    lop
5   qi      qiij    qis

Code:-

new_df=pd.concat([df1,df2],axis=1)
final = (pd.wide_to_long(new_df,stubnames=['Names'], i=['Numbers'], 
                j='drop').reset_index().drop('drop', 1))
print(final)

Numbers names   name    name2   Names

(Gives empty data frame)

Excepted output:-

Numbers  Name
    10      pro
    10      pro2
    10      pro1
    20      lop1
    20      lop2
    20      lop3
    30      lo
    30      lol
    30      lol1
    40      p
    40      po
    40      ppp
    50      ll
    50      llop
    50      lop
    60      qi
    60      qiij
    60      qis

Solution

  • Use concat with DataFrame.set_index and DataFrame.stack, last remove secone level of Multiindex and create 2 columns DataFrame:

    new_df = (pd.concat([df1,df2],axis=1)
                .set_index('Numbers')
                .stack()
                .droplevel(1)
                .reset_index(name='Name'))
    

    Or use DataFrame.melt, then is necessary sorting by DataFrame.sort_values:

    new_df = (pd.concat([df1,df2],axis=1)
                .melt('Numbers', value_name='Name')
                .drop('variable', axis=1)
                .sort_values('Numbers', ignore_index=True, kind='mergesort'))
    print (new_df)
        Numbers  Name
    0        10   pro
    1        10  pro2
    2        10  pro1
    3        20  lop1
    4        20  lop2
    5        20  lop3
    6        30    lo
    7        30   lol
    8        30  lol1
    9        40     p
    10       40    po
    11       40   ppp
    12       50    ll
    13       50  llop
    14       50   lop
    15       60    qi
    16       60  qiij
    17       60   qis