Search code examples
pythonpandascross-join

Cross join in python pandas


I have a dataframe with 4 columns: Name, Range, Amount, Limit. Name column has 3 names and other columns have 6 values. Now I want to cross join each name to other 6 columns in such a way that the final dataframe must have 18 rows of data, where each name has 6 rows of data.

This is the dataframe:

Name    Range   Amount  Limit
John    1.1-3   200000  100000
Adam    2.2-3.7 324000  129000
Prince  4.1-5.5 450000  225000
        6.7-8   574667  276333
        8.1-9   699667  338833
        9.1-10  824667  401333

Final result must be

Name    Range   Amount  Limit
John    1.1-3   200000  100000
John    2.2-3.7 324000  129000
John    4.1-5.5 450000  225000
John    6.7-8   574667  276333
John    8.1-9   699667  338833
John    9.1-10  824667  401333
Adam    1.1-3   200000  100000
Adam    2.2-3.7 324000  129000
Adam    4.1-5.5 450000  225000
Adam    6.7-8   574667  276333
Adam    8.1-9   699667  338833
Adam    9.1-10  824667  401333
Princy  1.1-3   200000  100000
Princy  2.2-3.7 324000  129000
Princy  4.1-5.5 450000  225000
Princy  6.7-8   574667  276333
Princy  8.1-9   699667  338833
Princy  9.1-10  824667  401333

This is what I tried

df1 = data['Name']
 
df2 = data[['Range','Amount','Limit']]

df1['key'] = 1
df2['key'] = 1

result = pd.merge(df1, df2, on ='key').drop("key", 1)

Solution

  • You were almost there, you first need to get rid of the empty values in "Name". You can also use how='cross' in the merge:

    out = (data.loc[data['Name'].ne(''), ['Name']]
               .merge(data.drop(columns='Name'), how='cross')
          )
    

    I assumed that the empty cells are empty strings (''), if you have NaNs, replace .ne('') by .notna().

    Output:

          Name    Range  Amount   Limit
    0     John    1.1-3  200000  100000
    1     John  2.2-3.7  324000  129000
    2     John  4.1-5.5  450000  225000
    3     John    6.7-8  574667  276333
    4     John    8.1-9  699667  338833
    5     John   9.1-10  824667  401333
    6     Adam    1.1-3  200000  100000
    7     Adam  2.2-3.7  324000  129000
    8     Adam  4.1-5.5  450000  225000
    9     Adam    6.7-8  574667  276333
    10    Adam    8.1-9  699667  338833
    11    Adam   9.1-10  824667  401333
    12  Prince    1.1-3  200000  100000
    13  Prince  2.2-3.7  324000  129000
    14  Prince  4.1-5.5  450000  225000
    15  Prince    6.7-8  574667  276333
    16  Prince    8.1-9  699667  338833
    17  Prince   9.1-10  824667  401333
    

    Reproducible input:

    data = pd.DataFrame({'Name': ['John', 'Adam', 'Prince', '', '', ''],
                         'Range': ['1.1-3', '2.2-3.7', '4.1-5.5', '6.7-8', '8.1-9', '9.1-10'],
                         'Amount': [200000, 324000, 450000, 574667, 699667, 824667],
                         'Limit': [100000, 129000, 225000, 276333, 338833, 401333]})