I have a dataframe with 4 columns: Name, Range, Amount, Limit. Name column has 3 names and other columns have 6 values. Now I want to cross join each name to other 6 columns in such a way that the final dataframe must have 18 rows of data, where each name has 6 rows of data.
This is the dataframe:
Name Range Amount Limit
John 1.1-3 200000 100000
Adam 2.2-3.7 324000 129000
Prince 4.1-5.5 450000 225000
6.7-8 574667 276333
8.1-9 699667 338833
9.1-10 824667 401333
Final result must be
Name Range Amount Limit
John 1.1-3 200000 100000
John 2.2-3.7 324000 129000
John 4.1-5.5 450000 225000
John 6.7-8 574667 276333
John 8.1-9 699667 338833
John 9.1-10 824667 401333
Adam 1.1-3 200000 100000
Adam 2.2-3.7 324000 129000
Adam 4.1-5.5 450000 225000
Adam 6.7-8 574667 276333
Adam 8.1-9 699667 338833
Adam 9.1-10 824667 401333
Princy 1.1-3 200000 100000
Princy 2.2-3.7 324000 129000
Princy 4.1-5.5 450000 225000
Princy 6.7-8 574667 276333
Princy 8.1-9 699667 338833
Princy 9.1-10 824667 401333
This is what I tried
df1 = data['Name']
df2 = data[['Range','Amount','Limit']]
df1['key'] = 1
df2['key'] = 1
result = pd.merge(df1, df2, on ='key').drop("key", 1)
You were almost there, you first need to get rid of the empty values in "Name". You can also use how='cross'
in the merge:
out = (data.loc[data['Name'].ne(''), ['Name']]
.merge(data.drop(columns='Name'), how='cross')
)
I assumed that the empty cells are empty strings (''
), if you have NaNs, replace .ne('')
by .notna()
.
Output:
Name Range Amount Limit
0 John 1.1-3 200000 100000
1 John 2.2-3.7 324000 129000
2 John 4.1-5.5 450000 225000
3 John 6.7-8 574667 276333
4 John 8.1-9 699667 338833
5 John 9.1-10 824667 401333
6 Adam 1.1-3 200000 100000
7 Adam 2.2-3.7 324000 129000
8 Adam 4.1-5.5 450000 225000
9 Adam 6.7-8 574667 276333
10 Adam 8.1-9 699667 338833
11 Adam 9.1-10 824667 401333
12 Prince 1.1-3 200000 100000
13 Prince 2.2-3.7 324000 129000
14 Prince 4.1-5.5 450000 225000
15 Prince 6.7-8 574667 276333
16 Prince 8.1-9 699667 338833
17 Prince 9.1-10 824667 401333
Reproducible input:
data = pd.DataFrame({'Name': ['John', 'Adam', 'Prince', '', '', ''],
'Range': ['1.1-3', '2.2-3.7', '4.1-5.5', '6.7-8', '8.1-9', '9.1-10'],
'Amount': [200000, 324000, 450000, 574667, 699667, 824667],
'Limit': [100000, 129000, 225000, 276333, 338833, 401333]})