I have a list and a data frame.
import pandas as pd
work_station = ['A','B','C']
name = ['Mike','Tom','Scott','Tracy']
salary = ['60000','50000','100000','90000']
df = pd.DataFrame({'name':name,'salary':salary})
I want to cross join work_station and df together so the output looks like following:
station Name salary
A Mike 60000
A Tom 50000
A Scott 100000
A Tracy 90000
B Mike 60000
B Tom 50000
B Scott 100000
B Tracy 90000
C Mike 60000
C Tom 50000
C Scott 100000
C Tracy 90000
I tried to use * function
df1 = work_station * salary
but it's not working becase
TypeError: can't multiply sequence by non-int of type 'list'
Any suggestion? Thanks!
Easy peasy, use concat
with a keys
argument:
(pd.concat([df] * len(work_station), keys=work_station)
.reset_index(level=1, drop=True)
.rename_axis('station')
.reset_index()
)
station name salary
0 A Mike 60000
1 A Tom 50000
2 A Scott 100000
3 A Tracy 90000
4 B Mike 60000
5 B Tom 50000
6 B Scott 100000
7 B Tracy 90000
8 C Mike 60000
9 C Tom 50000
10 C Scott 100000
11 C Tracy 90000
You can also go the merge
route with a cartesian product:
(pd.DataFrame(work_station, columns=['station'])
.assign(foo=1)
.merge(df.assign(foo=1))
.drop('foo', 1)
)
station name salary
0 A Mike 60000
1 A Tom 50000
2 A Scott 100000
3 A Tracy 90000
4 B Mike 60000
5 B Tom 50000
6 B Scott 100000
7 B Tracy 90000
8 C Mike 60000
9 C Tom 50000
10 C Scott 100000
11 C Tracy 90000