I have two dataframes
df1
Area-ID-Series | Area-ID | MODE |
---|---|---|
BSD-1-1800 | BSD-1 | VM |
BSD-2-1800 | BSD-2 | VM |
BSD-3-1800 | BSD-3 | VM |
BSE-1-2100 | BSE-1 | XM |
BSE-1-900 | BSE-1 | VM |
BSE-2-2100 | BSE-2 | XM |
BSE-2-900 | BSE-2 | VM |
BSE-3-2100 | BSE-3 | XM |
BSE-3-900 | BSE-3 | VM |
SMR-1-1800 | SMR-1 | VM |
df2
Area-ID | 900 | 1800 | 2100 |
---|---|---|---|
BSD-1 | |||
BSD-2 | |||
BSD-3 | |||
BSE-1 | |||
BSE-2 | |||
BSE-3 | |||
SMR-1 |
df2 has a key based on df1 but without the "Series", because the "Series" transformed into columns.
I want to get MODE value for each series based on Area-ID, so my desired output is like this
df2
Area-ID | 900 | 1800 | 2100 |
---|---|---|---|
BSD-1 | - | VM | - |
BSD-2 | - | VM | - |
BSD-3 | - | VM | - |
BSE-1 | VM | - | XM |
BSE-2 | VM | - | XM |
BSE-3 | VM | - | XM |
SMR-1 | - | VM | - |
I have tried this code but still no idea how it should be done
df_result = pd.merge(df2,
df1[['Area-ID', 'MODE']],
on= 'Area-ID',
how='left')
I have tried to do this in excel with df1 as "city" the formula look like this
900 series column
=IFNA(VLOOKUP(A2&"-"&$B$1,city!A:C,3,0),"-")
1800 series column
=IFNA(VLOOKUP(A2&"-"&$C$1,city!A:C,3,0),"-")
2100 series column
=IFNA(VLOOKUP(A2&"-"&$D$1,city!A:C,3,0),"-")
it worked, but it takes a lot of time because my data is huge so I try to do it with python instead
I don't know merge very well. Perhaps there is a way to do based on it. Did it differently.
A 'number' column is created. To do this, the rows of the Area-ID-Series column of dataframe df1 are split() into an array by delimiter using split and expand=True is applied to get the columns. The second column is retrieved.
Further, the dataframe is grouped by df1.index, that is, it is called on each row, where the expression df2['Area-ID'] == x['Area-ID'].values[0]
is a mask for indexing rows, and x[' number'].values[0]
acts as a column name for df2
filtering. Explicit loc indexing is used, where the row indexes are on the left, the column name is on the right.
.values[0]
is used to extract the value from the list.
import pandas as pd
df1 = pd.read_csv('df1.csv', header=0)
df2 = pd.read_csv('df2.csv', header=0)
df1['number'] = df1['Area-ID-Series'].str.split('-', expand=True)[2].astype(str)
def my_func(x):
df2.loc[df2['Area-ID'] == x['Area-ID'].values[0], x['number'].values[0]] = x['MODE'].values[0]
df1.groupby(df1.index).apply(my_func)
print(df2)
Output
Area-ID 900 1800 2100
0 BSD-1 NaN VM NaN
1 BSD-2 NaN VM NaN
2 BSD-3 NaN VM NaN
3 BSE-1 VM NaN XM
4 BSE-2 VM NaN XM
5 BSE-3 VM NaN XM
6 SMR-1 NaN VM NaN