I have two data frames F1 and F2 containing both the column id1, id2.
F1 contains two columns F1[id1,id2]
.
F2 contains three column [id1,id2,Description]
I wante to test if F2['id1']
exists in F1['id1']
OR F2['id2']exists in
F1['id2'] then i must addd a colmun in F1 with Description of this id1 or id2 in F2` .
The contens of F1 and F2 are are HERE. The Output That im attending on F1 is also HERE
I created F1 and F2 like This
F1 = {'id1': ['x22', 'x13','NaN','x421'],'id2':['NaN','223','788','NaN']}
F1 = pd.DataFrame(data=F1)
F2 = {'id1': ['x22', 'NaN','NaN','x413','x421'],'id2':['NaN','223','788','NaN','233'],'Description':['California','LA','NY','Havnover','Munich']}
F2 = pd.DataFrame(data=F2)
Actually i tried several solutions . But nothing helps me to do it . Help please
Use:
#if necessary replace string NaN to missing values
F1 = F1.replace('NaN', np.nan)
F2 = F2.replace('NaN', np.nan)
For each column remove duplicates by DataFrame.drop_duplicates
, missing valeus by DataFrame.dropna
and create index by id
column by DataFrame.set_index
:
s1 = F2.drop_duplicates('id1').dropna(subset=['id1']).set_index('id1')['Description']
s2 = F2.drop_duplicates('id2').dropna(subset=['id2']).set_index('id2')['Description']
Then use Series.map
by columns from F1
with Series.combine_first
:
F1['Description'] = F1['id1'].map(s1).combine_first(F1['id2'].map(s2))
print (F1)
id1 id2 Description
0 x22 NaN California
1 x13 223 LA
2 NaN 788 NY
3 x421 NaN Munich