Search code examples
pythonpandasfillna

python - fillna on conditional of multiple columns from 2 dataframes


I have 2 dataframes and I want to fill the na in df1 below, with the data that's in df2 but were the team column matches either the h or v column in df2:

So in other words fillna for df1['temp', 'wspd', 'cond'] with those vales in df2 where df1['team'] == df2['h'] or df2['v']

df1:

team        day    temp      ou      wspd     cond
CIN         SUN     nan     42.0     nan      nan
 KC         MN      nan     43.5     nan      nan
IND         SUN     nan     42.0     nan      nan
CIN         SUN     nan     42.0     nan      nan
SEA         SUN     nan     45.5     nan      nan
PIT         SN      nan     45.0     nan      nan
MIA         THU     nan     37.5     nan      nan
DEN         MN      nan     43.5     nan      nan
NYJ         SUN     nan     46.5     nan      nan
CAR         SUN     nan     44.5     nan      nan
LAC         SUN     nan     48.0     nan      nan
MIN         SUN     nan     45.0     nan      nan
NO          SUN     nan     48.5     nan      nan
NYJ         SUN     nan     46.5     nan      nan
CLE         SUN     nan     37.5     nan      nan
OAK         SUN     nan     45.5     nan      nan
CAR         SUN     nan     44.5     nan      nan
OAK         SUN     nan     45.5     nan      nan
...         ...     ...      ...    ...       ...

df2:

v     h     temp    humd     wspd    cond
MIA  BAL    60.0    41.0    12.0    Mostly Sunny
MIN  CLE    47.0    58.0    14.0    Cloudy
CHI  NO     70.0    0.0      0.0    Dome
ATL  NYJ    66.0    83.0    17.0    Rain
IND  CIN    44.0    58.0    12.0    Cloudy
SF   PHI    65.0    81.0    11.0    Rain
OAK  BUF    49.0    65.0    9.0     Cloudy
LAC  NE     64.0    75.0    13.0    Showers
CAR  TB     67.0    59.0    17.0    Partly Cloudy
HOU  SEA    58.0    73.0    4.0     Cloudy
DAL  WAS    58.0    66.0    6.0     Rain
PIT  DET    70.0    0.0      0.0    Dome
DEN  KC     55.0    40.0    12.0    Partly Cloudy

Solution

  • I would approach this by first reshaping df2 so that columns v and h are in one column, and then setting that column as the index:

    df2 = df2.melt(value_vars=['v','h'], \
      id_vars=['temp','humd','wspd','cond'], value_name='team').set_index('team')
    #       temp  humd  wspd           cond variable
    # team                                          
    # MIA   60.0  41.0  12.0   Mostly Sunny        v
    # MIN   47.0  58.0  14.0         Cloudy        v
    # CHI   70.0   0.0   0.0           Dome        v
    # ATL   66.0  83.0  17.0           Rain        v
    # IND   44.0  58.0  12.0         Cloudy        v
    

    From there, you can set team as the index in df and use df.update:

    df1.set_index('team', inplace=True)
    df1.update(df2[['temp', 'wspd', 'cond']])
    #       day  temp    ou  wspd           cond
    # team                                      
    # CIN   SUN  44.0  42.0  12.0         Cloudy
    # KC     MN  55.0  43.5  12.0  Partly Cloudy
    # IND   SUN  44.0  42.0  12.0         Cloudy
    # CIN   SUN  44.0  42.0  12.0         Cloudy
    # SEA   SUN  58.0  45.5   4.0         Cloudy