Search code examples
pandasconcatenationpandas-loc

Unexpected results using pandas .loc - trying to concatenate 2 columns based on a condition


I have a df where I am trying to merge 2 columns based on a condition.

Create df

df = em_df[['Redcap_Case_num', 'EV_EM',  'COMP_EM', 'EV_RND', 'COMP_EM_RND'] ].head(3)
df.to_clipboard(excel = False, sep = ', ')

#Add EMFREAS - RND columns to df
cols_to_fill=[x for x in ln.columns if x.startswith("EMFREAS")]

for column in cols_to_fill:
    df[column] = ln[column].copy()
df.to_clipboard(excel = False, sep = ', ')

Output - Please understand I have tried to follow the instructions to format the table, but the instructions did not work for me. I'm not sure what I am doing wrong so this is how I was able to do it.

ID        EV_EM     COMP_EM          EV_RND      COMP_EM_RND     EMFREAS1   EMFREAS2 
YA007      1        Not Done                   Insufficient                                                                                
YA006      1                                                                                                                                              
YA005      0        Outside grid  EM Not done              

I need to merge the EV_RND column and the COMP_EM_RND columns to populate all the columns that start with EMFREAS (You are only seeing a subset of the columns)

Here is the code I am trying to use to do this:

#apply ND filter to df and merge to ln df

EV_ND = df["EV_EM"]==0 
EM_ND = df['COMP_EM'] == 'Not Done'

df.loc[EV_ND | EM_ND, cols_to_fill]=df["EV_RND"] + '|' + df["COMP_EM_RND"]                                                                                                                                                                                                    

The expected outcome should look like this:

ID       EV_EM COMP_EM          EV_RND      COMP_EM_RND     EMFREAS1            EMFREAS2 
YA007    1        Not Done      EV ND        Insufficient   Insufficient|EV ND Insufficient|EV ND                                                                          
YA006    1                                                                                                                                              
YA005    1        Outside grid  EM Not done                EM Not done           EM Not done      

Solution

  • If the empty values in your df are actually empty strings, you can create a separator series equal to | if EV_RND and COMP_EM_RND are not empty, empty string otherwise. Then concat EV_RND, the separator series and COMP_EM_RND:

    sep_series = df.apply(lambda x: '|'
                            if (x['EV_RND'] and x['COMP_EM_RND'])
                            else '', axis=1)
    
    fill_series = df['EV_RND'].str.cat(sep_series).str.cat(df['COMP_EM_RND'])
    
    for col in df.columns:
        if col.startswith('EMFREAS'):
            df[col] = df[col].replace('', np.nan).fillna(fill_series)
    

    Output:

          ID  EV_EM       COMP_EM       EV_RND   COMP_EM_RND            EMFREAS1            EMFREAS2
    0  YA007      1      Not Done        EV ND  Insufficient  EV ND|Insufficient  EV ND|Insufficient
    1  YA006      1                                                                                 
    2  YA005      0  Outside grid  EM Not done                       EM Not done         EM Not done