Search code examples
pythonpandasdataframenotnull

How do I join the values of two columns in a Pandas dataframe only if none are NaN


I'm trying to join two string in a new column within a DataFrame, but have tried several approaches and none work, the closest I have come is with a lambda formula, but still can't understand the problem. Can anyone help?

Data frame:

    full_name   certificacion   company
member_#            
296057  John Doe    A          CO3
155977  Jane Doe    NaN        CO2
228610  Johnny Doe  A, C       CO3

and I need a new column that has 'full_name' and 'certification' as:

member_name
John Doe, A
Jane Doe
Johnny Doe, A, C

I am trying:

df['member_name'] = df[df.columns[['full_name', 'certificacion']]].apply(
    lambda x: ', '.join() if x['certificacion'].notnull() else x['full_name'], axis=1)

But get an error:

IndexError: only integers, slices (:), ellipsis (...), numpy.newaxis (None) and integer or boolean arrays are valid indices

I am new at Pandas and have tried several variations of the above lambdas function but I can't seem to find the right syntax.

Thanks


Solution

  • Since both columns are strings, you can add them together with a + with , between. Then for the rows that have NaN just handle those with np.where() logic to use full_name instead of member_name:

    df['member_name'] = df['full_name'] + ', ' + df['certificacion']
    df['member_name'] = np.where((df['member_name'].isna()), df['full_name'], df['member_name'])
    

    output:

    member_#      full_name certificacion company       member_name
    0    296057    John Doe             A     CO3       John Doe, A
    1    155977    Jane Doe           NaN     CO2          Jane Doe
    2    228610  Johnny Doe          A, C     CO3  Johnny Doe, A, C