I'm trying to join two string in a new column within a DataFrame, but have tried several approaches and none work, the closest I have come is with a lambda formula, but still can't understand the problem. Can anyone help?
Data frame:
full_name certificacion company
member_#
296057 John Doe A CO3
155977 Jane Doe NaN CO2
228610 Johnny Doe A, C CO3
and I need a new column that has 'full_name' and 'certification' as:
member_name
John Doe, A
Jane Doe
Johnny Doe, A, C
I am trying:
df['member_name'] = df[df.columns[['full_name', 'certificacion']]].apply(
lambda x: ', '.join() if x['certificacion'].notnull() else x['full_name'], axis=1)
But get an error:
IndexError: only integers, slices (
:
), ellipsis (...
), numpy.newaxis (None
) and integer or boolean arrays are valid indices
I am new at Pandas and have tried several variations of the above lambdas function but I can't seem to find the right syntax.
Thanks
Since both columns are strings, you can add them together with a +
with ,
between. Then for the rows that have NaN
just handle those with np.where()
logic to use full_name
instead of member_name
:
df['member_name'] = df['full_name'] + ', ' + df['certificacion']
df['member_name'] = np.where((df['member_name'].isna()), df['full_name'], df['member_name'])
output:
member_# full_name certificacion company member_name
0 296057 John Doe A CO3 John Doe, A
1 155977 Jane Doe NaN CO2 Jane Doe
2 228610 Johnny Doe A, C CO3 Johnny Doe, A, C