python pandas string list

Join lists of strings in a dataframe column while ignoring null values


I have a DataFrame column consisting of lists of strings and one NaN value. I am trying to join the lists of strings while ignoring the NaN with df.loc, Series.notnull(), and Series.apply(). I expect this to join each of the lists while skipping over the NaNs, but I'm receiving "TypeError: can only join an iterable."

I'm setting up my DataFrame like this:

import numpy as np
import pandas as pd

data = {'id': [['54930058LIMFSJIOLQ48'], np.nan, ['5493006B6WMKNQ8QNP51 254900425JAG3QVRMM28']]}
df = pd.DataFrame(data)

    id
0   [54930058LIMFSJIOLQ48]
1   NaN
2   [5493006B6WMKNQ8QNP51 254900425JAG3QVRMM28]

This is the line I'm using to join the strings. Why is it throwing an error?

df.loc[df['id'].notnull(), 'id'] = df['id'].apply(lambda x: ', '.join(x))

Solution

  • The error comes from the right-hand side of the assignment. df['id'].apply(...) runs on every row of the column, including the NaN row, before df.loc applies the notnull() mask. NaN is a float, which is not iterable, so ', '.join(x) raises the TypeError.

    To keep the apply on the whole column, use isinstance so that only list values are joined:

    df['id'] = df['id'].apply(lambda x: ', '.join(x) if isinstance(x, list) else x)

    Any value that is not a list, such as NaN, is returned unchanged.
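
An alternative that stays closer to the original attempt is to apply the join only to the rows selected by the mask, so the NaN row never reaches str.join. This is a minimal sketch, not the only way to do it; the sample values and the name mask are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: two lists of strings and one missing value.
df = pd.DataFrame({'id': [['A1', 'B2'], np.nan, ['C3']]})

# Boolean mask of the rows that actually hold a list.
mask = df['id'].notnull()

# Apply the join to the masked rows only; the result is aligned
# back to the original rows by index, and the NaN row is left as-is.
df.loc[mask, 'id'] = df.loc[mask, 'id'].apply(', '.join)

print(df)
```

The difference from the failing line is that both sides of the assignment use the same mask, so apply only ever sees list values.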