
Concatenate two columns with NaN based on conditions (Python)


I have two columns in a pandas DataFrame that look like this:

ColA  ColB
a     a b c
b     a c
c     NaN
NaN   d e f
NaN   NaN
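To reproduce this, the sample data can be built as a DataFrame roughly like this. This is a sketch that assumes the missing cells are real np.nan values rather than the literal string "NaN":

```python
import numpy as np
import pandas as pd

# Sample data from the question. Missing cells are real NaN values,
# not the string "NaN", so pd.isnull() will detect them.
df = pd.DataFrame({
    "ColA": ["a", "b", "c", np.nan, np.nan],
    "ColB": ["a b c", "a c", np.nan, "d e f", np.nan],
})
print(df)
```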

I would like to combine ColA and ColB into a new column ColC such that:

  • if ColA is in ColB, then keep ColB
  • if ColA is not in ColB, concatenate them
  • if only one of them is NaN, keep the column that has a value
  • if both are NaN, keep NaN
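The four rules can be written as a plain function on a single pair of cells. This is a minimal sketch: the name combine is hypothetical, missing values are assumed to be None or float NaN, "ColA is in ColB" is assumed to mean "ColA is one of ColB's whitespace-separated words", and the two values are joined with a space to match the desired output:

```python
import math


def combine(a, b):
    """Hypothetical helper applying the question's four rules to one row."""

    def missing(x):
        # Treat None and float NaN as missing values.
        return x is None or (isinstance(x, float) and math.isnan(x))

    if missing(a) and missing(b):
        return float("nan")   # both missing: keep NaN
    if missing(a):
        return b              # only ColB has a value
    if missing(b):
        return a              # only ColA has a value
    if a in b.split():
        return b              # ColA is already one of ColB's words
    return a + " " + b        # otherwise join them with a space


rows = [("a", "a b c"), ("b", "a c"), ("c", None), (None, "d e f"), (None, None)]
for a, b in rows:
    print(repr(a), repr(b), "->", repr(combine(a, b)))
```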

How can I code this in Python so that the output looks like this:

ColA  ColB   ColC
a     a b c  a b c
b     a c    b a c
c     NaN    c
NaN   d e f  d e f
NaN   NaN    NaN

Note that 'a' represents a word, and 'a b c' represents three words in a text string.
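Because 'a' stands for a whole word, the "is in" check presumably has to compare words rather than characters. Python's in operator on a string tests for a substring, which can give false matches; splitting the string first gives a word-level test. A small illustration with made-up strings:

```python
# Substring test vs. word test on a plain Python string.
text = "cat a b"
substring_hit = "at" in text          # True: "at" occurs inside "cat"
word_hit = "at" in text.split()       # False: "at" is not one of the words
print(substring_hit, word_hit)
```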


Solution

  • UPDATED answer (using pandas DataFrames): OK, let's assume you did:

    import numpy as np
    import pandas as pd
    

    and your DataFrame is as follows:

    df
      ColA   ColB
    0    a  a b c
    1    b    a c
    2    c    NaN
    3  NaN  d e f
    4  NaN    NaN
    

    Then you define your combination function:

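    def concat(row):
        a = row["ColA"]
        b = row["ColB"]
        # Only ColA has a value: keep it.
        if not pd.isnull(a) and pd.isnull(b):
            return a
        # Only ColB has a value: keep it.
        if pd.isnull(a) and not pd.isnull(b):
            return b
        # Both are missing: keep NaN.
        if pd.isnull(a) and pd.isnull(b):
            return np.nan
        # ColA is one of the words in ColB: keep ColB.
        # split() compares whole words, so "a" does not match "cat".
        if a in b.split():
            return b
        # Otherwise concatenate the two strings.
        return a + b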
    

    and apply it to your DataFrame (to each row):

    df.apply(concat, axis="columns")
    

    which gives this result:

    0    a b c
    1     ba c
    2        c
    3    d e f
    4      NaN
    dtype: object
    

    Note that a + b joins the two strings without a separator, which is why row 1 comes out as "ba c" instead of the desired "b a c". To get "b a c", return a + " " + b in the last branch instead. You get your final result with:

    df["ColC"] = df.apply(concat, axis="columns")
    

    Over and out.
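A column-wise alternative to the row-wise apply might look like the sketch below. It assumes whitespace-separated words and joins the columns with a space, so row 1 comes out as "b a c" as in the desired output. It relies on Series.str.cat returning NaN for any row where either side is missing (its default when na_rep is None), then fills those rows from whichever column has a value:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "ColA": ["a", "b", "c", np.nan, np.nan],
    "ColB": ["a b c", "a c", np.nan, "d e f", np.nan],
})

a, b = df["ColA"], df["ColB"]

# True where ColA is one of the whitespace-separated words of ColB.
# Missing cells become empty strings, so they never count as a match.
a_in_b = pd.Series(
    [x in y.split() for x, y in zip(a.fillna(""), b.fillna(""))],
    index=df.index,
)

# "ColA ColB" joined with a space; NaN wherever either side is NaN.
joined = a.str.cat(b, sep=" ")

# Keep ColB where ColA is already in it, then fill rows with exactly one
# missing side from the other column. Rows where both are NaN stay NaN.
df["ColC"] = joined.where(~a_in_b, b).fillna(b).fillna(a)
print(df)
```

Unlike apply, this builds the joined column in one call; the word-membership test still loops in Python, because checking whole words per row has no single built-in string method.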