Search code examples
pythonpandasdataframejoincontains

If a Pandas column doesn't contain a value from another column in the same row, then add the value from another column in a new line


I want to add values from column "A" to column "B" in a new line, only if the corresponding value from column "A" is not contained in column "B".

My data frame:

df

  A          B
  value1     value4
  value1
  value1     value4\nvalue1\n
  value2     value4\nvalue7\n
  value2     value4\n
  value3
  value3     value4\nvalue3\nvalue6
  value3     value2\nvalue1
  value3     value3\nvalue1\nvalue6

So, my desired data frame

df

  A          B
  value1     value4\nvalue1
  value1     value1
  value1     value4\nvalue1
  value2     value4\nvalue7\nvalue2
  value2     value4\nvalue2
  value3     value3
  value3     value4\nvalue3\nvalue6
  value3     value2\nvalue1\nvalue3
  value3     value3\nvalue1\nvalue6

Additionally, the problem is that sometimes at the end of the string in column "B", a \n is present and sometimes is not.

I tried with df.A.str.contains(df["A"]), but it doesn't work when contains() has a data frame column. I think that something like this could work: df.B.str.extract('(%s)' % '|'.join(df.A)), and this is only for the values from column "A" which are present in column "B" - for matches, and I don't know how to adapt to my problem.


Solution

  • Create a Boolean mask with a list comprehension. Replace missing values with the empty string and use Series.where:

    mask = [a in b for a, b in zip(df['A'], df['B'].fillna(''))]
    b = df['B'].str.replace(r'\\n$', '')
    df['B'] = b.where(mask, b + '\\n' + df['A']).fillna(df['A'])
    print (df)
            A                       B
    0  value1          value4\nvalue1
    1  value1                  value1
    2  value1          value4\nvalue1
    3  value2  value4\nvalue7\nvalue2
    4  value2          value4\nvalue2
    5  value3                  value3
    6  value3  value4\nvalue3\nvalue6
    7  value3  value2\nvalue1\nvalue3
    8  value3  value3\nvalue1\nvalue6