I have two columns in a pandas DataFrame that look like this:
ColA | ColB |
---|---|
a | a b c |
b | a c |
c | NaN |
NaN | d e f |
NaN | NaN |
I would like to concatenate ColA and ColB into a new column, ColC.
How would I code this in Python so that the output looks like the table below?
ColA | ColB | ColC |
---|---|---|
a | a b c | a b c |
b | a c | b a c |
c | NaN | c |
NaN | d e f | d e f |
NaN | NaN | NaN |
Note that 'a' represents a word, and 'a b c' represents three words in a text string.
UPDATED answer (using pandas DataFrames): OK, let's assume you did:
import numpy as np
import pandas as pd
and your DataFrame is as follows:
df
  ColA   ColB
0    a  a b c
1    b    a c
2    c    NaN
3  NaN  d e f
4  NaN    NaN
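In case you want to reproduce this, here is a minimal sketch of how such a DataFrame could be built. The literal values are copied from the table in the question; using np.nan for the empty cells is my assumption about how the missing values are stored.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question; np.nan marks the missing cells
# (assumption: the real data stores missing values as NaN, not as the string "NaN").
df = pd.DataFrame(
    {
        "ColA": ["a", "b", "c", np.nan, np.nan],
        "ColB": ["a b c", "a c", np.nan, "d e f", np.nan],
    }
)
print(df)
```

If your missing values are actually empty strings or the text "NaN", the pd.isnull checks will not catch them and you would need to convert them first.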
Then you define your combination function:
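def concat(row):
    a = row["ColA"]
    b = row["ColB"]
    # Only ColA has a value
    if not pd.isnull(a) and pd.isnull(b):
        return a
    # Only ColB has a value
    if pd.isnull(a) and not pd.isnull(b):
        return b
    # Both are missing
    if pd.isnull(a) and pd.isnull(b):
        return np.nan
    # Both present: keep ColB alone if ColA already occurs in it (substring test)
    if a in b:
        return b
    else:
        return a + b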
and apply it to your DataFrame (to each row):
df.apply(concat, axis="columns")
which gives this result:
0    a b c
1     ba c
2        c
3    d e f
4      NaN
dtype: object
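If it is unclear why axis="columns" calls the function once per row, the small sketch below may help. The two-row frame and the lambdas are made up for illustration only.

```python
import pandas as pd

# Hypothetical two-row frame, just to inspect what apply passes to the function.
df = pd.DataFrame({"ColA": ["a", "b"], "ColB": ["a b c", "a c"]})

# axis="columns" (equivalent to axis=1) hands the function one row at a time,
# as a Series whose index holds the column labels.
row_types = df.apply(lambda row: type(row).__name__, axis="columns")

# Because each row is a Series, its cells can be read by column name.
joined = df.apply(lambda row: row["ColA"] + "|" + row["ColB"], axis="columns")

print(row_types.tolist())
print(joined.tolist())
```

That is why concat can read row["ColA"] and row["ColB"] directly.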
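Of course, you can still decide whether a plain concatenation with + in concat is what you want, or whether you want to add a space in between (a + " " + b), which would turn "ba c" in row 1 into "b a c" as in your desired output. Also note that a in b is a substring test on the text, not a comparison of whole words.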
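As a rough sketch of that alternative: the variant below joins with a space and checks ColA against the whole words of ColB. The name concat_words is hypothetical, and splitting on whitespace is my assumption about what counts as a "word" in your data.

```python
import numpy as np
import pandas as pd


def concat_words(row):
    """Hypothetical variant of concat: space-joined, whole-word duplicate check."""
    a = row["ColA"]
    b = row["ColB"]
    if pd.isnull(a) and pd.isnull(b):
        return np.nan
    if pd.isnull(b):
        return a
    if pd.isnull(a):
        return b
    # Compare against the whitespace-separated words of ColB,
    # not against its raw characters.
    if a in b.split():
        return b
    return a + " " + b


# Example frame from the question (missing cells assumed to be NaN).
df = pd.DataFrame(
    {
        "ColA": ["a", "b", "c", np.nan, np.nan],
        "ColB": ["a b c", "a c", np.nan, "d e f", np.nan],
    }
)
result = df.apply(concat_words, axis="columns")
print(result.tolist())
```

With the whole-word check, a value like "cat" in ColA would no longer be treated as already present just because ColB contains "concatenate".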
You get your final result by:
df["ColC"] = df.apply(concat, axis="columns")
Over and out.