python pandas csv dataframe string-concatenation

How to merge data from two columns into one with a + sign using pandas

I have 2 columns in this format in a csv.

Drug    Product(Predicted)
#N/A    acetaldehyde
#N/A    nicotine
#N/A    resin
#N/A    rosa
#N/A    eliglustat
#N/A    valpromide
#N/A    clostridium
#N/A    ambenonium
solution    valpromide
solution    valpromide
solution    valpromide
solution    valpromide
solution    valpromide
solution    valpromide
solution    valpromide
solution    valpromide
solution    valpromide
#N/A    amiloride
#N/A    valsartÃ¡n
#N/A    manitol
#N/A    hipromelosa
#N/A    fosfato disÃ³dico
#N/A    fosfato disÃ³dico
#N/A    agalsidase beta
#N/A    enoxaparin sodium
solution    warfarine
#N/A    eliglustat
#N/A    silver
#N/A    silver

The output I am looking for is that, create a new column and whenever there is an entry present in the drug column(ignore all N/A), add that entry to the new column by concatenating the 2 columns with a plus sign and get an output csv

For example:

New column
solution + valpromide
solution + valpromide
solution + warfarine

I am new to pandas so I wanted to know how this can be done using the same

Solution

Use Series.str.cat with Series.str.strip:

df['new'] = df['Drug'].str.cat(df['Product(Predicted)'], na_rep='', sep='+').str.strip('+')

Or replace missing values to empty strings before merge:

df['new'] = (df['Drug'].fillna('') + '+' + df['Product(Predicted)']).str.strip('+')

print (df)
       Drug Product(Predicted)                  new
0       NaN       acetaldehyde         acetaldehyde
1       NaN           nicotine             nicotine
2       NaN              resin                resin
3       NaN               rosa                 rosa
4       NaN         eliglustat           eliglustat
5       NaN         valpromide           valpromide
6       NaN        clostridium          clostridium
7       NaN         ambenonium           ambenonium
8  solution         valpromide  solution+valpromide
9  solution         valpromide  solution+valpromide

If want remove rows with NaNs in drug column:

df1 = df.dropna(subset=['Drug'])
df1['new'] = df1['Drug'] + '+' + df1['Product(Predicted)']
print (df)
       Drug Product(Predicted)                  new
8  solution         valpromide  solution+valpromide
9  solution         valpromide  solution+valpromide