I have 2 columns in this format in a csv.
Drug Product(Predicted)
#N/A acetaldehyde
#N/A nicotine
#N/A resin
#N/A rosa
#N/A eliglustat
#N/A valpromide
#N/A clostridium
#N/A ambenonium
solution valpromide
solution valpromide
solution valpromide
solution valpromide
solution valpromide
solution valpromide
solution valpromide
solution valpromide
solution valpromide
#N/A amiloride
#N/A valsartán
#N/A manitol
#N/A hipromelosa
#N/A fosfato disódico
#N/A fosfato disódico
#N/A agalsidase beta
#N/A enoxaparin sodium
solution warfarine
#N/A eliglustat
#N/A silver
#N/A silver
The output I am looking for is that, create a new column and whenever there is an entry present in the drug column(ignore all N/A), add that entry to the new column by concatenating the 2 columns with a plus sign and get an output csv
For example:
New column
solution + valpromide
solution + valpromide
solution + warfarine
I am new to pandas so I wanted to know how this can be done using the same
Use Series.str.cat
with Series.str.strip
:
df['new'] = df['Drug'].str.cat(df['Product(Predicted)'], na_rep='', sep='+').str.strip('+')
Or replace missing values to empty strings before merge:
df['new'] = (df['Drug'].fillna('') + '+' + df['Product(Predicted)']).str.strip('+')
print (df)
Drug Product(Predicted) new
0 NaN acetaldehyde acetaldehyde
1 NaN nicotine nicotine
2 NaN resin resin
3 NaN rosa rosa
4 NaN eliglustat eliglustat
5 NaN valpromide valpromide
6 NaN clostridium clostridium
7 NaN ambenonium ambenonium
8 solution valpromide solution+valpromide
9 solution valpromide solution+valpromide
If want remove rows with NaN
s in drug
column:
df1 = df.dropna(subset=['Drug'])
df1['new'] = df1['Drug'] + '+' + df1['Product(Predicted)']
print (df)
Drug Product(Predicted) new
8 solution valpromide solution+valpromide
9 solution valpromide solution+valpromide