I have the following DataFrame with the columns low_scarcity and high_scarcity (each row has a value in either low_scarcity or high_scarcity, never both):
id | low_scarcity | high_scarcity |
---|---|---|
0 | When I was five.. | |
1 | | I worked a lot... |
2 | | I went to parties... |
3 | 1 week ago | |
4 | 2 months ago | |
5 | | another story.. |
I want to create another column, 'target', that is 0 when there is an entry in the low_scarcity column and 1 when there is an entry in the high_scarcity column, like this:
id | low_scarcity | high_scarcity | target |
---|---|---|---|
0 | When I was five.. | | 0 |
1 | | I worked a lot... | 1 |
2 | | I went to parties... | 1 |
3 | 1 week ago | | 0 |
4 | 2 months ago | | 0 |
5 | | another story.. | 1 |
I first tried replacing the entries that have no value with 0 and then creating a boolean condition. However, I can't use .replace('',0) because the cells that are empty don't appear as empty values.
Supposing your DataFrame is called df and that each value is in either high_scarcity or low_scarcity, the following code does it:
import numpy as np
df['target'] = 1*np.array(df['high_scarcity']!="")
in which the 1* performs an integer conversion of the boolean values.
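The question mentions that the empty cells do not appear as empty values, which suggests they may be stored as NaN rather than as empty strings. In that case a comparison with "" would be True for every row. The following is a minimal sketch that treats both NaN and empty strings as "no entry". The sample data and the use of NaN for empty cells are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Sample data mirroring the question; NaN marks an empty cell (an assumption).
df = pd.DataFrame({
    "low_scarcity": ["When I was five..", np.nan, np.nan,
                     "1 week ago", "2 months ago", np.nan],
    "high_scarcity": [np.nan, "I worked a lot...", "I went to parties...",
                      np.nan, np.nan, "another story.."],
})

# Treat NaN and empty or whitespace-only strings as "no entry".
has_high = df["high_scarcity"].fillna("").astype(str).str.strip() != ""

# Convert the boolean mask to integers: True -> 1, False -> 0.
df["target"] = has_high.astype(int)

print(df["target"].tolist())  # [0, 1, 1, 0, 0, 1]
```

Like the one-liner above, this still assumes that every row has an entry in exactly one of the two columns.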
If that is not the case (a row may have both columns filled, or neither), then a more general approach should be taken:
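# Start with NaN so rows with no entry in either column stay unassigned
res = np.full(len(df), np.nan)
# Mark rows with a high_scarcity entry as 1
res[(df['high_scarcity'] != "").to_numpy()] = 1
# Rows with a low_scarcity entry become 0 (this wins if both are filled)
res[(df['low_scarcity'] != "").to_numpy()] = 0
df['target'] = res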
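As an alternative sketch for the general case, numpy.select makes the rules explicit in one place. The helper has_entry, the sample data, and the marker value -1 for ambiguous rows are all hypothetical choices made for illustration. Note that this variant flags rows with entries in both columns instead of letting low_scarcity win, as happens in the code above, where the low_scarcity assignment runs last.

```python
import numpy as np
import pandas as pd

# Hypothetical sample covering all four cases: low only, high only, both, neither.
df = pd.DataFrame({
    "low_scarcity": ["a", "", "c", ""],
    "high_scarcity": ["", "b", "d", ""],
})

def has_entry(col):
    # NaN and empty or whitespace-only strings both count as "no entry".
    return col.fillna("").astype(str).str.strip() != ""

low = has_entry(df["low_scarcity"])
high = has_entry(df["high_scarcity"])

df["target"] = np.select(
    [low & ~high, high & ~low],  # conditions, checked in order
    [0, 1],                      # value assigned for each condition
    default=-1,                  # both filled or both empty (hypothetical marker)
)

print(df["target"].tolist())  # [0, 1, -1, -1]
```

Choosing a sentinel such as -1 keeps the column as integers; if missing values are preferred, the result can be converted afterwards.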