python pandas dataframe text-processing

How can I create a target column based on two different columns?


I have the following DataFrame with the columns low_scarcity and high_scarcity (each row has a value in either low_scarcity or high_scarcity, never both):

id  low_scarcity       high_scarcity
0   When I was five..
1                      I worked a lot...
2                      I went to parties...
3   1 week ago
4   2 months ago
5                      another story..

I want to create another column, 'target', whose value is 0 when there is an entry in the low_scarcity column and 1 when there is an entry in the high_scarcity column, like this:

id  low_scarcity       high_scarcity         target
0   When I was five..                        0
1                      I worked a lot...     1
2                      I went to parties...  1
3   1 week ago                               0
4   2 months ago                             0
5                      another story..       1

I first tried replacing the entries that have no value with 0 and then building a boolean condition. However, I can't use .replace('', 0), because the empty cells are not actually stored as empty strings.


Solution

  • Supposing your DataFrame is called df, that each row has a value in exactly one of high_scarcity or low_scarcity, and that empty cells are stored as empty strings, the following line does it:

    import numpy as np    
    df['target'] = 1*np.array(df['high_scarcity']!="")
    

    Here the 1* converts the boolean values to integers (True becomes 1, False becomes 0).

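    If that is not the case (a row may have both columns empty, or both filled), a more general approach is needed. The version below leaves target as NaN when both columns are empty and lets low_scarcity take precedence when both are filled:

    res = np.full(len(df), np.nan)  # NaN marks rows where neither column has a value
    res[(df['high_scarcity'] != "").to_numpy()] = 1
    res[(df['low_scarcity'] != "").to_numpy()] = 0  # applied last, so low_scarcity wins ties
    df['target'] = res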
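
    The question says the empty cells are not stored as empty strings, so they may be NaN, None, or whitespace instead. A minimal sketch that treats all of those as empty follows. The sample data and the has_text helper are made up for illustration and are not part of the original question or answer. As in the snippet above, low_scarcity is checked first and rows with neither value get NaN.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: empty cells are a mix of NaN, None, "" and whitespace.
df = pd.DataFrame({
    "low_scarcity": ["When I was five..", np.nan, "", "1 week ago", "2 months ago", "  "],
    "high_scarcity": [np.nan, "I worked a lot...", "I went to parties...", None, "", "another story.."],
})

def has_text(col: pd.Series) -> pd.Series:
    """Return True where a cell holds non-blank text (illustrative helper)."""
    # NaN/None become "", then surrounding whitespace is stripped before comparing.
    return col.fillna("").astype(str).str.strip().ne("")

low = has_text(df["low_scarcity"]).to_numpy()
high = has_text(df["high_scarcity"]).to_numpy()

# np.select uses the first condition that matches; rows matching none get the default.
df["target"] = np.select([low, high], [0, 1], default=np.nan)
```

    Because NaN is a float, the resulting column is float-typed. If no row ends up as NaN, df['target'].astype(int) converts it to plain integers.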