
Issue with creating a column using np.where, ArrayType error


I have a dataframe in which I'm trying to create a binary 1/0 column when certain conditions are met. The code I'm using is as follows:

sd_threshold = 5

df1["signal"] = np.where(np.logical_and(df1["high"] >= df1["break"],
                                        df1["low"] <= df1["break"],
                                        df1["sd_round"] > sd_threshold), 1, 0)

The code raises TypeError: return arrays must be of ArrayType when the last condition, df1["sd_round"] > sd_threshold, is included; without it, the code works fine. There isn't any issue with the data in the df1["sd_round"] column.

Any insight would be much appreciated, thank you!


Solution

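  • Check the documentation: np.logical_and() takes only two input arrays. It compares the first two arguments you give it and treats the third positional argument as out, the array the result is written to. Your third condition is a pandas Series rather than a NumPy array, so it cannot be used as the output array, and that is where the TypeError comes from. You could use a nested call, but I would just combine the conditions with & (element-wise AND on boolean Series):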

    df1["signal"] = np.where((df1["high"] >= df1["break"]) & 
                             (df1["low"] <= df1["break"]) &
                             (df1["sd_round"] > sd_threshold), 
                             1, 0)
    

    EDIT: You could actually skip NumPy entirely and cast your boolean Series to int to get 1s and 0s:

    mask = ((df1["high"] >= df1["break"]) & 
            (df1["low"] <= df1["break"]) &
            (df1["sd_round"] > sd_threshold))
    df1["signal"] = mask.astype(int)
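
For completeness, here is a rough sketch of the nested call mentioned above, plus np.logical_and.reduce as another way to combine more than two conditions. The sample values in df1 are made up purely for illustration; only the column names and sd_threshold come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df1 = pd.DataFrame({
    "high":     [10.0, 12.0, 9.0, 11.0],
    "low":      [8.0, 11.0, 7.0, 9.0],
    "break":    [9.0, 10.0, 8.0, 10.0],
    "sd_round": [6, 7, 3, 8],
})
sd_threshold = 5

cond_high = df1["high"] >= df1["break"]
cond_low = df1["low"] <= df1["break"]
cond_sd = df1["sd_round"] > sd_threshold

# Option 1: nest the calls, since each np.logical_and takes two inputs.
nested = np.logical_and(np.logical_and(cond_high, cond_low), cond_sd)

# Option 2: reduce over a list of conditions (converted to plain arrays),
# which scales to any number of conditions.
reduced = np.logical_and.reduce(
    [c.to_numpy() for c in (cond_high, cond_low, cond_sd)]
)

df1["signal"] = np.where(nested, 1, 0)
print(df1["signal"].tolist())  # [1, 0, 0, 1]
```

Both options give the same mask as chaining the conditions with &, so which one to use is mostly a matter of readability.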