Search code examples
python-3.xpandasdataframevectorization

Filter Dataframe by comparing one column to list of other columns


I have a dataframe with numerous float columns. I want to filter the dataframe, leaving only the values that are inbetween the High and Low columns of the same dataframe.

I know how to do this when the conditions are one column compared to another column. But there are 102 columns, so I cannot write a condition for each column. And all my research just illustrates how to compare two columns and not one column against all others (or I am not typing the right search terms).

I tried df= df[ (df['High'] <= df[DFColRBs]) & (df['Low'] >= df[DFColRBs])].copy() But it erases everything.

and I tried booleanselction = df[ (df[DFColRBs].between(df['High'],df['Low'])]

and I tried: df= df[(df[DFColRBs].ge(df['Low'])) & (df[DFColRBs].le(df['Low']))].copy()

and I tried:

BoolMatrix =  (df[DFColRBs].ge(DF_copy['Low'], axis=0)) & (df[DFColRBs].le(DF_copy['Low'], axis=0)) 
df= df[BoolMatrix].copy()

But it erases everything in dataframe, even 3 columns that are not included in the list.

I appreciate the guidance.

Example Dataframe:

    High     Low    Close   _1m_21  _1m_34  _1m_55  _1m_89  _1m_144 _1m_233 _5m_21  _5m_34  _5m_55
0   1.23491 1.23456 1.23456 1.23401 1.23397 1.23391 1.2339  1.2337  1.2335  1.23392 1.23363 1.23343
1   1.23492 1.23472 1.23472 1.23422 1.23409 1.234   1.23392 1.23375 1.23353 1.23396 1.23366 1.23347
2   1.23495 1.23479 1.23488 1.23454 1.23422 1.23428 1.23416 1.23404 1.23372 1.23415 1.234   1.23367
3   1.23494 1.23472 1.23473 1.23457 1.23425 1.23428 1.23417 1.23405 1.23373 1.23415 1.234   1.23367


Solution

  • Based on what you've said in the comments, best to split the df into the pieces you want to operate on and the ones you don't, then use matrix operations.

    tmp_df = DF_copy.iloc[:, 3:].copy()
    # or tmp_df = DF_copy[DFColRBs].copy()
    
    # mask by comparing test columns with the high and low columns
    m = tmp_df.le(DF_copy['High'], axis=0) & tmp_df.ge(DF_copy['Low'], axis=0)
    
    # combine the masked df with the original cols
    DF_copy2 = pd.concat([DF_copy.iloc[:, :3], tmp_df.where(m)], axis=1)
    # or replace with DF_copy.iloc[:, :3] with DF_copy.drop(columns=DFColRBs)