python list dataframe loops rows

removing rows from dataframe if column value condition is met


stables=["usdt","usdc","tusd"]

my dataframe called "df" looks like this (actual dataframe has hundreds of rows):

            pairs           apy network           pool name
0     [eth, wbtc]  1.150699e+01  cometh     cometh-eth-wbtc
1    [usdt, usdc]  1.814333e+01  cometh    cometh-usdt-usdc
2  [must, pickle]  2.175891e+01  cometh  cometh-must-pickle
3     [usdt, eth]  1.237610e+02  cometh     cometh-usdt-eth
4    [eth, matic]  2.181968e+01  cometh    cometh-eth-matic

the 'pairs' column has lists of items. Going row by row, I want to remove the whole row if any of the items in the 'pairs' column list is not included in the 'stables' list provided above.

in this case only the second row (indexed 1) would stay in the dataframe as both items are in the 'stables' list provided above.

any help is welcome.

thx


Solution

  • Assuming that the pairs column actually contains lists of strings, you could explode that column, use isin to test which elements are members of the stables list, and then use a groupby agg with np.all to test whether all elements of an initial row are members of stables:

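    import numpy as np

    # One row per list element; the original row index is repeated.
    temp = df.explode('pairs')
    temp['keep'] = temp['pairs'].isin(stables)
    # Collapse back to one boolean per original row: True only if every element is a stable.
    temp = temp.groupby(level=0)['keep'].agg(np.all)

    to_keep = df[temp]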
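
For anyone who wants to try this end to end, here is a minimal self-contained sketch. The sample frame is rebuilt by hand from the five rows shown in the question, so its exact dtypes are an assumption. The names `mask`, `stable_set`, `mask_alt` and `to_keep_alt` are made up for illustration. The first mask is a condensed form of the explode approach above (using the built-in groupby `.all()` instead of `np.all`); the second is a different technique, a per-row subset check with `apply`, shown only as an alternative.

```python
import pandas as pd

stables = ["usdt", "usdc", "tusd"]

# Sample frame rebuilt from the rows shown in the question (assumed dtypes).
df = pd.DataFrame({
    "pairs": [["eth", "wbtc"], ["usdt", "usdc"], ["must", "pickle"],
              ["usdt", "eth"], ["eth", "matic"]],
    "apy": [11.50699, 18.14333, 21.75891, 123.7610, 21.81968],
    "network": ["cometh"] * 5,
    "pool name": ["cometh-eth-wbtc", "cometh-usdt-usdc", "cometh-must-pickle",
                  "cometh-usdt-eth", "cometh-eth-matic"],
})

# Explode approach: one element per row, test membership,
# then require every element of the original row to be a stable.
mask = df["pairs"].explode().isin(stables).groupby(level=0).all()
to_keep = df[mask]

# Alternative (not the answer's method): per-row subset check.
stable_set = set(stables)
mask_alt = df["pairs"].apply(lambda pair: set(pair).issubset(stable_set))
to_keep_alt = df[mask_alt]

print(to_keep)
```

With the sample rows both masks select the same row. They can differ on an empty list: explode turns an empty list into NaN, which isin reports as not a member, so the row is dropped, while the empty set counts as a subset of anything, so the subset check keeps it.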