Removing rows where there is a value match

def remove_low_data_states(column_name):
    items = df[column_name].value_counts().reset_index()
    items.columns = ['place', 'value']
    print(f'Items in column: [{column_name}] with low data')
    return list(items[items['value'].apply(lambda val: val < items.value.median())].place)

remove_low_data_states('col1') --> returns ['hello', 'bye']

Original table

col1	col2	col3
hello	2	4
world	2	4
bye	2	4

Updated table

col1	col2	col3
world	2	4

The function above gives me a list of the values in a column whose counts fall below the median count. How can I then use that list to remove every row whose value in that column appears in the list?

I have tried using df.drop, but that was not too helpful, or I am making some sort of mistake.

Solution

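We can use .isin() to build a boolean mask marking the rows whose col1 value is in the returned list, then negate the mask with ~ so that only the remaining rows are kept.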


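def remove_low_data_states(column_name):
    # Count how often each value occurs in the column (reads the global df).
    items = df[column_name].value_counts().reset_index()
    items.columns = ['place', 'value']
    print(f'Items in column: [{column_name}] with low data')
    # Return the values whose count is below the median count.
    return list(items[items['value'] < items['value'].median()].place)

# Keep only the rows whose col1 value is NOT in the low-data list.
df = df[~df['col1'].isin(remove_low_data_states('col1'))]

df.head()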
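
For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same approach. The sample data is made up for illustration (the values are repeated so that the counts actually differ), and the helper name `low_count_values` is hypothetical; it takes the DataFrame as an argument instead of reading a global `df`. It also shows an equivalent `DataFrame.drop` call, since that was what the question originally attempted.

```python
import pandas as pd


def low_count_values(frame, column_name):
    """Return the values in `column_name` whose count is below the median count."""
    counts = frame[column_name].value_counts()
    return counts[counts < counts.median()].index.tolist()


# Made-up sample data: 'world' appears 3 times, 'bye' twice, 'hello' once.
df = pd.DataFrame({
    "col1": ["hello", "world", "world", "world", "bye", "bye"],
    "col2": [2, 2, 2, 2, 2, 2],
    "col3": [4, 4, 4, 4, 4, 4],
})

low = low_count_values(df, "col1")

# Boolean mask: True for rows whose col1 value is in the low-count list.
mask = df["col1"].isin(low)

# Option 1: keep the rows where the mask is False.
filtered = df[~mask]

# Option 2: drop the matching rows by their index labels.
dropped = df.drop(index=df[mask].index)

print(low)
print(filtered)
```

With this data the counts are 3, 2 and 1, so the median count is 2 and only 'hello' falls below it. Both options give the same result; the `.isin()` mask is usually the shorter one to write, while `drop` needs index labels rather than a list of column values, which may be why the original attempt did not work.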