Tags: python, pandas, dataframe, contains, delete-row

How to drop a row that has a wrong (unknown) value in pandas


I am trying to delete rows that have a wrong value in one column, where the wrong value is not known in advance.

My data frame looks something like this:

import pandas as pd

df = pd.DataFrame({
'size'  : ['small', 'small', 'medium', 'small', 'small'],
'length': [38, 62, 55, 33, 22],
'kinds' : ["A", "#$", "B", "C", "A"]})

I want to drop the row that has the wrong value. A value is wrong if it is not one of the values in kinds:

kinds=["A","B","C"]

I tried something like this:

df[df["kinds"].contains(kinds)]

but I couldn't get contains to work.

What should I do?


Solution

  • remove low counts from pandas data frame column on condition

    You can delete categorical values that occur only a few times, treating them as outliers:

    for name in df.columns:
        # Only filter object (string) columns.
        if df[name].dtype == 'O':
            s = df[name].value_counts()
            # Keep rows whose value occurs at least 3 times; tune the threshold to your data.
            df = df[df[name].isin(s.index[s >= 3])]
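
When the valid values are known in advance, as with the kinds list in the question, a direct membership test may be simpler than counting. The attempt in the question fails because a pandas Series has no contains method; Series.str.contains does exist, but it matches a substring or regex pattern rather than a list of allowed values. A minimal sketch using Series.isin, with the data copied from the question (the names mask and clean are only illustrative):

```python
import pandas as pd

# Data copied from the question.
df = pd.DataFrame({
    "size": ["small", "small", "medium", "small", "small"],
    "length": [38, 62, 55, 33, 22],
    "kinds": ["A", "#$", "B", "C", "A"],
})

kinds = ["A", "B", "C"]

# Series.isin returns a boolean mask: True where the value is one of `kinds`.
mask = df["kinds"].isin(kinds)

# Keep only the rows with a valid value; the row holding "#$" is dropped.
clean = df[mask]
print(clean)
```

The original index labels are kept, so the result has rows 0, 2, 3 and 4. Chaining .reset_index(drop=True) renumbers them if that matters.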
    
    

    If the column is numeric, apply outlier analysis instead. You can also convert the categorical values to numeric ones, delete the outliers, and convert the result back to categorical if you want.
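
The answer does not say which outlier analysis to use for numeric columns. As one possible illustration, the sketch below assumes the common 1.5 × IQR rule. The length values come from the question, and the extra value 400 is made up here so that there is something to remove.

```python
import pandas as pd

# Lengths from the question, plus 400 as a made-up outlier for illustration.
df = pd.DataFrame({"length": [38, 62, 55, 33, 22, 400]})

# Interquartile range of the column.
q1 = df["length"].quantile(0.25)
q3 = df["length"].quantile(0.75)
iqr = q3 - q1

# Assumed rule: values beyond 1.5 * IQR from the quartiles count as outliers.
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

# Keep only the rows whose length falls inside [lower, upper].
clean = df[df["length"].between(lower, upper)]
print(clean)
```

The 1.5 multiplier is a convention rather than a requirement; a stricter or looser bound may suit other data better.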