I am trying to delete a row that has a wrong but unknown value in one column.
My data frame looks something like this:
import pandas as pd
df = pd.DataFrame({
    'size' : ['small', 'small', 'medium', 'small', 'small'],
    'length': [38, 62, 55, 33, 22],
    'kinds' : ["A", "#$", "B", "C", "A"]})
I want to drop the row that has the wrong value. A value is wrong if it is not one of the values listed in kinds:
kinds=["A","B","C"]
I tried something like this:
df[df["kinds"].contains(kinds)]
but I couldn't use contains.
What should I do?
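A minimal sketch of one possible approach, assuming the frame is named df as in the attempt above and that a value counts as valid only when it exactly equals an entry of kinds. Series.isin tests each value for membership in a list, whereas the string method .str.contains matches a single substring or regex pattern, so it may not be the right tool here. The name filtered is only illustrative.

```python
import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame({
    "size": ["small", "small", "medium", "small", "small"],
    "length": [38, 62, 55, 33, 22],
    "kinds": ["A", "#$", "B", "C", "A"],
})
kinds = ["A", "B", "C"]

# Build a boolean mask: True where the value is one of the allowed kinds.
mask = df["kinds"].isin(kinds)

# Keep only the rows whose "kinds" value is allowed.
filtered = df[mask]
print(filtered)
```

To see the rejected rows instead, the mask can be inverted with df[~mask].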
remove low counts from pandas data frame column on condition
You can delete categorical values that occur only rarely, treating them as outliers:
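for name in df.columns:
    # Only object (string-like) columns are treated as categorical.
    if df[name].dtype == 'O':
        s = df[name].value_counts()
        # Keep rows whose value occurs at least 3 times; adjust the threshold to suit your data.
        df = df[df[name].isin(s.index[s >= 3])]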
If the column is numeric, apply an outlier analysis instead. You can also encode the categorical values as numbers, delete the outliers, and then map the numbers back to categories if you want.
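For the numeric case, a hedged sketch of one common outlier rule: drop values lying more than 1.5 times the interquartile range outside the quartiles. The answer above does not name a specific method, so the IQR rule, the helper drop_iqr_outliers, the factor k, and the sample value 400 are all assumptions made for illustration.

```python
import pandas as pd

def drop_iqr_outliers(frame, column, k=1.5):
    """Return the rows whose value in `column` lies within k * IQR of the quartiles."""
    q1 = frame[column].quantile(0.25)
    q3 = frame[column].quantile(0.75)
    iqr = q3 - q1
    lower = q1 - k * iqr
    upper = q3 + k * iqr
    # between() is inclusive on both ends by default.
    return frame[frame[column].between(lower, upper)]

# Hypothetical data: the lengths from the question plus one extreme value.
data = pd.DataFrame({"length": [38, 62, 55, 33, 22, 400]})
cleaned = drop_iqr_outliers(data, "length")
print(cleaned)
```

With so few rows the quartiles are rough estimates, so a rule like this is more meaningful on larger samples.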