I have a DataFrame whose columns are all strings. I want a quick summary of the string columns, and am using value_counts and describe as below:
import pandas as pd

df = pd.DataFrame({'col1': ['1', '2', 'C', 'T', 'A', '00400'],
                   'col2': ['3241', 'H2', 'C8', 'T4', '123', '0000']})
df['col1'].value_counts()
pd.to_numeric(df['col1'], errors='coerce').describe()
but I also want a value count of only the non-numeric values. How can I do that?
In this case, something like df['col1'].value_counts(non_numerics)
would yield:
A 1
C 1
T 1
As my data has 1 million rows, I would like to remove the numeric values before computing the value counts.
Any suggestions?
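You can filter the non-numeric values with a boolean mask. pd.to_numeric with errors='coerce' turns every value it cannot parse into NaN, so isna() marks exactly the rows to keep: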
m = pd.to_numeric(df['col1'], errors='coerce').isna()
out = df.loc[m, 'col1'].value_counts()
Output:
col1
C 1
T 1
A 1
Name: count, dtype: int64
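To apply the same mask to several columns, the filtering step can be wrapped in a small function. This is only a sketch: non_numeric_counts is a hypothetical helper name, not part of pandas, and it assumes every value is either a string or missing.

```python
import pandas as pd

def non_numeric_counts(s: pd.Series) -> pd.Series:
    """Count the values of `s` that pd.to_numeric cannot parse (hypothetical helper)."""
    # Unparsable values become NaN under errors='coerce'; isna() flags them.
    mask = pd.to_numeric(s, errors="coerce").isna()
    # value_counts() drops NaN by default, so only real strings are counted.
    return s[mask].value_counts()

df = pd.DataFrame({
    "col1": ["1", "2", "C", "T", "A", "00400"],
    "col2": ["3241", "H2", "C8", "T4", "123", "0000"],
})

# Counts for a single column.
counts = non_numeric_counts(df["col1"])

# Counts for every column, keyed by column name.
all_counts = {col: non_numeric_counts(df[col]) for col in df.columns}
```

Note that zero-padded strings such as '00400' and '0000' parse as numbers, so they are excluded from the counts.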
If the column already contains NaNs and you want to count them as well, include them in the mask and pass dropna=False so that value_counts keeps them:
m1 = df['col1'].isna()
m2 = pd.to_numeric(df['col1'], errors='coerce').isna()
out = df.loc[m1|m2, 'col1'].value_counts(dropna=False)
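A short illustration of the effect of dropna=False, using a made-up Series that contains missing values (the data here is an assumption for demonstration, not taken from the question). In this sketch the coerced mask alone already flags the original NaNs, since pd.to_numeric leaves them as NaN, so the visible difference comes from dropna=False.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: strings mixed with missing values, kept as object dtype.
s = pd.Series(["1", "C", np.nan, "C", "2", np.nan, "T"], name="col1", dtype=object)

# True for unparsable strings and for values that were already missing.
m = pd.to_numeric(s, errors="coerce").isna()

# Default behaviour: NaN is dropped from the counts.
without_nan = s[m].value_counts()

# dropna=False keeps NaN as its own entry in the result.
with_nan = s[m].value_counts(dropna=False)
```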