
Remove numerics in column prior to value_counts


I have a DataFrame whose columns are all strings. I want a quick summary of the string columns, and am using describe as below:

import pandas as pd

df = pd.DataFrame({'col1':['1','2','C','T','A','00400'],
              'col2':['3241','H2','C8','T4','123','0000']})
df['col1'].value_counts()
pd.to_numeric(df['col1'], errors='coerce').describe()

But I also want to get a value count of the non-numeric values. How can I do that?

In this case, something like df['col1'].value_counts(non_numerics) would yield:

A  1
C  1
T  1

As my data has 1 million rows, I would like to remove the numeric values before computing the value counts.

Any suggestions?


Solution

  • You can filter for the non-numeric values first:

    # True where the value cannot be parsed as a number
    m = pd.to_numeric(df['col1'], errors='coerce').isna()
    
    # Count only the non-numeric values
    out = df.loc[m, 'col1'].value_counts()
    

    Output:

    col1
    C    1
    T    1
    A    1
    Name: count, dtype: int64
    

    If the column already contains NaNs and you want to count them as well:

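    # m2 alone already flags existing NaNs (they stay NaN after coercion);
    # m1 just makes the intent explicit
    m1 = df['col1'].isna()
    m2 = pd.to_numeric(df['col1'], errors='coerce').isna()
    
    # dropna=False keeps NaN as its own row in the counts
    out = df.loc[m1|m2, 'col1'].value_counts(dropna=False)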
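
For reference, the two variants above can be combined into one self-contained sketch. The helper name `non_numeric_counts` and its `include_nan` flag are hypothetical, introduced here only for illustration, and the sample data adds one NaN to the question's `col1` so that both cases are visible:

```python
import numpy as np
import pandas as pd


def non_numeric_counts(s: pd.Series, include_nan: bool = False) -> pd.Series:
    """Count the values of `s` that pd.to_numeric cannot parse.

    Hypothetical helper, not part of pandas.
    """
    # Values that fail numeric parsing become NaN; pre-existing NaNs stay NaN.
    mask = pd.to_numeric(s, errors="coerce").isna()
    if not include_nan:
        # Exclude rows that were already missing before coercion.
        mask = mask & s.notna()
    # dropna=False keeps NaN as its own row when it was selected by the mask.
    return s[mask].value_counts(dropna=False)


# Sample data: the question's col1 plus one missing value (an assumption).
df = pd.DataFrame({"col1": ["1", "2", "C", "T", "A", "00400", np.nan]})

counts = non_numeric_counts(df["col1"])
counts_with_nan = non_numeric_counts(df["col1"], include_nan=True)
print(counts)
print(counts_with_nan)
```

With this sample, `counts` has one entry each for C, T and A, and `counts_with_nan` has an additional NaN row with a count of 1. The numeric strings are filtered out with a single boolean mask before `value_counts` runs, which is the order the question asks for.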