python pandas dataframe describe

Not getting stats analysis of binary column pandas


I have a DataFrame with 11 columns and about 19k rows. The last column contains only 1s and 0s, but when I call .describe() on it, all I get is

count     19020
unique        2
top           1
freq      12332
Name: Class, dtype: int64

as opposed to an actual statistical analysis with mean, std, etc.

Is there a way to do this?


Solution

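  • If .describe() is not giving you mean, std, etc. for your (0, 1) column, the most likely reason is that the column is not actually stored with an integer dtype (for example, it may be object or category). Note that the dtype: int64 at the bottom of your output is the dtype of the summary itself (count, unique, top and freq all happen to be integers), not of the Class column. For a whole DataFrame, the documentation of the .describe() method says that the default value of the include parameter covers only numeric columns: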

    None (default) : The result will include all numeric columns.

    My suggestion would be the following:

    
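    df.dtypes  # check the data type of every column
    df['num'] = df['num'].astype(int)  # if the column is not an integer type, cast it
    
    # explicitly state the data types to describe; 'number' matches any numeric
    # dtype, so it still works if astype(int) produced int32 instead of int64
    df.describe(include=['object', 'number'])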
    

    That is, first check the data types (I'm assuming the column is called num and the dataframe df, so substitute your own names). If the 0/1 indicator column is indeed not stored as an integer type, cast it with .astype(int). After that, df.describe() will report the usual numeric statistics for it, and you can pass include to choose which data types appear in the output if you want finer control.
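To make the difference concrete, here is a minimal sketch that reproduces the symptom and the fix on toy data. The five-row frame, the name column and the choice of category as the original dtype are assumptions made for illustration. The real column may be object instead, in which case the same cast should apply as long as every value can be converted to an integer.

```python
import pandas as pd

# Hypothetical stand-in for the real data: a 0/1 column stored as category
df = pd.DataFrame({
    "name": ["a", "b", "c", "d", "e"],
    "Class": pd.Series([1, 0, 1, 1, 0], dtype="category"),
})

# Non-numeric dtype: describe() reports count, unique, top, freq
before = df["Class"].describe()

# Cast to an integer dtype, then describe again
df["Class"] = df["Class"].astype(int)
after = df["Class"].describe()  # count, mean, std, min, quartiles, max

# With the default include=None, the DataFrame summary keeps only numeric columns
summary = df.describe()

print(before)
print(after)
print(summary)
```

Here the mean of the cast column is simply the fraction of rows equal to 1, which is often the figure of interest for a binary indicator.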