python, pandas, analysis

Exploratory data analysis - dropping single-valued columns using the output from pd.Series.nunique


I am conducting EDA on a dataset with 96 variables, which is a slice of an even larger dataset. I want to drop the columns that contain only a single value.

data_SS.apply(pd.Series.nunique)

This revealed that over a dozen variables held a single value. They were not relevant variables.

I listed those columns by name, using the column headers:

columns = ['aaa', 'bbb', 'ccc', 'ddd' .....]

Then I dropped the columns:

data.drop(columns, inplace = True, axis = 1)

This did the job. However, I wonder if there is a way to use the output of pd.Series.nunique directly, since I basically want to drop every column where the output value == 1. I am sure there is a more elegant solution.


Solution

  • You can build the list of columns to drop by using the boolean result of nunique().eq(1) as a column mask, then drop those columns:

    cols_to_drop = df1.loc[:,df1.nunique().eq(1)].columns
    df1.drop(cols_to_drop, inplace = True, axis = 1)
    

    Another way of finding the columns to drop is to index the Series' own index with the boolean mask:

    s = df1.nunique().eq(1)
    cols_to_drop = s.index[s]
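
For reference, here is a minimal, self-contained sketch of the same idea on a toy frame. The DataFrame `df` and its column names are made up for illustration and do not come from the question. Instead of calling drop, it keeps the non-constant columns by inverting the mask, which should give the same result without modifying the frame in place.

```python
import pandas as pd

# Hypothetical toy data; the column names are illustrative only.
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "country": ["US", "US", "US", "US"],  # single value
    "score": [0.5, 0.7, 0.5, 0.9],
    "flag": [True, True, True, True],     # single value
})

# Boolean Series indexed by column name:
# True where the column has exactly one distinct value.
is_constant = df.nunique().eq(1)

# Names of the columns that would be dropped.
dropped = list(is_constant.index[is_constant])

# Keep only the columns that are not constant.
result = df.loc[:, ~is_constant]

print(dropped)               # ['country', 'flag']
print(list(result.columns))  # ['id', 'score']
```

Note that nunique() ignores NaN by default, so a column holding one value plus missing entries also counts as single-valued here. If missing values should count as a distinct value, df.nunique(dropna=False) is an option.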