Tags: machine-learning, regression, data-science, data-visualization, feature-extraction

Error getting categorical features from a train dataset


My train data looks something like this: [image: train data]

To extract the categorical features from it, I ran the following code:

categorial=[c for c in train.columns if train.columns[c].dtype in ['object'] ]

But I am getting this error:

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-31-31eb7ac47e21> in <module>
----> 1 categorial=[c for c in train.columns if train.columns[c].dtype in ['object'] ]

<ipython-input-31-31eb7ac47e21> in <listcomp>(.0)
----> 1 categorial=[c for c in train.columns if train.columns[c].dtype in ['object'] ]

/opt/conda/lib/python3.7/site-packages/pandas/core/indexes/base.py in __getitem__(self, key)
   4295         if is_scalar(key):
   4296             key = com.cast_scalar_indexer(key, warn_float=True)
-> 4297             return getitem(key)
   4298 
   4299         if isinstance(key, slice):

IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices

What is the possible solution?


Solution

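  • train.columns is an Index of column labels and only accepts positional indices, so train.columns[c] fails when c is a column name. Use select_dtypes to pick the 'object' type columns instead: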

    categorical = train.select_dtypes('object')
    

    If you just want the column names:

    categorical_cols = train.select_dtypes('object').columns.tolist()
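
As a rough sketch of both approaches on a small made-up frame (the column names and values below are hypothetical, not the asker's data): the comprehension from the question works once it indexes the DataFrame itself, train[c], rather than train.columns, and it should give the same names as select_dtypes. The object dtype is set explicitly here so the result does not depend on how the installed pandas version infers text columns.

```python
import pandas as pd

# Hypothetical stand-in for the asker's train data; names and values are made up.
# dtype="object" is set explicitly on the text columns (an assumption for this
# sketch) so the example behaves the same regardless of string-dtype inference.
train = pd.DataFrame({
    "city": pd.Series(["Paris", "Lima", "Oslo"], dtype="object"),
    "price": [10.5, 7.25, 12.0],
    "rooms": [2, 3, 1],
    "type": pd.Series(["flat", "house", "flat"], dtype="object"),
})

# The comprehension from the question, repaired: look up the column on the
# DataFrame (train[c]) instead of on the Index of labels (train.columns[c]).
categorical_loop = [c for c in train.columns if train[c].dtype == "object"]

# The approach from the answer: let pandas filter columns by dtype.
categorical_cols = train.select_dtypes(include="object").columns.tolist()

print(categorical_loop)  # ['city', 'type']
print(categorical_cols)  # ['city', 'type']
```

Either form gives a plain Python list of column names, which can then be used as train[categorical_cols] to pull out just those columns.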