
Pandas: check the unique values of a column with datatype 'set'


I have a pandas DataFrame my_df. Every value in its column animals is a Python set (so the column's dtype is object). I tried the following code to see which distinct sets occur in animals, and how many there are:

print(my_df.animals.unique())

But got the following error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-13-e41c02e0e954> in <module>()

     11 
---> 12 print(my_df.animals.unique())


/usr/local/lib/python3.4/dist-packages/pandas/core/series.py in unique(self)
   1237     @Appender(base._shared_docs['unique'] % _shared_doc_kwargs)
   1238     def unique(self):
-> 1239         result = super(Series, self).unique()
   1240         if is_datetime64tz_dtype(self.dtype):
   1241             # to return array of Timestamp with tz

/usr/local/lib/python3.4/dist-packages/pandas/core/base.py in unique(self)
    971         else:
    972             from pandas.core.nanops import unique1d
--> 973             result = unique1d(values)
    974         return result
    975 

/usr/local/lib/python3.4/dist-packages/pandas/core/nanops.py in unique1d(values)
    809     else:
    810         table = _hash.PyObjectHashTable(len(values))
--> 811         uniques = table.unique(_ensure_object(values))
    812     return uniques
    813 

pandas/src/hashtable_class_helper.pxi in pandas.hashtable.PyObjectHashTable.unique (pandas/hashtable.c:14383)()

TypeError: unhashable type: 'set'

Solution

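  • The error occurs because unique() hashes each value, and a Python set is mutable and therefore unhashable. I'm not sure this is exactly what you want, but one workaround is to convert each set to its string representation first: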

    In [135]: df
    Out[135]:
         animals
    0     {1, 2}
    1  {1, 2, 3}
    2     {1, 2}
    
    In [136]: df.animals.astype(str).unique()
    Out[136]: array(['{1, 2}', '{1, 2, 3}'], dtype=object)
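
An alternative to the string conversion above is to turn each set into a frozenset, which is hashable and so can be passed through unique(). The following is only a minimal sketch: the DataFrame is made-up data mirroring the example, and the names frozen, unique_sets and n_unique are illustrative, not from the original post.

```python
import pandas as pd

# Hypothetical data mirroring the example above: each cell holds a Python set.
df = pd.DataFrame({"animals": [{1, 2}, {1, 2, 3}, {1, 2}]})

# frozenset is immutable and hashable, so pandas can hash every element.
frozen = df["animals"].apply(frozenset)

# Distinct sets, returned as an object array of frozensets.
unique_sets = frozen.unique()

# Number of distinct sets in the column.
n_unique = frozen.nunique()

print(unique_sets)
print(n_unique)
```

Unlike astype(str), this keeps the values as real set objects and compares them by content, whereas the string form of a set can depend on the order in which its elements happen to be iterated.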