python pandas integer unique categorical-data

Python pandas/NumPy: encode unique values as integers


Say I have x = ["apple", "orange", "orange", "apple", "pear"]. I would like a categorical representation of it using integers, e.g. y = [1, 2, 2, 1, 3]. What is the best way to do this?
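The requested encoding numbers each distinct label 1, 2, 3, ... in the order it first appears. A minimal plain-Python sketch of that mapping, with no pandas involved, might look like the following. The helper name encode_first_seen is hypothetical, and it assumes "first appearance" is the intended ordering, which matches the example y.

```python
# Hypothetical helper: number labels 1, 2, 3, ... in order of first appearance.
def encode_first_seen(labels):
    mapping = {}
    codes = []
    for label in labels:
        if label not in mapping:
            mapping[label] = len(mapping) + 1  # assign the next unused integer
        codes.append(mapping[label])
    return codes, mapping


x = ["apple", "orange", "orange", "apple", "pear"]
y, mapping = encode_first_seen(x)
print(y)        # [1, 2, 2, 1, 3]
print(mapping)  # {'apple': 1, 'orange': 2, 'pear': 3}
```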


Solution

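  • You could use pd.factorize and take element 0 of the returned (codes, uniques) tuple. The codes start at 0, so add 1 to get the numbering from the question: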

    In [465]: pd.factorize(x)
    Out[465]: (array([0, 1, 1, 0, 2]), array(['apple', 'orange', 'pear'], dtype=object))
    
    In [466]: pd.factorize(x)[0] + 1
    Out[466]: array([1, 2, 2, 1, 3])
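A self-contained version of the session above, as a sketch: it unpacks both parts of the pd.factorize result, shows how the uniques array maps codes back to labels, and, since the question also mentions NumPy, includes np.unique(..., return_inverse=True) as an alternative. Note that np.unique sorts the labels, so its codes follow sorted order rather than order of first appearance; for this particular x the two orders happen to coincide.

```python
import numpy as np
import pandas as pd

x = ["apple", "orange", "orange", "apple", "pear"]

# pd.factorize returns (codes, uniques); codes are 0-based, in order of first appearance.
codes, uniques = pd.factorize(x)
y = codes + 1
print(y.tolist())  # [1, 2, 2, 1, 3]

# The original labels can be recovered by indexing uniques with the 0-based codes.
print(uniques[codes].tolist())  # ['apple', 'orange', 'orange', 'apple', 'pear']

# NumPy alternative: labels come back sorted, and inverse indexes into them.
labels, inverse = np.unique(x, return_inverse=True)
print(labels.tolist())          # ['apple', 'orange', 'pear']
print((inverse + 1).tolist())   # [1, 2, 2, 1, 3]
```

If the integers must reflect first appearance regardless of spelling, pd.factorize is the safer choice; np.unique is convenient when a sorted label order is acceptable.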