Say I have x = ["apple", "orange", "orange", "apple", "pear"].
I would like a categorical representation with integers, e.g. y = [1, 2, 2, 1, 3].
What would be the best way to do so?
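You could use pd.factorize, which returns a tuple of (codes, uniques).
Element 0 holds the integer codes. They start at 0, so add 1 if you want 1-based labels: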
In [465]: pd.factorize(x)
Out[465]: (array([0, 1, 1, 0, 2]), array(['apple', 'orange', 'pear'], dtype=object))
In [466]: pd.factorize(x)[0] + 1
Out[466]: array([1, 2, 2, 1, 3])
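
If it helps, here is the same idea as a small standalone script rather than an IPython session. It is only a sketch: the variable names (codes, uniques, restored) are my own, and the second call with sort=True uses a made-up list just to show that codes can follow sorted order instead of order of first appearance.

```python
import pandas as pd

x = ["apple", "orange", "orange", "apple", "pear"]

# factorize returns (codes, uniques); codes are 0-based and follow
# the order in which each value first appears in x
codes, uniques = pd.factorize(x)

# Shift to 1-based labels, as asked in the question
y = (codes + 1).tolist()
print(y)  # [1, 2, 2, 1, 3]

# uniques lets you map the integer codes back to the original strings
restored = [uniques[c] for c in codes]

# With sort=True the codes follow the sorted order of the unique values
# (illustrative input, not from the question)
sorted_codes, sorted_uniques = pd.factorize(["pear", "apple", "pear"], sort=True)
print(sorted_codes.tolist())  # [1, 0, 1]
```

Keeping uniques around is handy because it is the lookup table for turning the integers back into the original labels later.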