In Python (pandas), you can generate categorical codes for a variable using .cat.codes, e.g.
df['col3'] = df['col3'].astype('category').cat.codes
How do you do this in R?
Fleshing this out a bit further for @Sid29:
The pandas accessor .cat.codes extracts the numeric representation of the levels of a categorical variable (a factor, in R terms). The equivalent in R is:
a <- factor(c("good", "bad", "good", "bad", "terrible"))
as.numeric(a)
[1] 2 1 2 1 3
Note that .cat.codes represents missing values (NaN in pandas, the counterpart of NA) as -1, while the R solution above preserves NA, so the output for those entries is simply NA.
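For comparison, here is a minimal sketch of the pandas side, reusing the same values as the R example plus one missing entry. The variable names (s, codes, r_style) are made up for illustration. It also shows one difference not mentioned above: pandas codes start at 0, whereas R's integer codes start at 1.

```python
import pandas as pd

# Same values as the R factor above, plus one missing entry.
s = pd.Series(["good", "bad", "good", "bad", "terrible", None])

# Categories are sorted ("bad", "good", "terrible") and coded from 0;
# the missing entry gets the code -1.
codes = s.astype("category").cat.codes
print(codes.tolist())  # [1, 0, 1, 0, 2, -1]

# Hypothetical conversion to R-style output: shift to 1-based codes
# and map -1 back to a missing value.
r_style = [c + 1 if c >= 0 else None for c in codes.tolist()]
print(r_style)  # [2, 1, 2, 1, 3, None]
```

The shifted codes match the as.numeric(a) output shown above for the first five entries.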
Edit: as.numeric(a) is better. There's some discussion about using the labels function inside as.numeric. See the warning in ?factor:
In particular, as.numeric applied to a factor is meaningless, and may happen by implicit coercion. To transform a factor f to approximately its original numeric values, as.numeric(levels(f))[f] is recommended and slightly more efficient than as.numeric(as.character(f)).
There are some anomalies associated with factors that have NA as a level. It is suggested to use them sparingly, e.g., only for tabulation purposes.
If you have an NA value, it will coerce all values to NA, hence the reason for using labels. Interestingly, c(a) works (see @42's answer below).