I have a data.frame with 2 million rows. One of the columns is an alphanumeric ID that repeats within the column, with roughly 300000 unique values.
>head(df$ID)
ID
AB00153232de
AB00153232de
AB00153232de
AB00155532gh
AB00155532gh
AB00158932ij
>df$ID<-factor(df$ID)
When I try to print that factor variable I get something like this:
>df$ID
[1] AB00153232de AB00153232de AB00153232de AB00155532gh AB00155532gh AB00158932ij
320668 Levels: AB00153232de AB00155532gh AB00158932ij.....
Is the factor not being stored as a numeric vector? If not, why?
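Use unclass on the factor variable. A factor is stored internally as an integer vector of codes plus a levels attribute; printing it shows the labels rather than the codes. unclass returns the integer codes and keeps the factor levels as an attribute of the result, so you can still use them later if you need them.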
df1$ID
# [1] AB00153232de AB00153232de AB00153232de AB00155532gh AB00155532gh AB00158932ij
# Levels: AB00153232de AB00155532gh AB00158932ij
unclass(df1$ID)
# [1] 1 1 1 2 2 3
# attr(,"levels")
# [1] "AB00153232de" "AB00155532gh" "AB00158932ij"
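If you only want the bare integer codes without the levels attribute, as.integer is an alternative to unclass. Here is a minimal sketch on a small stand-in vector (the name id and the six values are just for illustration, copied from the example above), which also shows that the codes index into levels():

```r
# Small stand-in for the ID column (values copied from the example above)
id <- factor(c("AB00153232de", "AB00153232de", "AB00153232de",
               "AB00155532gh", "AB00155532gh", "AB00158932ij"))

typeof(id)                 # "integer": the underlying storage is integer codes
codes <- as.integer(id)    # codes only, attributes dropped
codes                      # 1 1 1 2 2 3
levels(id)                 # the distinct labels, each stored once

# The codes are indices into the levels, so the labels can be rebuilt
labels <- levels(id)[codes]
identical(labels, as.character(id))   # TRUE
```

So the factor is stored as integers; print just maps each code back to its label for display.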
Data:
df1 <- structure(list(ID = structure(c(1L, 1L, 1L, 2L, 2L, 3L),
.Label = c("AB00153232de", "AB00155532gh", "AB00158932ij"), class = "factor")),
.Names = "ID", row.names = c(NA, -6L), class = "data.frame")