I have a large-ish data frame (100k rows x 50 columns) with several factor variables. I want a small subset (about 100 rows) to do some prototyping with. The problem is that when I type:
train <- train[1:100,]
the size shrinks (using dim()) but it still appears to store all the factors from the original data frame (I'm measuring memory size using lsos(), found here).
Is there a way to get around this? So far the only way I've found is to turn the factor variables to character strings then subset, then convert to factors again. I feel like there has to be a better way to do this.
Any suggestions?
Use the droplevels function to get rid of the levels that are not in the new data.frame; see ?droplevels for more info.
Example:
> DF <- data.frame(num=1:15, letter=rep(letters[1:5], each=3), random=rnorm(15), stringsAsFactors=TRUE) # R >= 4.0 no longer converts strings to factors by default
> levels(DF[, 2]) # all levels
[1] "a" "b" "c" "d" "e"
>
> DF2 <- DF[1:10, ] # subsetting
> levels(DF2[, 2]) # all levels again
[1] "a" "b" "c" "d" "e"
> DF2[, 2] <- droplevels(DF2[, 2])
> levels(DF2[, 2]) # only the levels contained in DF2
[1] "a" "b" "c" "d"
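Since the question mentions several factor variables, it may be more convenient to call droplevels on the whole data frame rather than column by column; it has a data.frame method that processes every factor column and leaves the other columns alone. The following is only a minimal sketch: the `train` data frame built here is a made-up stand-in for the asker's data, and the column names `city` and `group` are hypothetical.

```r
# Hypothetical stand-in for the asker's data: two factor columns,
# one of them with many levels.
set.seed(1)
train <- data.frame(
  id    = 1:1000,
  city  = factor(sample(sprintf("city_%04d", 1:500), 1000, replace = TRUE)),
  group = factor(sample(letters, 1000, replace = TRUE))
)

# Subsetting alone keeps every original level on each factor column.
kept <- train[1:100, ]

# Subset, then drop the unused levels from all factor columns in one call.
small <- droplevels(train[1:100, ])

# Number of levels per factor column, before and after dropping.
sapply(kept[sapply(kept, is.factor)], nlevels)
sapply(small[sapply(small, is.factor)], nlevels)

# Memory footprint of the two subsets.
object.size(kept)
object.size(small)
```

How much memory this saves depends on how many unused levels each factor carries, so the gain should be most visible for factors with many distinct values.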