I am trying to make a factor variable out of a numeric variable in R. I would like to keep track of NAs and of the new bins I am creating. Some of the new bins cover a valid range and some do not. I care about the bins themselves, but I also want an "Invalid" level that holds anything that does not fall in a designated range.
Here is an example:
library(reshape)
fac <- c(-1, 1, 2, 3, 4, 100, NA)
fac <- cut(fac, c(-Inf, 1, 2, 3, Inf))
fac <- addNA(fac)
combine_factor(fac,
               variable = order(levels(fac))[c(2, 3, 5)],
               other.label = "Invalid")
This gives me a factor whose levels are the intervals I want to keep, NA, and "Invalid".
However, I do not want to select the levels by their numeric positions, because I have several different data sets and not all of them contain every level of the factor.
If I change the data so that no value falls in one of the levels, (1,2]:
fac <- c(-1, 1, 3, 4, 100, NA)
I keep getting the error:
Error in factor(nvar[as.numeric(fac)], labels=c(levels(fac)[variable], : invalid 'labels'; length 4 should be 1 or 3.
Output 1 (this works because every level occurs at least once):
[1] (1,2] (1,2] (2,3] <NA> Invalid Invalid Invalid
Levels: (1,2] (2,3] <NA> Invalid
Output 2 (where one level, (1,2], has zero occurrences):
[1] (2,3] <NA> Invalid Invalid Invalid
Levels: (1,2] (2,3] <NA> Invalid
The second scenario is where I experience the error.
Is there any way I can get around this error?
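I don't know much about the combine_factor function, but it is easy enough to write your own. The version below relabels the levels of the factor rather than its observed values, so a level with zero occurrences causes no problem.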
Here's a basic example:
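NewLevs <- function(fac, keep, others = "Invalid") {
  lf <- levels(fac)
  # Named list for `levels<-`: each name is a new level and each element is the
  # old level it replaces. Kept levels map to themselves; the rest map to `others`.
  nl <- c(setNames(as.list(lf[keep]), lf[keep]),
          setNames(as.list(lf[-keep]), rep(others, length(lf) - length(keep))))
  levels(fac) <- nl
  fac
}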
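NewLevs works because levels<- accepts a named list: each name becomes a new level, and each element gives the old level(s) folded into it, so old levels that end up under the same name are merged. Because this operates on the levels rather than on the observed values, a level that never occurs is handled like any other. A tiny illustration with made-up data:

```r
# Made-up toy factor; level "c" is declared but never observed
x <- factor(c("a", "b", "a"), levels = c("a", "b", "c"))

# Named list: new level name = old level(s) merged into it
levels(x) <- list(keep_a = "a", Other = c("b", "c"))

levels(x)        # "keep_a" "Other"
as.character(x)  # "keep_a" "Other"  "keep_a"
```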
Here's some sample data:
fac1 <- c(-1, 1, 2, 3, 4, 100, NA)
fac1 <- addNA(cut(fac1, c(-Inf, 1, 2, 3, Inf)))
fac2 <- c(-1, 1, 3, 4, 100, NA)
fac2 <- addNA(cut(fac2, c(-Inf, 1, 2, 3, Inf)))
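Positional indices like c(2, 3, 5) are safe across data sets here because cut() creates one level per interval whether or not any value falls in it, and addNA() appends NA as the last level. So as long as every data set is cut with the same breaks, the level set and its order stay the same. A quick check, re-creating the sample data so the snippet runs on its own:

```r
breaks <- c(-Inf, 1, 2, 3, Inf)
fac1 <- addNA(cut(c(-1, 1, 2, 3, 4, 100, NA), breaks))
fac2 <- addNA(cut(c(-1, 1, 3, 4, 100, NA), breaks))

# Same levels in the same order, even though no value of fac2 falls in (1,2]
identical(levels(fac1), levels(fac2))  # TRUE
nlevels(fac2)                          # 5: four intervals plus NA
sum(as.integer(fac2) == 2)             # 0: level 2, (1,2], is unused
```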
Put the function to work:
fac1
# [1] (-Inf,1] (-Inf,1] (1,2] (2,3] (3, Inf] (3, Inf] <NA>
# Levels: (-Inf,1] (1,2] (2,3] (3, Inf] <NA>
NewLevs(fac1, c(2, 3, 5))
# [1] Invalid Invalid (1,2] (2,3] Invalid Invalid <NA>
# Levels: (1,2] (2,3] <NA> Invalid
fac2
# [1] (-Inf,1] (-Inf,1] (2,3] (3, Inf] (3, Inf] <NA>
# Levels: (-Inf,1] (1,2] (2,3] (3, Inf] <NA>
NewLevs(fac2, c(2, 3, 5))
# [1] Invalid Invalid (2,3] Invalid Invalid <NA>
# Levels: (1,2] (2,3] <NA> Invalid
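If you would still rather name the levels to keep than count positions, the same idea works with labels. NewLevsByName below is a hypothetical variant of NewLevs (not part of any package) that takes level labels instead of indices. It is a sketch that assumes the labels you pass match levels(fac) exactly, e.g. "(1,2]"; labels that match no level are silently ignored. Since %in% matches an NA level against NA, the NA level can be kept by passing NA.

```r
# Hypothetical helper (not from any package): like NewLevs, but `keep` holds
# level labels instead of positions. Labels absent from levels(fac) are ignored.
NewLevsByName <- function(fac, keep, others = "Invalid") {
  lf <- levels(fac)
  is_kept <- lf %in% keep          # %in% also matches an NA level against NA
  kept <- lf[is_kept]
  dropped <- lf[!is_kept]
  # Same named-list trick as NewLevs: new level name = old level(s) it replaces
  nl <- c(setNames(as.list(kept), kept),
          setNames(as.list(dropped), rep(others, length(dropped))))
  levels(fac) <- nl
  fac
}

# Usage on the second sample data set, where (1,2] never occurs
fac2 <- addNA(cut(c(-1, 1, 3, 4, 100, NA), c(-Inf, 1, 2, 3, Inf)))
res <- NewLevsByName(fac2, c("(1,2]", "(2,3]", NA))
levels(res)        # "(1,2]" "(2,3]" NA "Invalid"
as.character(res)  # "Invalid" "Invalid" "(2,3]" "Invalid" "Invalid" NA
```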
Both the levels to keep and the label for the unwanted levels can be changed:
NewLevs(fac2, c(1, 2, 3), "Wrong")
# [1] (-Inf,1] (-Inf,1] (2,3] Wrong Wrong Wrong
# Levels: (-Inf,1] (1,2] (2,3] Wrong
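As for the original error: judging from the message, combine_factor seems to call factor() on integer codes together with a labels vector. I haven't checked the reshape source, so this is an inference, but it is consistent with how factor() behaves: without an explicit levels argument, the levels are the sorted unique codes that actually occur, so an unused code leaves one label too many. A small reproduction with made-up codes:

```r
codes <- c(1, 3, 3)       # made-up codes; code 2 never occurs
labs <- c("a", "b", "c")  # one label per possible code

# Levels default to sort(unique(codes)) = c(1, 3), so 3 labels for 2 levels
# fails with "invalid 'labels'; length 3 should be 1 or 2"
err <- tryCatch(factor(codes, labels = labs),
                error = function(e) conditionMessage(e))
print(err)

# Listing every possible code as a level keeps the unused one
ok <- factor(codes, levels = 1:3, labels = labs)
print(levels(ok))        # "a" "b" "c"
print(as.character(ok))  # "a" "c" "c"
```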