The haven package provides a very useful function, write_dta(), for exporting a data frame or tibble to Stata.
When an R factor is written to Stata with write_dta(), R's internal integer codes for the factor become the numeric values saved in Stata (in long format), and the factor levels are written as the value labels. (These internal codes are what you get by applying as.numeric() to a factor.)
I want to explicitly set R's internal factor codes so that the numeric values shown by numlabel in Stata are the ones I need.
To illustrate:
eyes <- c("blue", "brown", "green", "blue", "not disclose")
eyes_factor <- as.factor(eyes)
eyes_factor
#[1] blue brown green blue not disclose
#Levels: blue brown green not disclose
as.numeric(as.factor(eyes))
#[1] 1 2 3 1 4 # which is to be expected
However, I want to set R's internal factor codes according to a highly specific pattern that matches the coding on a questionnaire. For instance, I want:
blue = 2, brown = 1, green = 6 and not disclose = -1
I have tried using lvls_reorder() from the forcats package.
The function looks like this:
forcats::lvls_reorder
function (f, idx, ordered = NA)
{
    f <- check_factor(f)
    if (!is.numeric(idx)) {
        stop("`idx` must be numeric", call. = FALSE)
    }
    if (!setequal(idx, lvls_seq(f)) || length(idx) != nlevels(f)) {
        stop("`idx` must contain one integer for each level of `f`",
            call. = FALSE)
    }
    refactor(f, levels(f)[idx], ordered = ordered)
}
But as you can see, I cannot specify the idx I need, because idx must contain exactly the sequential level positions 1 to nlevels(f), in some order.
Looking at stats::relevel() did not solve the problem either.
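As a small base R illustration of why reordering does not help (a sketch only; the names eyes_releveled and codes are made up for this example): relevel() just moves one level to the front, and the underlying codes remain positions in levels().

```r
eyes <- c("blue", "brown", "green", "blue", "not disclose")
eyes_factor <- as.factor(eyes)

# relevel() moves one level to the front; it does not assign arbitrary codes.
eyes_releveled <- relevel(eyes_factor, ref = "brown")

# Level order is now: brown, blue, green, not disclose.
levels(eyes_releveled)

# The underlying codes are still positions in levels(), i.e. 1 to nlevels().
codes <- as.integer(eyes_releveled)
codes
# [1] 2 1 3 2 4
```

So brown = 1 and blue = 2 can be arranged this way, but green = 6 and not disclose = -1 cannot.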
If it weren't for not disclose = -1, you could do this simply with something like:
eyes2 <- factor(eyes,
levels = c("brown", "blue", paste0("not_used_", 1:3), "green", "not disclose"))
That would be exactly what you want, except that not disclose is 7 rather than -1. One option is to do it this way and then recode it in Stata. A variant would be to force the not disclose values to be NA (e.g. just by not including "not disclose" as a valid level); I'm not sure how that comes through into Stata.
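To check what codes those two variants actually produce, here is a minimal base R sketch (codes_padded, eyes2_na and codes_na are names invented for the illustration):

```r
eyes <- c("blue", "brown", "green", "blue", "not disclose")

# Pad the level list with unused placeholders so that green lands on code 6.
eyes2 <- factor(eyes,
                levels = c("brown", "blue", paste0("not_used_", 1:3),
                           "green", "not disclose"))
codes_padded <- as.integer(eyes2)
codes_padded
# [1] 2 1 6 2 7

# Variant: leave "not disclose" out of the levels, so those values become NA.
eyes2_na <- factor(eyes,
                   levels = c("brown", "blue", paste0("not_used_", 1:3), "green"))
codes_na <- as.integer(eyes2_na)
codes_na
# [1]  2  1  6  2 NA
```

Either way the placeholder levels not_used_1 to not_used_3 stay in the factor, so they would presumably travel along as labels for 3, 4 and 5 unless you drop them afterwards.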
R factors can't have -1 as one of the underlying codes, so I don't think there's any simple way around this. You'll have to recode the values yourself using a lookup table. For example:
eye_codes <- data.frame(code = c(-1, 1, 2, 6),
                        level = c("not disclose", "brown", "blue", "green"),
                        stringsAsFactors = FALSE)
library(dplyr)
# Look up the questionnaire code for each observed value
eyes3 <- left_join(data.frame(eyes, stringsAsFactors = FALSE), eye_codes,
                   by = c("eyes" = "level"))
eyes3
Which gets you:
          eyes code
1         blue    2
2        brown    1
3        green    6
4         blue    2
5 not disclose   -1
The code column is what you want here. Note that I used dplyr::left_join() rather than merge() because it keeps the rows in the order of the left-hand data, which makes the ordering of the result easier to control.
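If you also want the text to travel with the numeric codes, one possibility is haven's labelled() class, which pairs arbitrary values with labels. The following is only a sketch under assumptions: eye_labels, codes and out_path are names made up here, the output path is a throwaway temp file, and I have not checked how the result displays in Stata.

```r
library(haven)

eyes <- c("blue", "brown", "green", "blue", "not disclose")

# Questionnaire codes, named by the text they stand for.
eye_labels <- c("not disclose" = -1, "brown" = 1, "blue" = 2, "green" = 6)

# Look up the code for each observation (base R named-vector indexing).
codes <- unname(eye_labels[eyes])
codes
# [1]  2  1  6  2 -1

# Attach the code/label mapping; unlike a factor, the values need not be 1..n.
eyes_labelled <- labelled(codes, labels = eye_labels)

# Hypothetical output path, used only for this sketch.
out_path <- file.path(tempdir(), "eyes.dta")
write_dta(tibble::tibble(eyes = eyes_labelled), out_path)
```

This skips factors entirely, so the codes are whatever you put in the lookup rather than positions in levels().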
This is a bit of a pain, of course. Me, I'd save the data out of R as platform-agnostic character text (not factors at all, which just seem to carry too many risks); then, if you need them explicitly coded in a particular way in Stata, do that recoding in Stata.
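A minimal sketch of the R side of that approach, assuming a plain CSV is acceptable as the exchange format (csv_path is a hypothetical temp-file path chosen for the example; the recoding step itself would happen in Stata and is not shown):

```r
eyes <- c("blue", "brown", "green", "blue", "not disclose")

# Hypothetical output path, used only for this sketch.
csv_path <- file.path(tempdir(), "eyes.csv")

# Write the raw text values; no factor codes are involved.
write.csv(data.frame(eyes = eyes), csv_path, row.names = FALSE)

# Read the file back to confirm the text survived unchanged.
round_trip <- read.csv(csv_path, stringsAsFactors = FALSE)
round_trip$eyes
```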