There's something very basic I am missing here.
d <- data.frame(
  g0 = c("A", "B", NA, NA, "C", "C"),
  g1 = LETTERS[1:6])
d
    g0 g1
1    A  A
2    B  B
3 <NA>  C
4 <NA>  D
5    C  E
6    C  F
Then I have this code, but it does not work:
d$g0[is.na(d$g0)] <- d$g1[is.na(d$g0)]
Desired result.
d
  g0 g1
1  A  A
2  B  B
3  C  C
4  D  D
5  C  E
6  C  F
It's always helpful to remember the original design rationale behind factors. They were intended for categorical variables that took on one of a fixed set of values. So imagine I changed your example slightly to be:
d <- data.frame(color = c("red", "blue", NA, NA, "green", "green"),
                amount = c("high", "low", "low", "mid", "mid", "high"))
> d
  color amount
1   red   high
2  blue    low
3  <NA>    low
4  <NA>    mid
5 green    mid
6 green   high
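A quick way to see the "fixed set of values" each column allows is to inspect its levels. This is a minimal sketch that assumes the columns really are factors; stringsAsFactors = TRUE is spelled out because it stopped being the data.frame default in R 4.0.0, so on newer versions the columns would otherwise be plain character vectors:

```r
# Assumption: columns are created as factors (explicit on R >= 4.0.0).
d <- data.frame(color = c("red", "blue", NA, NA, "green", "green"),
                amount = c("high", "low", "low", "mid", "mid", "high"),
                stringsAsFactors = TRUE)

is.factor(d$color)   # TRUE
levels(d$color)      # "blue" "green" "red"
levels(d$amount)     # "high" "low" "mid"
```

Note that neither "low" nor "mid" is among the levels of color.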
Now it totally makes sense that R complains when we run the following:
> d$color[is.na(d$color)] <- d$amount[is.na(d$color)]
Warning message:
In `[<-.factor`(`*tmp*`, is.na(d$color), value = c(3L, 1L, NA, NA,  :
  invalid factor level, NA generated
because why would we ever want a color of "high" or "mid"? That makes no sense. The mental model here is that either two factors really have nothing to do with each other, or if they do, their levels should be the same. So,
levels(d$color) <- c(levels(d$color),"low","mid")
d$color[is.na(d$color)] <- d$amount[is.na(d$color)]
this runs with no problems:
> d
  color amount
1   red   high
2  blue    low
3   low    low
4   mid    mid
5 green    mid
6 green   high
even if the result is semantically nonsensical.
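Applied to the data frame from the question, the same idea might look like the sketch below. It assumes g0 and g1 are factors (stated explicitly via stringsAsFactors = TRUE) and that it is acceptable for g0 to take on any level of g1. The name miss is just a helper introduced here:

```r
# Assumption: both columns are factors, as in the original question.
d <- data.frame(g0 = c("A", "B", NA, NA, "C", "C"),
                g1 = LETTERS[1:6],
                stringsAsFactors = TRUE)

# Give g0 every level that g1 has, keeping g0's existing levels first.
levels(d$g0) <- union(levels(d$g0), levels(d$g1))

# Compute the NA mask once, then fill from g1.
miss <- is.na(d$g0)
d$g0[miss] <- d$g1[miss]

d
```

Using union() avoids typing the extra levels by hand, at the cost of adding levels that may never be used.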
Of course, many people find all this factor level juggling irksome and would have simply done:
d <- data.frame(color = c("red", "blue", NA, NA, "green", "green"),
                amount = c("high", "low", "low", "mid", "mid", "high"),
                stringsAsFactors = FALSE)
and then R won't care what you fill the NA values with at all, because they aren't factors anymore.
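For the question's data, the character-based route could be sketched as follows. The first variant builds the data frame without factors. The second assumes you already have factor columns and converts them all with as.character. The names miss, miss2 and d2 are introduced only for illustration:

```r
# Variant 1: never create factors in the first place.
d <- data.frame(g0 = c("A", "B", NA, NA, "C", "C"),
                g1 = LETTERS[1:6],
                stringsAsFactors = FALSE)
miss <- is.na(d$g0)
d$g0[miss] <- d$g1[miss]

# Variant 2: start from factor columns and convert them to character.
d2 <- data.frame(g0 = c("A", "B", NA, NA, "C", "C"),
                 g1 = LETTERS[1:6],
                 stringsAsFactors = TRUE)
d2[] <- lapply(d2, as.character)   # d2[] keeps the data.frame structure
miss2 <- is.na(d2$g0)
d2$g0[miss2] <- d2$g1[miss2]
```

Both variants leave g0 as a character vector with the NA entries filled from g1.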