Possible Duplicate:
Joining factor levels of two columns in R
I'm fairly new to R, and I'm trying to make my recoding script somewhat more efficient and "correct". I've tried searching the forums but that got me nowhere. Perhaps I'm using the wrong terminology and missed it, so please bear with me if the question has already been asked.
I have two factor variables that I wish to collapse into one factor variable. They stem from the same survey and both measure educational level. The reason I have two variables in the first place is an unfortunate survey construction, but that's beside the point. The main point is that they are mutually exclusive (a respondent can have a value in only one of them).
My data looks like this:
education     education2
9th grade     <NA>
9th grade     <NA>
<NA>          9th grade
<NA>          10th grade
10th grade    <NA>
11th grade    <NA>
<NA>          9th grade
<NA>          11th grade
<NA>          <NA>
and my script looks like this:
highest.edu <- vector(length=length(df$education))
a.grade <- which(df$education=="9th grade")
a.grade2 <- which(df$education2=="9th grade")
b.grade <- which(df$education=="10th grade")
b.grade2 <- which(df$education2=="10th grade")
c.grade <- which(df$education=="11th grade")
c.grade2 <- which(df$education2=="11th grade")
highest.edu[a.grade] <- as.character(df$education)[a.grade]
highest.edu[a.grade2] <- as.character(df$education2)[a.grade2]
highest.edu[b.grade] <- as.character(df$education)[b.grade]
highest.edu[b.grade2] <- as.character(df$education2)[b.grade2]
highest.edu[c.grade] <- as.character(df$education)[c.grade]
highest.edu[c.grade2] <- as.character(df$education2)[c.grade2]
highest.edu <- factor(highest.edu)
highest.edu[highest.edu =="FALSE"] =NA
highest.edu <- factor(highest.edu)
Of course, this is not bad, but when you have two factor variables with 15 levels each, and have to do this several times, you start looking for quicker alternatives.
I've tried something like this but without any luck:
a.grade <- which(df$education=="9th grade" | df$education2=="9th grade")
b.grade <- which(df$education=="10th grade" | df$education2=="10th grade")
c.grade <- which(df$education=="11th grade" | df$education2=="11th grade")
highest.edu[a.grade] <- as.character(df$education)[a.grade]|as.character(df$education2)[a.grade]
highest.edu[b.grade] <- as.character(df$education)[b.grade]|as.character(df$education2)[b.grade]
giving me this: Error in as.character(df$education)[9th grade] | as.character(df$education2)[9th grade]: operations are possible only for numeric, logical or complex types
Is there a way to overcome this?
Thanks for any suggestions in advance
The result I'm aiming for is this:
highest.education
9th grade
9th grade
9th grade
10th grade
10th grade
11th grade
9th grade
11th grade
<NA>
The post 'Joining factor levels of two columns in R' seems to be aiming for a different result.
Again, thank you.
Once they're character strings, it's easy:
# make them character types
ed <- levels(df$education)[df$education]
ed2 <- levels(df$education2)[df$education2]
# fill the NAs in the first vector with the values from the second
ed[is.na(ed)] <- ed2[is.na(ed)]
# make it a factor again
ed <- factor(ed)
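As a quick check, here is a minimal sketch that applies the same steps to the sample data from the question. The data frame `df` is rebuilt by hand here, so treat it as an assumption about the real data. `as.character()` is used in place of `levels(x)[x]` (both give the same character vector), and the explicit `levels` order in the last step is my addition, since the default order would be alphabetical.

```r
# Sample data rebuilt by hand from the question (an assumption about the real df)
df <- data.frame(
  education  = factor(c("9th grade", "9th grade", NA, NA, "10th grade",
                        "11th grade", NA, NA, NA)),
  education2 = factor(c(NA, NA, "9th grade", "10th grade", NA,
                        NA, "9th grade", "11th grade", NA))
)

# Convert both factors to character vectors
ed  <- as.character(df$education)
ed2 <- as.character(df$education2)

# Where the first vector is NA, take the value from the second
ed[is.na(ed)] <- ed2[is.na(ed)]

# Back to a factor; the level order is set explicitly (optional)
highest.edu <- factor(ed, levels = c("9th grade", "10th grade", "11th grade"))
```

Rows that are `NA` in both columns stay `NA` in the result, which is what the question asks for.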
You could accelerate the process by reading them in as characters in the first place, especially if you already set column types in read.table.
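A rough sketch of reading the columns in as characters, assuming a comma-separated input. The inline `raw` text below is a hypothetical stand-in for the real survey file; `stringsAsFactors = FALSE` is one way to keep text columns as character (`colClasses` is another).

```r
# Hypothetical inline data standing in for the real survey file
raw <- "education,education2
9th grade,NA
NA,10th grade
NA,NA"

# Keep text columns as character instead of converting them to factors
df <- read.table(text = raw, header = TRUE, sep = ",",
                 stringsAsFactors = FALSE)

# No conversion step needed: fill the NAs directly, then make the factor
ed <- df$education
ed[is.na(ed)] <- df$education2[is.na(ed)]
highest.edu <- factor(ed)
```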