
Collapsing two factors into one?

Possible Duplicate:
Joining factor levels of two columns in R

I'm fairly new to R, and I'm trying to make my recoding script somewhat more efficient and "correct". I've tried searching the forums, but that got me nowhere. Perhaps I'm using the wrong terminology and missed it, so please bear with me if this question has already been asked.

I have two factor variables that I want to collapse into one factor variable. Both come from the same survey and both measure educational level. I have two variables in the first place because of an unfortunate survey construction, but that's beside the point. The main point is that they are mutually exclusive: each respondent has a value in at most one of them.

My data looks like this:

education       education2
9th grade       <NA>
9th grade       <NA>
<NA>            9th grade
<NA>            10th grade
10th grade      <NA>
11th grade      <NA>
<NA>            9th grade
<NA>            11th grade
<NA>            <NA>
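To experiment with the approaches in this thread, the sample data above can be rebuilt as a data frame. This is a sketch: the name `df` follows the question's code, while fixing the factor levels in grade order (the `grades` vector) is an assumption of mine. The last line checks the mutual-exclusivity claim, i.e. that no row has a value in both columns.

```r
# Rebuild the sample data: two factor columns, NA where a column is empty.
grades <- c("9th grade", "10th grade", "11th grade")  # assumed level order
df <- data.frame(
  education  = factor(c("9th grade", "9th grade", NA, NA, "10th grade",
                        "11th grade", NA, NA, NA), levels = grades),
  education2 = factor(c(NA, NA, "9th grade", "10th grade", NA,
                        NA, "9th grade", "11th grade", NA), levels = grades)
)
str(df)

# Mutually exclusive: no row has a non-missing value in both columns.
stopifnot(!any(!is.na(df$education) & !is.na(df$education2)))
```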

And my script looks like this:

    edu.new  <- vector(length = length(df$education))  # edu.new: placeholder, original name lost
    a.grade  <- which(df$education  == "9th grade")
    a.grade2 <- which(df$education2 == "9th grade")
    b.grade  <- which(df$education  == "10th grade")
    b.grade2 <- which(df$education2 == "10th grade")
    c.grade  <- which(df$education  == "11th grade")
    c.grade2 <- which(df$education2 == "11th grade")
    edu.new[a.grade]  <- as.character(df$education)[a.grade]
    edu.new[a.grade2] <- as.character(df$education2)[a.grade2]
    edu.new[b.grade]  <- as.character(df$education)[b.grade]
    edu.new[b.grade2] <- as.character(df$education2)[b.grade2]
    edu.new[c.grade]  <- as.character(df$education)[c.grade]
    edu.new[c.grade2] <- as.character(df$education2)[c.grade2]
    edu.new <- factor(edu.new)
    edu.new[edu.new == "FALSE"] <- NA
    edu.new <- factor(edu.new)

Of course, this isn't bad, but when you have to do it a couple of times or more for pairs of factor variables with 15 levels each, you start looking for quicker alternatives.
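One way to avoid writing one block per level is to loop over the levels. This is a minimal sketch, assuming both columns use the same level labels; the small data frame and the name `edu.new` are hypothetical. Starting from an all-`NA` character vector also makes the `"FALSE"` clean-up unnecessary, since `vector(length = n)` starts out filled with `FALSE`.

```r
# Hypothetical sample data with the same shape as the question's columns.
df <- data.frame(
  education  = factor(c("9th grade", NA, "10th grade", NA)),
  education2 = factor(c(NA, "9th grade", NA, "11th grade"))
)

# All-NA character vector, so unmatched rows stay NA (no "FALSE" to remove).
edu.new <- rep(NA_character_, nrow(df))

# Every label used in either column; one loop iteration per level.
all.levels <- union(levels(df$education), levels(df$education2))
for (lev in all.levels) {
  # which() drops the NA results produced by comparing missing entries.
  edu.new[which(df$education  == lev)] <- lev
  edu.new[which(df$education2 == lev)] <- lev
}
edu.new <- factor(edu.new)
print(edu.new)
```

The loop body is the same for 3 levels or 15, which is the repetition the hand-written version runs into.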

I've tried something like this, without any luck:

    a.grade <- which(df$education == "9th grade"  | df$education2 == "9th grade")
    b.grade <- which(df$education == "10th grade" | df$education2 == "10th grade")
    c.grade <- which(df$education == "11th grade" | df$education2 == "11th grade")
    # edu.new: placeholder name for the new variable
    edu.new[a.grade] <- as.character(df$education)[a.grade] | as.character(df$education2)[a.grade]
    # ... likewise for b.grade and c.grade

which gives me this:

    Error in as.character(df$education)[9th grade] | as.character(df$education2)[9th grade]: operations are possible only for numeric, logical or complex types
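The error arises because `|` is R's element-wise logical OR: it accepts only numeric, logical or complex operands and does not mean "take whichever value is present". A sketch of the intended element-wise behaviour tests for `NA` and picks from the other vector instead; the vectors `a` and `b` are made-up examples, and `ifelse()` is a swapped-in base-R technique rather than part of the question's code.

```r
# Two made-up character vectors; each position is filled in at most one.
a <- c("9th grade", NA, "10th grade")
b <- c(NA, "9th grade", NA)

# `|` on character vectors raises an error instead of combining them.
err <- tryCatch(a | b, error = function(e) e)
print(conditionMessage(err))

# Take b where a is missing, otherwise keep a.
combined <- ifelse(is.na(a), b, a)
print(combined)
```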

Is there a way to overcome this?

Thanks in advance for any suggestions.


The result I'm aiming for is this:
9th grade
9th grade
9th grade
10th grade
10th grade
11th grade
9th grade
11th grade

The post 'Joining factor levels of two columns in R' seems to be after a different result.

Again, thank you.


  • Once they're character strings, it's easy:

    # make them character types
    ed <- levels(df$education)[df$education]
    ed2 <- levels(df$education2)[df$education2]
    # fill the missing entries of ed with the values from ed2
    ed[is.na(ed)] <- ed2[is.na(ed)]
    # make it a factor again
    ed <- factor(ed)

    You could skip the conversion step by reading the columns in as character in the first place, especially if you already set column types in read.table (via its colClasses argument).
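A sketch of the whole answer with the columns read in as character from the start, as the last paragraph suggests. The inline CSV text is a hypothetical stand-in for the real survey file; `colClasses = "character"` keeps both columns as plain strings, and the literal `NA` fields become missing values through `read.csv`'s default `na.strings = "NA"`.

```r
# Hypothetical CSV text standing in for the survey file.
csv <- "education,education2
9th grade,NA
9th grade,NA
NA,9th grade
NA,10th grade
10th grade,NA
11th grade,NA
NA,9th grade
NA,11th grade
NA,NA"

# Read both columns as character, so no levels()/as.character() step is needed.
df <- read.csv(text = csv, colClasses = "character")

ed  <- df$education
ed2 <- df$education2
# Where the first column is missing, take the value from the second.
ed[is.na(ed)] <- ed2[is.na(ed)]

# Convert to a factor once at the end, with levels in grade order.
ed <- factor(ed, levels = c("9th grade", "10th grade", "11th grade"))
print(ed)
```

Rows where both columns are missing stay `NA` in the combined factor.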