Tags: r, missing-data, na, categorical-data, dummy-data

Recoding race variable with 9 categories to dummy


Allow me to preface this by saying that I am new to R. I cleaned some income and rent variables and now I am trying to recode my race variable from 9 categories to 2. The original variable is coded as follows:

1=White, 2=Black, 3=Native, 4=Asian, 5=A, 6=B, 7=C, 8=D, 9=E. I want to drop all the other races and keep only White and Black as a dummy variable, where White=0 and Black=1. Here's the code:

library(foreign)
library(ggplot2)
df<-read.dta("acs2010.dta")
View(df)
attach(df)
summary(df)

inctot[inctot==9999999]<-NA
inctot[inctot<=0]<-NA
summary(inctot)
incomesq<-(inctot)^2

rent[rent==0]<-NA
summary(rent)

levels(race)[1]<-"White"
levels(race)[2]<-"Black"
levels(race)[3:9]<-NA
levels(race)

ggplot(data=df,aes(x=race))+geom_bar()
View(df)

Manipulating the levels leaves me with "White" and "Black", but when I plot the variable, the NAs show up as well. I'm not sure how to get rid of NAs in factor variables. Any ideas would be appreciated.


Solution

  • The approach in the question to recoding the race factor looks fine.

    It seems that the real problem here was omitting the NAs from the plot. Just subset the data frame:

    ggplot(data = df[!is.na(df$race), ], aes(x = race)) + geom_bar()
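One caveat worth checking: after attach(df), an assignment such as levels(race)[1] <- "White" changes a copy of race in the workspace, not the column inside df, so the subset above only helps if df$race itself has been recoded. Below is a minimal sketch of the whole sequence on made-up data. The toy data frame, the name df_bw and the dummy column black are assumptions for illustration, not part of the original question.

```r
# Toy stand-in for the ACS data; these values are invented for illustration.
df <- data.frame(race = factor(c(1, 2, 3, 4, 1, 2, 9, 2), levels = 1:9))

# Recode inside the data frame (df$race) rather than on an attach()ed copy,
# so that ggplot(data = df, ...) sees the change.
levels(df$race)[1] <- "White"
levels(df$race)[2] <- "Black"
levels(df$race)[3:9] <- NA  # the remaining categories become missing values

# Keep only the rows where race is not missing.
df_bw <- df[!is.na(df$race), , drop = FALSE]

# 0/1 dummy as described in the question: White = 0, Black = 1.
df_bw$black <- as.integer(df_bw$race == "Black")

# Plot only if ggplot2 is installed; the bar chart then has no NA bar.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(data = df_bw, aes(x = race)) + geom_bar()
  print(p)
}
```

Working on df$race directly also removes the need for attach(), which makes it easier to tell which object is being modified.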

    Further reading: