I have a data set, DATA, with a variable, VAR. This variable's mode is numeric, and its class is factor. It represents gender. When printed, it looks something like this:
VAR
M
M
F
U
M
When I print out levels, it outputs: "" "F" "M" "U", and a frequency table looks like this:
   F  M U
2 30 25 1
What I want to do is change everything that is not "F" or "M" to a missing value, then label the remaining levels "Man" and "Woman", and drop the unused levels for the variable (but still leave a level for missing). So far I have the code below:
DATA$VAR[DATA$VAR == "U" | DATA$VAR == ""] <- NA
But I got the exact same values for the levels, and now the frequency table looks like this:
   F  M U
0 30 25 0
I feel like I'm close, but not quite there. I don't understand how to deal with the level issues. Any help is greatly appreciated.
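To create a factor in which everything other than M and F becomes missing, pass the levels argument in a call to factor(). Any value not listed in levels becomes NA, and the unlisted levels are dropped. To relabel the levels you keep, use the labels argument: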
a <- factor(c("M","M","F","U","","M"))
a2 <- factor(a, levels = c('M','F'), labels =c('Male','Female'))
a2
# [1] Male Male Female <NA> <NA> Male
# Levels: Male Female
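The same call can be applied directly to the column from the question. This is a minimal sketch: the DATA frame built here is a made-up stand-in for the asker's data, and the labels follow the "Man"/"Woman" wording from the question rather than the "Male"/"Female" used above.

```r
# Hypothetical stand-in for the asker's data frame (not the real data)
DATA <- data.frame(VAR = factor(c("M", "M", "F", "U", "", "M")))

# Values not listed in `levels` ("U" and "") become NA and their levels are dropped;
# `labels` renames the kept levels in the same order as `levels`
DATA$VAR <- factor(DATA$VAR, levels = c("M", "F"), labels = c("Man", "Woman"))

levels(DATA$VAR)
# [1] "Man"   "Woman"
```

Assigning NA to the unwanted values, as in the question, only changes the data and leaves the level set untouched, which is why "" and "U" still showed up with counts of 0.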
If you want table() to tally the NA values, set useNA = 'always' or useNA = 'ifany':
table(a2, useNA = 'ifany')
## a2
## Male Female <NA>
## 3 1 2
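If the missing values should be an actual level of the factor, rather than something only table() reports, base R's addNA() may be what you are after. A short sketch reusing a and a2 from above (a3 is just an illustrative name):

```r
a <- factor(c("M", "M", "F", "U", "", "M"))
a2 <- factor(a, levels = c("M", "F"), labels = c("Male", "Female"))

# addNA() appends NA as an extra level of the factor
a3 <- addNA(a2)

levels(a3)
# [1] "Male"   "Female" NA
```

Whether you want this depends on what comes next: with NA as a level, the missing values are treated as a category of their own instead of being handled as missing.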