I have a big data set from a questionnaire. Importing it from SPSS to R (using SPSS's Stata-Output) gave me the answer to each question as a factor.
A question has answers from 1 to 10. However, there are a lot of missing values. R recognizes them as well.
However, now I'd like to do some calculations - for example I want to calculate the mean of an answer (not very good statistics, I know, never mind).
So I have to recode the factors to numerics. I did this with as.numeric().
However, now I have missing values encoded as 11 to 14. Of course I can't calculate any mean like this.
What would be the proper way to recode factors as numerics and tell R to set any value bigger than 10 to NA?
Example: Do you like fish?
   not at all very much | don't know  no answer  don't tell
R: 1 2 3 4 5 6 7 8 9 10 | 11          12         13
If you really don't need the missing values, I'd do something like:
a[a>10] <- NA
Then, you can use:
mean(a, na.rm=TRUE)
Alternatively, if you want to keep the missing-value codes in a and just leave them out of the calculation, you can use:
mean(a[a<=10])
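Putting the steps together, here is a minimal sketch of the whole recode. The factor fish and its level labels (score1 to score10 plus the three missing-value categories) are made up for illustration; your imported data will have its own names and labels. It assumes the ten real answers are the first ten levels, so that as.numeric() maps them to the codes 1 to 10.

```r
# Hypothetical data: 10 substantive levels followed by 3 missing-value levels
lv <- c(paste0("score", 1:10), "don't know", "no answer", "don't tell")
fish <- factor(c("score1", "score10", "don't know", "score5", "no answer"),
               levels = lv)

a <- as.numeric(fish)         # internal level codes, in level order
a[a > 10] <- NA               # codes above 10 are the missing-value categories
res <- mean(a, na.rm = TRUE)  # mean over the real answers only
res
```

Two things to watch out for. as.numeric() on a factor returns the internal level codes, not the labels, which is what you want here only because the codes line up with the answers; if the labels themselves were numbers, as.numeric(as.character(f)) would be the safer conversion. Also, if a already contains real NA values, a[a<=10] keeps them, so mean(a[a<=10]) would still need na.rm=TRUE.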