I am using the dummy.data.frame function from the R package dummies to create dummy variables for the k levels of my factor. Unfortunately, my factor has NAs. When I use dummy.data.frame, it creates k dummies with no NAs, plus an extra dummy that flags the missing values with 1. However, I would like to keep the NAs in the k dummies and not get a separate dummy for the missing values.
Is this possible with that function? Do you know any other functions that can help me?
I usually do this kind of thing with model.matrix(). Using it with the option na.action set to "na.pass" retains the NAs in their correct places. This option does not seem to change the behavior of dummy(), so model.matrix() might be your easiest bet. For example, for a single factor letters, the following should do the trick:
options(na.action="na.pass")
letters <- c( "a", "a", "b", "c", "d", "e", "f", "g", "h", "b", "b", NA )
model.matrix(~letters-1)
The same works for several variables, or for several columns of a data frame:
letters <- c( "a", "a", "b", "c", "d", "e", "f", "g", "h", "b", "b", NA )
betters <- c( "a", "a", "c", "c", "c", "d", "d", "d", NA, "e", "e", "e" )
model.matrix(~letters+betters-1)
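One caveat with the multi-variable formula: with the default treatment contrasts, only the first factor gets all k dummy columns, while each later factor loses its first level. If you need the full set for every factor, one possible sketch is to build the matrix one column at a time and bind the results. The data frame df and the helper one_factor are made up for illustration, and na.action is passed to model.frame() directly here rather than through options():

```r
# Illustrative data; df and one_factor are hypothetical names
df <- data.frame(
  letters = c("a", "b", NA, "c"),
  betters = c("a", NA, "c", "c"),
  stringsAsFactors = TRUE
)

# Build the full indicator matrix for a single column, keeping NA rows as NA
one_factor <- function(v) {
  f <- reformulate(v, intercept = FALSE)  # e.g. ~letters - 1
  model.matrix(f, data = model.frame(f, data = df, na.action = na.pass))
}

# One call per column, so every factor keeps all of its levels
full <- do.call(cbind, lapply(names(df), one_factor))
full
```

Here letters contributes three columns and betters two, whereas model.matrix(~letters+betters-1) on the same data would give only one column for betters.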
The important trick here really is to set the option na.action. After this dummy recoding, it is a good idea to return the option to its default value:
options(na.action="na.omit")
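If you would rather not touch the global option at all, a variant that I believe works is to build the model frame yourself with na.action = na.pass and hand it to model.matrix(). This is only a sketch: dummies_keep_na and its prefix argument are hypothetical names, not part of any package.

```r
# Hypothetical helper: dummies for one factor, keeping NA rows as NA
dummies_keep_na <- function(x, prefix = "x") {
  df <- data.frame(x = factor(x))
  # na.pass applies only to this model frame; options() is left untouched
  mf <- model.frame(~ x - 1, data = df, na.action = na.pass)
  mm <- model.matrix(~ x - 1, data = mf)
  # Replace the default "x<level>" names with "<prefix><level>"
  colnames(mm) <- paste0(prefix, levels(df$x))
  mm
}

d <- dummies_keep_na(c("a", "b", NA, "a"), prefix = "letters")
d
```

Because the model frame already carries its terms, model.matrix() uses it as is, so there is nothing to reset afterwards.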