I have a data frame, and I create dummy variables from one of its columns as follows.
library(dummies)
df1 <- data.frame(id = 1:4, year = 1991:1994)
df1 <- cbind(df1, dummy(df1$year, sep = "_"))
df1
# id year df1_1991 df1_1992 df1_1993 df1_1994
#1 1 1991 1 0 0 0
#2 2 1992 0 1 0 0
#3 3 1993 0 0 1 0
#4 4 1994 0 0 0 1
I have tried to write a function to achieve the same.
dummy_df <- function(dframe, x){
  dframe <- cbind(dframe, dummy(dframe$x, sep = "_"))
  return(dframe)
}
However, when I run the function, I get the following error.
dummy_df(df1, year)
#Error in `[[.default`(x, 1) : subscript out of bounds
How can I rectify this mistake and create a function that generates dummy variables automatically? Additionally, it would be better if the function provided an option to keep or discard the original column from which the dummy variables are created. For example, in the data frame above, the option to keep or discard would apply to the column year.
This question was posted after reading a similar question: "Pass a data.frame column name to a function".
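The problem is twofold. First, dframe$x looks for a column literally named x, not for the column whose name was passed as the argument x, so it returns NULL. Second, when year is passed unquoted, it is a symbol representing a variable, not a character string holding the column name. A standard trick to get the name as a character string is deparse(substitute(.)). The extractor [[ then works with that string.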
dummy_df <- function(dframe, x){
  # Turn the unquoted argument into the column name as a string
  x <- deparse(substitute(x))
  dframe <- cbind(dframe, dummy(dframe[[x]], sep = "_"))
  return(dframe)
}
dummy_df(df1, year)
# id year df1_1991 df1_1992 df1_1993 df1_1994
#1 1 1991 1 0 0 0
#2 2 1992 0 1 0 0
#3 3 1993 0 0 1 0
#4 4 1994 0 0 0 1
#Warning message:
#In model.matrix.default(~x - 1, model.frame(~x - 1), contrasts = FALSE) :
# non-list contrasts argument ignored
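To see the two issues in isolation, here is a minimal base R sketch that does not need the dummies package. The names df_demo, get_with_dollar and get_with_substitute are made up for illustration and are not part of the original code.

```r
# Hypothetical helpers for illustration only.
df_demo <- data.frame(id = 1:4, year = 1991:1994)

# `$` takes the name after it literally, so this looks for a column called "x".
get_with_dollar <- function(dframe, x) dframe$x

# substitute() captures the unevaluated argument; deparse() turns it into a string.
get_with_substitute <- function(dframe, x) {
  x <- deparse(substitute(x))
  dframe[[x]]
}

res_dollar <- get_with_dollar(df_demo, year)
res_subst <- get_with_substitute(df_demo, year)

print(is.null(res_dollar))  # no column named "x", so nothing is found
print(res_subst)            # the values of the year column
```

The first helper never evaluates year at all, which is why it fails quietly with NULL instead of complaining that year does not exist.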
If the column x may also be passed quoted, change deparse(substitute(.)) in the function above to as.character(substitute(.)). The function will then accept both quoted and unquoted x.
dummy_df <- function(dframe, x){
  # as.character() gives the bare name for both year and "year"
  x <- as.character(substitute(x))
  dframe <- cbind(dframe, dummy(dframe[[x]], sep = "_"))
  return(dframe)
}
dummy_df(df1, year)
dummy_df(df1, "year")
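The difference between the two conversions can be checked directly. The sketch below uses two throwaway helpers, name_deparse and name_aschar, which are hypothetical names introduced only for this comparison.

```r
# Hypothetical helpers, named here only for the comparison.
name_deparse <- function(x) deparse(substitute(x))
name_aschar  <- function(x) as.character(substitute(x))

d_sym <- name_deparse(year)    # symbol in, plain string out
d_str <- name_deparse("year")  # string in, the quotes become part of the result
a_sym <- name_aschar(year)
a_str <- name_aschar("year")

print(c(d_sym, d_str, a_sym, a_str))
print(nchar(d_str))  # longer than 4 because of the embedded quotes
```

With deparse() a quoted argument keeps its quote characters, so the lookup with [[ would search for a column name that includes them. One caveat, as far as I can tell: as.character(substitute(x)) is only safe for a single name or string; for an expression such as df1$year it returns several strings.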
Following the OP's comment, keeping or removing the column x can be handled with an extra function argument, keep, defaulting to TRUE.
dummy_df <- function(dframe, x, keep = TRUE){
  x <- as.character(substitute(x))
  # Exact match on the column name; grep() would also match partial names
  i <- match(x, names(dframe))
  if(is.na(i)) stop(paste(sQuote(x), "is not a valid column"))
  # Drop the source column only when keep = FALSE
  dftmp <- if(keep) dframe else dframe[-i]
  dframe <- cbind(dftmp, dummy(dframe[[x]], sep = "_"))
  return(dframe)
}
dummy_df(df1, year)
dummy_df(df1, "year")
dummy_df(df1, year, keep = FALSE)
dummy_df(df1, month, keep = FALSE)   # stops with an error: month is not a column of df1
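If depending on the dummies package is not an option, the same idea can be sketched in base R with model.matrix(). This is a swapped-in technique, not the code above: the function name dummy_df_base, the sep argument and the naming of the new columns after the source column are all assumptions made for illustration, and the sketch assumes the column has no missing values.

```r
# Hypothetical base R variant; dummy_df_base and sep are illustrative names.
dummy_df_base <- function(dframe, x, keep = TRUE, sep = "_") {
  x <- as.character(substitute(x))
  if (!x %in% names(dframe)) stop(sQuote(x), " is not a valid column")
  f <- factor(dframe[[x]])
  # One indicator column per level; "- 1" removes the intercept so no level is dropped.
  dummies <- model.matrix(~ f - 1)
  colnames(dummies) <- paste(x, levels(f), sep = sep)
  # Remove the source column only when keep = FALSE.
  if (!keep) dframe <- dframe[names(dframe) != x]
  cbind(dframe, dummies)
}

df_demo <- data.frame(id = 1:4, year = 1991:1994)
res_keep <- dummy_df_base(df_demo, year)
res_drop <- dummy_df_base(df_demo, "year", keep = FALSE)
print(names(res_keep))
print(names(res_drop))
```

Unlike the output shown earlier, the new columns here are prefixed with the column name (year_1991 and so on) rather than the data frame name.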