Tags: r, function, automation, label, factors

Dynamic assignment to data frame variables for adding value labels in R, using factors and set_labels() from the "sjmisc" package


I want to assign value labels to numeric data so that the labels are displayed when I run tabulations or generate charts. With this in mind, I thought of using factors: I can assign the labels and, when needed, compute statistics such as the mean from the numeric values behind the levels. My dataset has more than 150 variables, and I need to assign value labels to around 120 of them. The value labels differ from variable to variable, though a few variables share the same set.

To illustrate the problem and speed things up, I created sample data as below:

Q1 <- sample(1:5,20,replace = T)
Q2 <- sample(1:5,20,replace = T)
Q3 <- sample(1:5,20,replace = T)
Q4 <- sample(1:5,20,replace = T)
Q5 <- sample(1:5,20,replace = T)

df <- as.data.frame(cbind(Q1,Q2,Q3,Q4,Q5))
class(df)

I have a separate data frame that holds the values and labels for each question:

mylabel <- data.frame(Q1 = 1:5,Q1_desc = c("Strongly Disagree","Disagree","Neither","Agree","Strongly Agree"),
                  Q2 = 1:5,Q2_desc = c("Strongly Disagree","Disagree","Neither","Agree","Strongly Agree"),
                  Q3 = 1:5,Q3_desc = c("Strongly Disagree","Disagree","Neither","Agree","Strongly Agree"),
                  Q4 = 1:5,Q4_desc = c("Strongly Disagree","Disagree","Neither","Agree","Strongly Agree"),
                  Q5 = 1:5,Q5_desc = c("Strongly Disagree","Disagree","Neither","Agree","Strongly Agree"))

Here is the code for a single variable:

df$Q1 <- factor(df$Q1,
              levels = c(1,2,3,4,5),
              labels = c("Strongly Disagree","Disagree","Neither","Agree","Strongly Agree"))

df$Q1
mean(as.numeric(df$Q1))
barplot(table(df$Q1))
table(df$Q1)

The code above converts Q1 to a factor with the given levels and labels. I can then compute the mean, draw a bar plot with labels, and produce a table with labels. Since many variables need this treatment, I thought of writing a function, and this is where I need some help!
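A caveat on `mean(as.numeric(df$Q1))`: `as.numeric()` on a factor returns the internal level codes (1, 2, 3, ...), not the original values, so it gives the right mean here only because the levels are exactly 1 to 5. The sketch below uses a hypothetical 0/5/10 scale, not part of the question's data, to show the difference and one way to recover the original values by indexing the level vector with the codes.

```r
# Hypothetical scale whose values are NOT 1:k (not from the question's data)
lev <- c(0, 5, 10)
lab <- c("Low", "Medium", "High")
x <- c(0, 10, 10, 5)

f <- factor(x, levels = lev, labels = lab)

# as.numeric() returns the internal codes (1, 3, 3, 2), not 0/10/10/5
codes <- as.numeric(f)
mean(codes)      # 2.25, the mean of the codes

# Index the original level values by the codes to get 0/10/10/5 back
original <- lev[as.integer(f)]
mean(original)   # 6.25, the mean of the real values
```

If the original numeric values matter for statistics, keeping the level vector (here `lev`) alongside the factor makes this recovery straightforward.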

Here is the function:

getlabels <- function(varname){
  #varname <- "Q1"
  lev <- na.omit(with(mylabel, get(varname)))
  lab <- na.omit(with(mylabel,get(paste0(varname,"_desc"))))
  df$varname <- factor(with(df,get(varname)),
                     levels = lev,
                     labels = lab)
}

getlabels("Q2")

The code above runs without error, but it does not update df: Q2 gets neither the levels nor the labels and is still a numeric column. It seems the result of factor() is never assigned to df$varname. Can someone explain why this happens and how to overcome it?
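Two things likely go wrong here. First, R functions work on a local copy: `df$varname <- ...` inside `getlabels()` creates a modified `df` that lives only inside the function and is discarded when it returns. Second, `$` does not evaluate its argument, so `df$varname` refers to a column literally named `varname`, not to `Q2`; `df[[varname]]` uses the value of the variable instead. A minimal sketch of a corrected function, with the hypothetical name `getlabels_fixed()`, that takes the data frame as an argument and returns it so the caller can reassign it:

```r
# Sample data and label table built the same way as in the question
set.seed(1)
df <- data.frame(Q1 = sample(1:5, 20, replace = TRUE),
                 Q2 = sample(1:5, 20, replace = TRUE))
desc <- c("Strongly Disagree", "Disagree", "Neither", "Agree", "Strongly Agree")
mylabel <- data.frame(Q1 = 1:5, Q1_desc = desc,
                      Q2 = 1:5, Q2_desc = desc)

# Hypothetical corrected version: takes the data frame as an argument,
# uses [[ ]] so the *value* of varname selects the column, and returns the result
getlabels_fixed <- function(data, varname, labels_df = mylabel) {
  lev <- na.omit(labels_df[[varname]])
  lab <- na.omit(labels_df[[paste0(varname, "_desc")]])
  data[[varname]] <- factor(data[[varname]], levels = lev, labels = lab)
  data  # return the modified copy
}

# The caller must assign the returned data frame back to df
df <- getlabels_fixed(df, "Q2")
str(df$Q2)
```

Returning the data frame and reassigning it is the usual R idiom; it avoids reaching into the global environment from inside the function.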


Then I tried a different approach using the "sjmisc" package. It works for a single variable with the code below:

df$Q2 <- set_labels(df$Q2,c("Strongly Disagree","Disagree","Neither","Agree","Strongly Agree"))
df$Q2

This stores the labels as an attribute. Since I need to do this for many variables, I tried converting it into a function. Again, df is not updated because the assignment does not happen: using assign() produces no error, but the attributes are not updated.

getlabels2 <- function(varname){
  #varname <- "Q1"
  lev <- na.omit(with(mylabel, get(varname)))
  lab <- na.omit(with(mylabel,get(paste0(varname,"_desc"))))
  ##setting lab to named variable as set_labels needs a named variable
  names(lab) <- na.omit(paste("mylabel$","varname"))
  assign(paste("df$",varname),set_labels(with(df,varname),lab))
}

getlabels2("Q2")

df$Q2
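Here `assign()` does not parse `df$Q2` as a column reference: it creates a new variable whose name is the string `"df$ Q2"` (note the space that `paste()` inserts), and it does so in the function's local environment, so the global `df` is untouched. Similarly, `with(df, varname)` returns the string `"Q2"` rather than the column. The first part of the sketch below shows the stray variable; the second part is a hypothetical base-R helper, `add_value_labels()`, that stores a named `labels` attribute (values are the codes, names are the label text), which is assumed to match the layout sjmisc/sjlabelled use. It runs without the package; check the attribute layout against the sjlabelled documentation if other sjmisc functions need to read it.

```r
# 1) Why assign() "does nothing": it creates a variable literally named
#    "df$ Q2" inside the function's own environment.
show_assign <- function(varname) {
  assign(paste("df$", varname), "some value")
  ls()  # names of the local variables that now exist: "df$ Q2" "varname"
}
show_assign("Q2")

# 2) Hypothetical base-R helper (no sjmisc needed): attach a named "labels"
#    attribute to the column and return the whole data frame.
add_value_labels <- function(data, varname, labels_df) {
  lev <- labels_df[[varname]]
  lab <- as.character(labels_df[[paste0(varname, "_desc")]])
  attr(data[[varname]], "labels") <- setNames(lev, lab)  # names = label text
  data
}

desc <- c("Strongly Disagree", "Disagree", "Neither", "Agree", "Strongly Agree")
mylabel <- data.frame(Q2 = 1:5, Q2_desc = desc)
df <- data.frame(Q2 = c(1L, 4L, 5L))
df <- add_value_labels(df, "Q2", mylabel)  # reassign the returned copy
attr(df$Q2, "labels")
```

The same "pass the data frame in, return it, reassign" pattern applies if `set_labels()` is called inside the helper instead of `attr<-`.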

Because there are so many variables, I believe a working function would help automate this repetitive task. Ultimately I want to call the function through something like lapply so that I don't have to call it 120 times. Suggestions on that would also help.
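One way to avoid 120 separate calls, sketched under the assumption that `mylabel` follows the question's `Qn` / `Qn_desc` column naming: derive the list of variables from `mylabel`, let `lapply()` build the converted columns, and assign them back into `df` in one step. The `ID` column is hypothetical and only shows that columns without labels are left alone.

```r
desc <- c("Strongly Disagree", "Disagree", "Neither", "Agree", "Strongly Agree")
set.seed(42)
df <- data.frame(Q1 = sample(1:5, 20, replace = TRUE),
                 Q2 = sample(1:5, 20, replace = TRUE),
                 Q3 = sample(1:5, 20, replace = TRUE),
                 ID = 1:20)  # hypothetical column that should stay numeric
mylabel <- data.frame(Q1 = 1:5, Q1_desc = desc,
                      Q2 = 1:5, Q2_desc = desc,
                      Q3 = 1:5, Q3_desc = desc)

# Variables to label: every mylabel column that is not a "_desc" column
vars <- grep("_desc$", names(mylabel), value = TRUE, invert = TRUE)

# lapply() returns one converted column per variable; assigning into
# df[vars] replaces exactly those columns and leaves the rest untouched
df[vars] <- lapply(vars, function(v) {
  factor(df[[v]],
         levels = mylabel[[v]],
         labels = mylabel[[paste0(v, "_desc")]])
})
str(df)
```

Because the result is assigned at top level, there is no local-copy problem; the function body only computes and returns each column.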

Thank you!!


Solution

  • I'm sort of wondering why you don't just write a for loop and move on:

    for (i in names(df)){
      df[[i]] <- factor(df[[i]],
                        levels = mylabel[[i]],
                        labels = mylabel[[paste0(i,"_desc")]])
    }
    
    > str(df)
    'data.frame':   20 obs. of  5 variables:
     $ Q1: Factor w/ 5 levels "Strongly Disagree",..: 2 2 4 1 4 2 5 5 1 2 ...
     $ Q2: Factor w/ 5 levels "Strongly Disagree",..: 1 5 3 3 2 3 5 1 4 2 ...
     $ Q3: Factor w/ 5 levels "Strongly Disagree",..: 2 5 2 5 5 2 4 4 5 3 ...
     $ Q4: Factor w/ 5 levels "Strongly Disagree",..: 3 3 2 1 1 3 1 2 1 3 ...
     $ Q5: Factor w/ 5 levels "Strongly Disagree",..: 2 2 1 4 5 4 1 3 1 1 ...
    

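    As a side note, it's best to avoid as.data.frame(cbind()): cbind() first builds a matrix, and a matrix holds a single type, so mixing numeric and character vectors silently turns every column into character. It's also more typing than you need; df <- data.frame(Q1,Q2,Q3,Q4,Q5) was sufficient, and safer.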
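A small demonstration of why `as.data.frame(cbind())` is risky. The question's all-integer columns happen to survive it intact; the character column `grp` below is hypothetical, added only to trigger the coercion that a single-type matrix forces.

```r
Q1 <- c(1, 2, 3)
grp <- c("a", "b", "c")  # hypothetical character column

# cbind() builds a matrix first, and a matrix holds a single type,
# so the numeric column is coerced to character
bad <- as.data.frame(cbind(Q1, grp))
class(bad$Q1)   # "character" in R >= 4.0 (a factor in earlier versions)

# data.frame() keeps each column's own type
good <- data.frame(Q1, grp)
class(good$Q1)  # "numeric"
```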