Search code examples
rloopsdataframelabelfactors

Attach Labels to numeric codes of a data frame in R - efficient use of factors


I have a data which is in excel and I read that to store it in a data frame by the name "data". Similarly I the value labels for each variable and its code in a separate sheet by the name map. I read this sheet as well and store it is data frame by the name "map".

Link to the file is - https://www.wetransfer.com/downloads/bf0c5bfa88be20e4037d7fdc828ca66320161018075428/7f82a4

Below is the code to read the sheets-

library("readxl")
data <- read_excel("data_v1.xlsx",sheet = "data")
map <- read_excel("data_v1.xlsx",sheet = "map")

If you will notice the cells with less number of codes are with NA.

Now I want to attach the value labels to each of the codes from the map sheet to the data. I was doing a search and I realized that one of the ways to do this is by using factors. Where we define the levels and labels. I can use the original variable name in map file for the levels and for Labels I can use those with "_desc" as suffix.

Can some one please suggest an efficient way of doing this? I mean instead of writing codes for individual variables if we can do this in a loop? In the data file that i am attaching this is just a sample and the original data file would have more than 100 variables for which one would need to do this task of attaching labels.

Lastly, how can I attach variable labels? I mean,

Q1 should have the label "Like the packaging of the brand" Q2 should have the label "Like the taste of the brand" Q3 should have the label "Like the smell of the brand" Q4 should have the label "value for money"

Should I think of creating a separate sheet with one column having the variable name and next one having the labels for the variables? How would we attach these labels to the variables as factors only works for values I believe.

Finally I need to generate Table / cross tables where these Labels should get displayed. Charts where these value + Variable labels should get displayed.

Thank you!!

Prasad


Solution

  • I would write a for loop to do this repetitive task. One would need to check if the variable names in data and map are same. Your data is not available, however I feel below should work.

    for (i in names(data)[which(is.element(names(data),names(map)))]){
      data[[i]] <- factor(data[[i]],
                                     levels = na.omit(map[[i]]),
                                     labels = na.omit(map[[paste0(i,"_desc")]]))
    }
    

    na.oimt is to omit the rows that would have NAs.