Search code examples
rdata.tablelabelexpss

Using R expss and data.table is it possible to load data.table labels from a csv file instead of typing the code in by hand?


Applying labels is an important part of making survey data comprehensible when reported

So the best example I can find uses expss::apply_labels() e.g the famous mtcars example https://cran.r-project.org/web/packages/expss/vignettes/tables-with-labels.html

as input this requires a data.table and a list of comma separated assignment pairs e.g

apply_labels(dt, col1 = "label1", col2 = "label2", col3 = "label3")

This is fine if you have one data file and a few columns and you can be bothered typing them in for each time, but its not very helpful if you have lots of data files. So how could one load a csv metadata file in format:

Col1 Col2 Col3

Label1 Label2 Label3

where the Col names match the same names in the data table

this means effectively translating the metadata csv file so that it generates

coln = "labeln"

for each column.

So far I have found the biggest problem is that the apply labels column names are objects not strings and it is very difficult to translate a string to the object in the right scope.

This is where I've got to

    library(expss)
    library(data.table)
    library(glue)

    readcsvdata <- function(dfile)
     {
        rdata <- fread(file = dfile, sep = "," , quote = "\"" , header = TRUE, 
        stringsAsFactors = FALSE, na.strings = getOption("datatable.na.strings","NA"))
        return(rdata)
        }

    rawdatafilename <- "testdata.csv"
    rawmetadata <- "metadata.csv"

    mdt <- readcsvdata(rawmetadata)
    rdt <-readcsvdata(rawdatafilename)
    commonnames <- intersect(names(mdt),names(rdt))  # find common 
    qlabels <- as.character(mdt[1, commonnames, with = FALSE])

    comslist <- list()
    for (i in 1:length(commonnames)) # loop through commonnames and qlabels
          {  
          if (i == length(commonnames))
              {x <- glue('{commonnames[i]} = "{qlabels[i]}"')} # no comma for final item
              else 
              {x <- glue('{commonnames[i]} = "{qlabels[i]}",')} # comma for next item

          comslist[[i]] <- x
    }

comstring <- paste(unlist(comslist), collapse = '')

tdt = apply_labels(tdt, eval(parse(text = comstring)))

which yields

Error in parse(text = comstring) : :1:24: unexpected ',' 1: varone = "Label1", ^

oh and print(comstring) produces:

[1] "varone = \"Question one\",vartwo = \"Question two\",varthree = \"Question three\",varfour = \"Question four\",varfive = \"Question five\",varsix = \"Question six\",varseven = \"Question seven\",vareight = \"Question eight\",varnine = \"Question nine\",varten = \"Question ten\""


Solution

  • apply_labels is not very convenient for assignment labels from external dictionary. You can use var_lab instead:

    library(expss)
    library(data.table)
    
    readcsvdata <- function(dfile)
    {
        rdata <- fread(file = dfile, sep = "," , quote = "\"" , header = TRUE, 
                       stringsAsFactors = FALSE, na.strings = getOption("datatable.na.strings","NA"))
        return(rdata)
    }
    
    rawdatafilename <- "testdata.csv"
    rawmetadata <- "metadata.csv"
    
    mdt <- readcsvdata(rawmetadata)
    rdt <-readcsvdata(rawdatafilename)
    commonnames <- intersect(names(mdt),names(rdt))  # find common 
    qlabels <- as.list(mdt[1, commonnames, with = FALSE])
    
    
    for (each_name in commonnames) # loop through commonnames and qlabels
    {  
        var_lab(rdt[[each_name]]) <- qlabels[[each_name]]
    }
    

    There is a similar val_lab function for value labels. Additionally you may be interested in apply_dictionary and create_dictionary functions. To get help about them type ?apply_dictionary in the console.