Applying labels is an important part of making survey data comprehensible when reported
So the best example I can find uses expss::apply_labels() e.g the famous mtcars example https://cran.r-project.org/web/packages/expss/vignettes/tables-with-labels.html
as input this requires a data.table and a list of comma separated assignment pairs e.g
apply_labels(dt, col1 = "label1", col2 = "label2", col3 = "label3")
This is fine if you have one data file and a few columns and you can be bothered typing them in for each time, but its not very helpful if you have lots of data files. So how could one load a csv metadata file in format:
Col1 Col2 Col3
Label1 Label2 Label3
where the Col names match the same names in the data table
this means effectively translating the metadata csv file so that it generates
coln = "labeln"
for each column.
So far I have found the biggest problem is that the apply labels column names are objects not strings and it is very difficult to translate a string to the object in the right scope.
This is where I've got to
library(expss)
library(data.table)
library(glue)
readcsvdata <- function(dfile)
{
rdata <- fread(file = dfile, sep = "," , quote = "\"" , header = TRUE,
stringsAsFactors = FALSE, na.strings = getOption("datatable.na.strings","NA"))
return(rdata)
}
rawdatafilename <- "testdata.csv"
rawmetadata <- "metadata.csv"
mdt <- readcsvdata(rawmetadata)
rdt <-readcsvdata(rawdatafilename)
commonnames <- intersect(names(mdt),names(rdt)) # find common
qlabels <- as.character(mdt[1, commonnames, with = FALSE])
comslist <- list()
for (i in 1:length(commonnames)) # loop through commonnames and qlabels
{
if (i == length(commonnames))
{x <- glue('{commonnames[i]} = "{qlabels[i]}"')} # no comma for final item
else
{x <- glue('{commonnames[i]} = "{qlabels[i]}",')} # comma for next item
comslist[[i]] <- x
}
comstring <- paste(unlist(comslist), collapse = '')
tdt = apply_labels(tdt, eval(parse(text = comstring)))
which yields
Error in parse(text = comstring) : :1:24: unexpected ',' 1: varone = "Label1", ^
oh and print(comstring) produces:
[1] "varone = \"Question one\",vartwo = \"Question two\",varthree = \"Question three\",varfour = \"Question four\",varfive = \"Question five\",varsix = \"Question six\",varseven = \"Question seven\",vareight = \"Question eight\",varnine = \"Question nine\",varten = \"Question ten\""
apply_labels
is not very convenient for assignment labels from external dictionary. You can use var_lab
instead:
library(expss)
library(data.table)
readcsvdata <- function(dfile)
{
rdata <- fread(file = dfile, sep = "," , quote = "\"" , header = TRUE,
stringsAsFactors = FALSE, na.strings = getOption("datatable.na.strings","NA"))
return(rdata)
}
rawdatafilename <- "testdata.csv"
rawmetadata <- "metadata.csv"
mdt <- readcsvdata(rawmetadata)
rdt <-readcsvdata(rawdatafilename)
commonnames <- intersect(names(mdt),names(rdt)) # find common
qlabels <- as.list(mdt[1, commonnames, with = FALSE])
for (each_name in commonnames) # loop through commonnames and qlabels
{
var_lab(rdt[[each_name]]) <- qlabels[[each_name]]
}
There is a similar val_lab
function for value labels. Additionally you may be interested in apply_dictionary
and create_dictionary
functions. To get help about them type ?apply_dictionary
in the console.