I am working with a large downloaded dataset, with the end goal of joining many data frames.
For the past week or so, I have been unable to join data frames due to incompatibility of the data types "labelled" vs. "character." Ultimately I would like to map my function to the same variable in multiple data frames in a list.
Structure of each df is as follows (edited to change variables/attr names because I cannot share the data). The variable of interest I'm working with here is "CODE":
structure(list(VAR1 = structure(c(val, val, val, val, val, val), .Label = c("a",
"b", "c", "d"), class = "factor"), ID = c(1,
2, 3, 4, 5, 6), CODE = structure(c("c1", "c1",
"c1", "c1", "c1", "c1"), label = "instance code", units = "-4", class = c("labelled",
"character")), ...
I'm still relatively new to R/RStudio, so I thought for a while my issue was with mapping throughout the list, but when I pick one element to remove labels, it still doesn't work. It is almost as if R doesn't know that the label is there, despite the fact that when I use get_label, the label shows up (function below).
get_label(my.list[["my.df"]][["my.variable"]])
I have tried the following methods (I am showing it as if I am working with a single variable instead of the whole list, which is how I have been experimenting for the last couple of days):
class(my.list[["my.df"]][["my.variable"]]) <- "character"
remove_label(my.list[["my.df"]][["my.variable"]])
## for one variable
unclass(my.list[["my.df"]][["my.variable"]])
## for entire list
my.list %>%
map_at("my.variable", ~ unclass)
## I also tried map in case it was a map_at issue--still didn't work.
zap_label(my.list[["my.df"]][["my.variable"]])
attr(my.list[["my.df"]][["my.variable"]], "label") <- NULL
as.character(my.list[["my.df"]][["my.variable"]])
Does anyone have any ideas? Could it be a bug in R, or is it just my relative inexperience with R showing?
I have also tried modifying these functions in case I was misinterpreting the label and it was value labels instead of variable labels causing the issue. It's not!
Thanks for any assistance!
You could use the labelled package, which lets you set and remove labels:
library(labelled)
my.df = data.frame(test = "a test")
labelled::var_label(my.df) <- list(test='a test label')
var_label(my.df$test)
#> [1] "a test label"
my.list <- list(my.df = my.df)
var_label(my.list[["my.df"]][["test"]])
#> [1] "a test label"
my.list[["my.df"]][["test"]] <- remove_labels(my.list[["my.df"]][["test"]])
var_label(my.list[["my.df"]][["test"]])
#> NULL
my.list[["my.df"]][["test"]]
#> [1] "a test"
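If you would rather not depend on a package, the same idea can probably be sketched in base R and applied to every data frame in the list. This is only a sketch under assumptions, not tested on your data: `strip_labelled()` is a hypothetical helper name, the toy `my.list` stands in for your real list, and it assumes the column is called `CODE` and carries the `label` and `units` attributes shown in your `dput` output. The key point is that functions such as `unclass()` and `as.character()` return a modified copy, so the result has to be assigned back to the column.

```r
# Hypothetical helper: drop the "labelled" class and its label/units attributes.
strip_labelled <- function(x) {
  if (!inherits(x, "labelled")) return(x)   # leave other columns untouched
  attr(x, "label") <- NULL
  attr(x, "units") <- NULL
  remaining <- setdiff(class(x), "labelled")
  if (length(remaining) == 0) {
    x <- unclass(x)
  } else {
    class(x) <- remaining                   # e.g. back to plain "character"
  }
  x
}

# Toy stand-in for the real data (assumed shape, based on the dput output).
code <- structure(c("c1", "c1", "c2"),
                  label = "instance code", units = "-4",
                  class = c("labelled", "character"))
my.list <- list(df1 = data.frame(ID = 1:3), df2 = data.frame(ID = 4:6))
my.list$df1$CODE <- code
my.list$df2$CODE <- code

# Apply to the same column in every data frame, assigning the result back.
my.list <- lapply(my.list, function(df) {
  if ("CODE" %in% names(df)) df[["CODE"]] <- strip_labelled(df[["CODE"]])
  df
})

class(my.list$df1$CODE)
#> [1] "character"
```

After this, `CODE` should be a plain character vector in each data frame, which is what the join functions expect.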