Removing excess classes for a whole dataframe


I have data as follows, and a problem that I unfortunately cannot reproduce with this small example:

dat <- structure(c(1, NA_real_), format.stata = "%8.0g", labels = c(female = 1, 
male = 2), class = c("haven_labelled", "vctrs_vctr", "double"
))

dat <- data.frame(dat)

lapply(dat, class)

$dat
[1] "haven_labelled" "vctrs_vctr"     "double"        

I would like to remove the custom labels and classes, and I tried the following:

clear.labels <- function(x) {
  if(is.list(x)) {
    for(i in seq_along(x)) {
      class(x[[i]]) <- setdiff(class(x[[i]]), 'labelled') 
      attr(x[[i]],"label") <- NULL
    } 
  } else {
    class(x) <- setdiff(class(x), "labelled")
    attr(x, "label") <- NULL
  }
  return(x)
}

dat <- clear.labels(dat)

However, this does not work because the class is haven_labelled, not labelled. Obviously I could change that, but I would rather have something that works regardless of the class name.

lapply(dat, class)
$dat
[1] "haven_labelled" "vctrs_vctr"     "double"        

I also tried:

dat <- data.frame(lapply(dat, unclass))

lapply(dat, class)

$dat
[1] "numeric"

For my actual data, however, it does not seem to work, even though it has exactly the same data.

Are there any other options I could try?

EDIT: Would it not be a possibility to simply make the last class the only class?


Solution

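  • Use haven’s zap_*() functions. zap_labels() strips the value labels (and with them the haven_labelled class), and zap_formats() removes format attributes such as format.stata: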

    library(haven)
    
    zapped <- dat |>
      zap_labels() |>
      zap_formats()
    
    zapped
    #   dat
    # 1   1
    # 2  NA
    
    class(zapped$dat)
    # [1] "numeric"
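
On the EDIT in the question: if haven is not available, or the extra classes come from a different package, a base R fallback is to drop every attribute from each column, after which class() reports only the underlying type. The following is a minimal sketch rather than part of the original answer. strip_attrs is a hypothetical helper name, and skipping factors and date columns is an assumption about what you would want to keep.

```r
# Rebuild the example column from the question (haven is not needed here).
labelled_col <- structure(
  c(1, NA_real_),
  format.stata = "%8.0g",
  labels = c(female = 1, male = 2),
  class = c("haven_labelled", "vctrs_vctr", "double")
)
dat <- data.frame(id = 1:2)
dat$dat <- labelled_col

# Hypothetical helper: drop all attributes (class, labels, formats) from
# every column, whatever the class happens to be called.
strip_attrs <- function(df) {
  df[] <- lapply(df, function(col) {
    # Assumption: factors and dates need their class to stay meaningful,
    # so they are returned unchanged.
    if (is.factor(col) || inherits(col, c("Date", "POSIXt"))) return(col)
    attributes(col) <- NULL
    col
  })
  df
}

clean <- strip_attrs(dat)
class(clean$dat)
# [1] "numeric"
```

Like zap_labels(), this discards the value labels, so the female/male coding is lost. If you need that information, convert the column with haven's as_factor() before stripping anything.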