
Labelling big serialized data in R (fst & Hmisc)


I have a question about saving labelled data with the fst package. I've been using the Hmisc package to label data, for example:

library(fst)
library(Hmisc)

# make an example dataset
df <- data.frame(id = letters[1:4],
                 val = 1:4)

# apply labels to dataframe
label(df$id) <- "identifier"
label(df$val) <- "value"

label(df)
#>           id          val 
#> "identifier"      "value" 

But my actual data is quite large, so I've switched to saving it as fst rather than rds. Recently, though, I've noticed that this appears to make my labels disappear:

fp <- '~/filepath/example_code_data/label_ex.fst'

# save as fst
write_fst(df, fp)

# open data again
df2 <- read_fst(fp)

label(df2)
#>  id val 
#>  ""  ""

When I save the data as an rds file, the labels are kept. Any insight into what's going on, or any solutions, would be greatly appreciated.
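One way to see what is going on: Hmisc's label() keeps each label as a "label" attribute on the column, and saveRDS() serialises the whole R object including its attributes, whereas the fst round trip above comes back without them. The sketch below assumes fst is installed; it sets the attribute directly with attr() (so Hmisc isn't needed to run it) and uses temporary files instead of the real path.

```r
library(fst)

# Hmisc's label() reads the "label" attribute of a column; set it directly
# so the example runs without Hmisc.
df <- data.frame(id = letters[1:4], val = 1:4)
attr(df$id, "label") <- "identifier"
attr(df$val, "label") <- "value"

# Round trip through rds: saveRDS() serialises the full R object,
# column attributes included.
rds_path <- tempfile(fileext = ".rds")
saveRDS(df, rds_path)
from_rds <- readRDS(rds_path)

# Round trip through fst: per the behaviour described above, custom column
# attributes such as "label" are not expected to come back.
fst_path <- tempfile(fileext = ".fst")
write_fst(df, fst_path)
from_fst <- read_fst(fst_path)

# Compare which "label" attributes survive each format
str(lapply(from_rds, attr, which = "label", exact = TRUE))
str(lapply(from_fst, attr, which = "label", exact = TRUE))
```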


Solution

  • For anyone with a similar issue in the future: this appears to be a known issue in fst. One helpful commenter on that thread suggests qs as a similar package that retains labels, although I'm loath to lose fst's random column access, among other nice features.
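A possible workaround that keeps fst's random column access, sketched under assumptions: store the labels in a small sidecar .rds file next to the .fst file and re-attach them after reading. The helper names write_fst_with_labels() and read_fst_with_labels() and the ".labels.rds" suffix are hypothetical, not part of fst or Hmisc; only write_fst(), read_fst(), saveRDS() and readRDS() are real API calls.

```r
library(fst)

# Hypothetical helper: write the data with write_fst() and save the column
# labels (the "label" attributes that Hmisc's label() reads) to a sidecar file.
write_fst_with_labels <- function(df, path, ...) {
  labels <- lapply(df, attr, which = "label", exact = TRUE)
  labels <- Filter(Negate(is.null), labels)  # keep labelled columns only
  saveRDS(labels, paste0(path, ".labels.rds"))
  write_fst(df, path, ...)
  invisible(path)
}

# Hypothetical helper: read all or some columns with read_fst(), then
# re-attach any stored labels to the columns that were actually read.
read_fst_with_labels <- function(path, columns = NULL, ...) {
  df <- read_fst(path, columns = columns, ...)
  label_path <- paste0(path, ".labels.rds")
  if (file.exists(label_path)) {
    labels <- readRDS(label_path)
    for (nm in intersect(names(df), names(labels))) {
      attr(df[[nm]], "label") <- labels[[nm]]
    }
  }
  df
}

# Usage, mirroring the example in the question (temporary file path)
df <- data.frame(id = letters[1:4], val = 1:4)
attr(df$id, "label") <- "identifier"  # the attribute Hmisc's label() reads
attr(df$val, "label") <- "value"

fp <- tempfile(fileext = ".fst")
write_fst_with_labels(df, fp)
df2 <- read_fst_with_labels(fp)                        # labels restored
val_only <- read_fst_with_labels(fp, columns = "val")  # column subset still works
```

The .fst file itself is written exactly as before, so selective column and row reads are unaffected; the sidecar only holds a tiny list of strings. If the Hmisc print style matters, replacing the attr() assignment with Hmisc's label(df[[nm]]) <- labels[[nm]] should also restore the "labelled" class.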