
How do I use fct_relabel with strsplit or similar to relabel a factor column?


I am trying to automatically change the labels of a factor column using tidyverse code, and I am having trouble changing the labels based on a simple function.

Some example data would look like:

   subjectid Parameter   value
   <chr>     <fct>       <dbl>
 1 13        alpha_IST  0.0751
 2 13        alpha_IEX 15.7   
 3 13        alpha_CB   0.236 
 4 15        alpha_IST  0.0680
 5 15        alpha_IEX 16.5   
 6 15        alpha_CB   0.282 
 7 17        alpha_IST  0.0793

(To reproduce, the output from dput on the first 6 rows is listed below)

structure(
  list(
    subjectid = c("13", "13", "13", "15", "15", "15"),
    Parameter = structure(c(3L, 2L, 1L, 3L, 2L, 1L), .Label = c("alpha_CB", "alpha_IEX", "alpha_IST"), class = "factor"),
    value = c(0.0751, 15.7, 0.236, 0.0680, 16.5, 0.282)
  ),
  row.names = c(NA, -6L),
  class = c("tbl_df", "tbl", "data.frame")
)

I am trying to strip out the redundant first half of the Parameter labels (i.e. remove alpha_).

Given that the above object is called medians, I can do this using:

par_labels <- sapply(
  strsplit(levels(medians$Parameter), "_"),
  function(x) {
    x[2]
  }
)

medians %>% mutate(Parameter = factor(Parameter, labels = par_labels))

It seems I should be able to build this same functionality using the fct_relabel function, however I cannot seem to get it to work.

I have tried:

medians %>%
  mutate(Parameter = fct_relabel(Parameter, function(x) {
    strsplit(x, "_")[2]
  }))

which gives the error:

Error: Problem with mutate() input Parameter.
✖ new_levels must be a character vector.

I also tried:

medians %>%
  mutate(Parameter = fct_relabel(Parameter, function(x) {
    strsplit(x, "_")[[1]][2]
  }))

which gives the error:

Error: Problem with mutate() input Parameter.
✖ new_levels must be the same length as levels(f): expected 3 new levels, got 1.

I have tried other combinations with a similar lack of success. I can see that converting to a character vector, using tidyr to separate it, and then converting back to a factor would work, but I feel it should be possible in a way similar to what I have tried. Is this possible?
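
Both errors can be reproduced without mutate(), since fct_relabel calls the supplied function once on the whole vector of levels and expects a character vector of the same length back. The sketch below uses base R only; `lv` is a stand-in for levels(medians$Parameter), and the names `parts`, `attempt1`, `attempt2` and `fixed` are made up for illustration.

```r
# Base R only; `lv` stands in for levels(medians$Parameter)
lv <- c("alpha_CB", "alpha_IEX", "alpha_IST")

# strsplit() returns a list with one character vector per level
parts <- strsplit(lv, "_")

# First attempt: `[2]` subsets the list, so the result is still a list,
# not a character vector
attempt1 <- parts[2]
is.list(attempt1)

# Second attempt: `[[1]][2]` only looks at the first level,
# so one label comes back where three are needed
attempt2 <- parts[[1]][2]
length(attempt2)

# Taking the second piece of every element gives one label per level
fixed <- vapply(parts, function(p) p[2], character(1))
fixed
```

This suggests the function passed to fct_relabel has to be vectorised over the levels, which neither attempt is.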


Solution

  • You can use fct_relabel like this:

    library(dplyr)
    library(forcats)
    
    medians %>%
      mutate(Parameter = fct_relabel(Parameter, 
                           function(x) sapply(strsplit(x, "_"), `[`, 2)))
    
    # subjectid Parameter   value
    #  <chr>     <fct>       <dbl>
    #1 13        IST        0.0751
    #2 13        IEX       15.7   
    #3 13        CB         0.236 
    #4 15        IST        0.068 
    #5 15        IEX       16.5   
    #6 15        CB         0.282 
    

    However, for this problem I would use base R (note that this modifies medians in place):

    levels(medians$Parameter) <- sub('.*_', '', levels(medians$Parameter))
    

    Or, with fct_relabel:

    medians %>%
      mutate(Parameter = fct_relabel(Parameter, ~ sub('.*_', '', .x)))
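
One thing to keep in mind is that the strsplit and sub versions only agree when each label contains exactly one underscore. The sketch below is base R only and compares them on a made-up label, "alpha_IST_2", which is not in the original data. The names `by_split`, `by_greedy` and `by_prefix` are hypothetical.

```r
# Base R only; "alpha_IST_2" is an invented label for illustration
lv <- c("alpha_CB", "alpha_IEX", "alpha_IST", "alpha_IST_2")

# Second piece only: anything after a second "_" is dropped
by_split <- sapply(strsplit(lv, "_"), `[`, 2)

# Greedy ".*_" removes everything up to the LAST underscore
by_greedy <- sub(".*_", "", lv)

# "^[^_]*_" removes only the prefix up to the FIRST underscore
by_prefix <- sub("^[^_]*_", "", lv)

by_split   # "CB" "IEX" "IST" "IST"
by_greedy  # "CB" "IEX" "IST" "2"
by_prefix  # "CB" "IEX" "IST" "IST_2"
```

For the labels in the question all three give the same result, so the choice only matters if the real data has labels with more than one underscore.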