
Conditionally split factor level into two different levels


I have a data frame, e.g.:

df <- data.frame(
        type = c("BND", "INV", "BND", "DEL", "TRA"),
        chrom1 = c(1, 1, 1, 1, 1),
        chrom2 = c(1, 1, 2, 1, 3),
        stringsAsFactors = TRUE  # make type a factor (the default is FALSE since R 4.0.0)
        )

I want to reassign every BND entry in type (i.e. df[df$type == 'BND', ]) to either INV or TRA, depending on whether chrom1 and chrom2 are equal.

I am trying to use fct_recode() from the forcats package like so:

library(forcats)

df$type <- ifelse(df$type=="BND", 
                  ifelse(df$chrom1 == df$chrom2,
                         fct_recode(df$type, BND="INV"),
                         fct_recode(df$type, BND="TRA")),
                  df$type)

However, this turns my factor into its underlying integer codes:

  type chrom1 chrom2
1    1      1      1
2    3      1      1
3    1      1      2
4    2      1      1
5    4      1      3
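The numbers come from base R's ifelse() rather than from fct_recode(): ifelse() builds its result from the logical test vector, so a factor passed as yes or no loses its class and levels and only the integer codes survive. A minimal illustration, using a small made-up factor f:

```r
# Levels are sorted alphabetically: BND = 1, DEL = 2, INV = 3
f <- factor(c("BND", "INV", "DEL"))

# The result starts out as the logical test vector, so assigning the
# factor into it keeps only the integer codes, not the labels
res <- ifelse(f == "BND", f, f)
print(res)  # [1] 1 3 2

# Converting the factor to character first keeps the labels
res_chr <- ifelse(f == "BND", "INV", as.character(f))
print(res_chr)  # [1] "INV" "INV" "DEL"
```

Wrapping the character result in factor() afterwards turns it back into a factor.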

Here's my expected outcome:

  type chrom1 chrom2
1  INV      1      1   # BND -> INV as chrom1 == chrom2
2  INV      1      1
3  TRA      1      2   # BND -> TRA as chrom1 != chrom2
4  DEL      1      1
5  TRA      1      3

How can I split a factor into two levels in this way?
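It may also help to note how fct_recode() is meant to be called: its arguments are new_level = "old_level" pairs, and it relabels a level for every element at once. So BND = "INV" renames INV to BND (the reverse of the intent), and even INV = "BND" cannot choose between INV and TRA row by row. A short sketch on a made-up copy of the type column:

```r
library(forcats)

f <- factor(c("BND", "INV", "BND", "DEL", "TRA"))

# new_level = "old_level": this renames INV to BND, not BND to INV
fct_recode(f, BND = "INV")

# This renames BND to INV, but for *every* BND element, because
# fct_recode() works on levels rather than on individual rows
recoded <- fct_recode(f, INV = "BND")
print(as.character(recoded))  # [1] "INV" "INV" "INV" "DEL" "TRA"
```

A conditional, per-row split therefore needs row-wise logic (comparing chrom1 and chrom2) rather than a level recode alone.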


Solution

  • You can do this with dplyr's case_when() (loaded here via tidyverse):

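    library(tidyverse)
    
    # Split BND by comparing chrom1 and chrom2; keep every other type unchanged
    df %>% 
      mutate(type = as.factor(case_when(
        type == 'BND' & chrom1 == chrom2 ~ 'INV', 
        type == 'BND' & chrom1 != chrom2 ~ 'TRA',
        TRUE  ~ as.character(type))))  # as.character() so every branch returns character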
    

    data:

    df <- data.frame(
      type = c("BND", "INV", "BND", "DEL", "TRA"),
      chrom1 = c(1, 1, 1, 1, 1),
      chrom2 = c(1, 1, 2, 1, 3)
    )
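A base R sketch that keeps type a factor throughout, assuming (as in the example data) that INV and TRA already exist as levels: assign those existing levels by logical index, then drop the now-unused BND level. The helper names is_bnd and same_chrom are just illustrative.

```r
df <- data.frame(
  type = c("BND", "INV", "BND", "DEL", "TRA"),
  chrom1 = c(1, 1, 1, 1, 1),
  chrom2 = c(1, 1, 2, 1, 3),
  stringsAsFactors = TRUE
)

is_bnd <- df$type == "BND"
same_chrom <- df$chrom1 == df$chrom2

# INV and TRA are already levels, so plain factor subassignment works
df$type[is_bnd & same_chrom] <- "INV"
df$type[is_bnd & !same_chrom] <- "TRA"

# BND no longer occurs; remove it from the levels
df$type <- droplevels(df$type)
print(df)
```

If a target level did not already exist, subassignment would produce NA with a warning; in that case add the level first (for example with forcats::fct_expand()) before assigning.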