Tags: r, dataframe, group-by, subtraction

How to calculate the difference between values in one column based on another column?


I'm trying to calculate the difference between abundance at the time points C1 and C0. I'd like to do this for the different genes, so I've used group_by for the genes, but can't figure out how to find the difference in abundance at the different time points.

Here is one of my attempts:


IgH_CDR3_post_challenge_unique_vv <- IgH_CDR3_post_challenge_unique_v %>% 
  group_by(gene ) %>% 
  mutate(increase_in_abundance = (abundance[Timepoint=='C1'])-(abundance[Timepoint=='C0'])) %>% 
  ungroup() 

My data looks something like this:

gene Timepoint abundance
1 C0 5
2 C1 3
1 C1 6
3 C0 2

Solution

  • Assuming (!) you have exactly one entry per gene and time point (unlike the table posted in the question, where genes 2 and 3 each have only one time point), you can reshape your data with pivot_wider and then calculate the difference for every gene. The example below is not very informative, because most genes are missing a time point and so most differences come out as NA.

    # Example data from the question (time point labels as in the question: C0, C1)
    df <- data.frame(gene = c(1, 2, 1, 3),
                     Timepoint = c("C0", "C1", "C1", "C0"),
                     abundance = c(5, 3, 6, 2))
    
    library(tidyverse)
    
    # One row per gene, one column per time point, then subtract
    df %>%
      pivot_wider(names_from = Timepoint,
                  values_from = abundance,
                  id_cols = gene) %>%
      mutate(increase_in_abundance = C1 - C0)
    
    # A tibble: 3 x 4
       gene    C0    C1 increase_in_abundance
      <dbl> <dbl> <dbl>                 <dbl>
    1     1     5     6                     1
    2     2    NA     3                    NA
    3     3     2    NA                    NA
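
If you would rather keep the data in long format, closer to the attempt in the question, here is a minimal sketch of a grouped version. The attempt in the question most likely fails because `abundance[Timepoint == 'C0']` is a zero-length vector for any gene without a C0 row, and `mutate()` cannot recycle that to the group size. The helper `value_at()` below is hypothetical (it is not part of dplyr); it returns `NA` whenever a time point does not occur exactly once in a group. The time point labels `C0` and `C1` are taken from the question.

```r
library(dplyr)

# Example data from the question
df <- data.frame(gene = c(1, 2, 1, 3),
                 Timepoint = c("C0", "C1", "C1", "C0"),
                 abundance = c(5, 3, 6, 2))

# Hypothetical helper (an assumption, not a dplyr function):
# returns the single value of x at time point tp, or NA when that
# time point is missing or duplicated within the group.
value_at <- function(x, timepoint, tp) {
  hit <- x[timepoint == tp]
  if (length(hit) == 1) hit else NA_real_
}

result <- df %>%
  group_by(gene) %>%
  mutate(increase_in_abundance =
           value_at(abundance, Timepoint, "C1") -
           value_at(abundance, Timepoint, "C0")) %>%
  ungroup()

print(result)
```

This keeps one row per gene and time point, so the difference is repeated on every row of a gene. With the example data, only gene 1 has both time points; genes 2 and 3 get `NA`.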