Search code examples
rsortingsubtraction

Subtracting groups from two columns based on character string of another column


I have a metadata set where I want to calculate the difference of two numeric columns if they match specific character strings in a different column. For instance, the column patch from the dput() contains many different IDs: GL134_UP_1, GL24_DN_2, etc where GL# is an ID, UP or DN is a location, and _1, _2, or _3 are replicates. For each individual GL# of a single location and all replicates (GL134_UP_1, GL134_UP_2, GL134_UP_3) I want to subtract numeric values from the column sn_sp_dist from the same GL#'s down location (GL134_DN_1, GL134_DN_2, GL134_DN_3).

I imagine I'll be needing group_by() to specify the character column, but then I'm lost defining specific GL#'s.

library(dplyr)
diff <- met.ex %>%
group_by(patch) %>%


> dput(met.ex)
structure(list(patch = structure(c(4L, 5L, 6L, 1L, 2L, 3L, 10L, 
11L, 7L, 8L, 9L), .Label = c("GL134_DN_1", "GL134_DN_2", "GL134_DN_3", 
"GL134_UP_1", "GL134_UP_2", "GL134_UP_3", "GL22_DN_1", "GL22_DN_2", 
"GL22_DN_3", "GL22_UP_1", "GL22_UP_3"), class = "factor"), sn_sp_dist = structure(c(6L, 
7L, 2L, 5L, 9L, 10L, 8L, 4L, 3L, 9L, 1L), .Label = c("123", "165", 
"2", "248", "321", "34", "42", "44", "77", "987"), class = "factor")), class = "data.frame", row.names = c(NA, 
-11L))

Solution

  • You could do something like this:

    colnames(met.ex) <- c("patch", "value")
    
    met.ex %>% 
        separate(patch, into = c("id", "direction", "rep"), sep = "_", remove = FALSE) %>%
        # replace all alphabet characters with nothing
        mutate(id = as.numeric(str_match(id, "\\d+")),
            rep = as.numeric(rep),
            patch = as.numeric(patch)) %>%
        select(-patch) %>%
        group_by(id, rep) %>%
        pivot_wider(names_from = direction, values_from = value) %>%
        mutate(diff = UP - DN)
    
         id   rep    UP    DN  diff
      <dbl> <dbl> <dbl> <dbl> <dbl>
    1   134     1     6     5     1
    2   134     2     7     9    -2
    3   134     3     2    10    -8
    4    22     1     8     3     5
    5    22     3     4     1     3
    6    22     2    NA     9    NA
    

    For more information on cleaning and tidying dataframes read this.