Search code examples
rdplyrreplacedataset

Replace values based on ID's for determinate columns


I have this situation:

ID          n  post    date    el    a   b   c   d
100_left    4   50  10/11/2020  y   190 5.41 4  300
100_right   4   50  10/11/2020  n       5.4  5  200
101_left    4   50  10/11/2020  y   180 5.49 6  360
101_right   4   50  10/11/2020  n       5.48 6  180
102_left    4   50  10/11/2020  y   190 5.5  3  300
102_right   4   50  10/11/2020  n       5.46 5  200
103_left    4   50  10/11/2020  y   190 5.39 3  170
103_right   4   50  10/11/2020  n       5.44 3  360

I would like to use the same values of n_left for n_right (ID), but just for columns from a to d. Like this:

ID          n  post   date     el   a    b   c   d
100_left    4   50  10/11/2020  y   190 5.41 4  300
100_right   4   50  10/11/2020  n   190 5.41 4  300
101_left    4   50  10/11/2020  y   180 5.49 6  360
101_right   4   50  10/11/2020  n   180 5.49 6  360
102_left    4   50  10/11/2020  y   190 5.5  3  300
102_right   4   50  10/11/2020  n   190 5.5  3  300
103_left    4   50  10/11/2020  y   190 5.39 3  170
103_right   4   50  10/11/2020  n   190 5.39 3  170

Sorry for not putting codes but I don't know where to start. OBS: This is just a sample, my original dataset is much bigger.


Solution

  • We may group by the prefix part of 'ID' after removing the characters from _ with str_remove, then mutate across the columns 'a' to 'd', by selecting the values where 'el' is 'y'

    library(dplyr)
    library(stringr)
    df1 %>% 
      dplyr::group_by(grp = stringr::str_remove(ID, "_.*")) %>% 
      dplyr::mutate(across(a:d, ~ .[el == 'y'])) %>%
      ungroup %>% 
      dplyr::select(-grp)
    

    -output

    # A tibble: 8 × 9
      ID            n  post date       el        a     b     c     d
      <chr>     <int> <int> <chr>      <chr> <int> <dbl> <int> <int>
    1 100_left      4    50 10/11/2020 y       190  5.41     4   300
    2 100_right     4    50 10/11/2020 n       190  5.41     4   300
    3 101_left      4    50 10/11/2020 y       180  5.49     6   360
    4 101_right     4    50 10/11/2020 n       180  5.49     6   360
    5 102_left      4    50 10/11/2020 y       190  5.5      3   300
    6 102_right     4    50 10/11/2020 n       190  5.5      3   300
    7 103_left      4    50 10/11/2020 y       190  5.39     3   170
    8 103_right     4    50 10/11/2020 n       190  5.39     3   170
    

    data

    df1 <- structure(list(ID = c("100_left", "100_right", "101_left", "101_right", 
    "102_left", "102_right", "103_left", "103_right"), n = c(4L, 
    4L, 4L, 4L, 4L, 4L, 4L, 4L), post = c(50L, 50L, 50L, 50L, 50L, 
    50L, 50L, 50L), date = c("10/11/2020", "10/11/2020", "10/11/2020", 
    "10/11/2020", "10/11/2020", "10/11/2020", "10/11/2020", "10/11/2020"
    ), el = c("y", "n", "y", "n", "y", "n", "y", "n"), a = c(190L, 
    NA, 180L, NA, 190L, NA, 190L, NA), b = c(5.41, 5.4, 5.49, 5.48, 
    5.5, 5.46, 5.39, 5.44), c = c(4L, 5L, 6L, 6L, 3L, 5L, 3L, 3L), 
        d = c(300L, 200L, 360L, 180L, 300L, 200L, 170L, 360L)), 
    class = "data.frame", row.names = c(NA, 
    -8L))