Search code examples
rrow

Creating new row values based on second column in R


I want to create a new variable called "X" whic is the sum of "B" and "D"

type <- c( "A", "B","C","D","E")
cnt <- c(2,5,3,7,8)

df <- data.frame(type,cnt)

> df
  type cnt
1    A   2
2    B   5
3    C   3
4    D   7
5    E   8

The desired output is

> df
  type cnt
1    A   2
2    B   5
3    C   3
4    D   7
5    E   8
6    X  12

How could extend this, if we add another grouping variable like date. Would like to add up X for each day

 date <- c("2022-01-01","2022-01-01","2022-01-01","2022-01-01","2022-01-01","2022-01-02","2022-01-02","2022-01-02","2022-01-02","2022-01-02")
type <- c("A", "B","C","D","E","A", "B","C","D","E")
cnt <- c(2,5,3,7,8, 1,9,8,2,5)

df <- data.frame(date,type,cnt)

df
         date type cnt
1  2022-01-01    A   2
2  2022-01-01    B   5
3  2022-01-01    C   3
4  2022-01-01    D   7
5  2022-01-01    E   8
6  2022-01-02    A   1
7  2022-01-02    B   9
8  2022-01-02    C   8
9  2022-01-02    D   2
10 2022-01-02    E   5

Desired output is

df
         date type cnt
1  2022-01-01    A   2
2  2022-01-01    B   5
3  2022-01-01    C   3
4  2022-01-01    D   7
5  2022-01-01    E   8
6  2022-01-01    X  12
7  2022-01-02    A   1
8  2022-01-02    B   9
9  2022-01-02    C   8
10 2022-01-02    D   2
11 2022-01-02    E   5
12 2022-01-02    X   11

Solution

  • We can subset and rbind

    rbind(df, data.frame(type = "X", cnt = sum(df$cnt[df$type %in% c("B", "D")])))
    

    -output

     type cnt
    1    A   2
    2    B   5
    3    C   3
    4    D   7
    5    E   8
    6    X  12
    

    Or in dplyr, filter the rows based on the 'type' values, summarise by taking the sum of 'cnt', while creating 'type' as 'X' and use bind_rows with original dataset

    library(dplyr)
    df %>% 
      filter(type %in% c("B", "D")) %>% 
      summarise(type = 'X', cnt = sum(cnt)) %>%
      bind_rows(df, .)
    

    Or without using bind_rows

    df %>% 
       summarise(type = c(type, 'X'), cnt = c(cnt, sum(cnt[type %in% c("B", "D")])))
      type cnt
    1    A   2
    2    B   5
    3    C   3
    4    D   7
    5    E   8
    6    X  12
    

    Or using complete

    library(tidyr)
    complete(df, type = c(type, "X"), fill = list(cnt = sum(cnt[type %in% c("B", "D")])))
    # A tibble: 6 × 2
      type    cnt
      <chr> <dbl>
    1 A         2
    2 B         5
    3 C         3
    4 D         7
    5 E         8
    6 X        12
    

    Update

    For the updated data, just add a group_by

    df %>% 
      group_by(date) %>%
      summarise(type = c(type, "X"), 
        cnt = c(cnt, sum(cnt[type %in% c("B", "D")])), .groups = 'drop')
    

    -output

    # A tibble: 12 × 3
       date       type    cnt
       <chr>      <chr> <dbl>
     1 2022-01-01 A         2
     2 2022-01-01 B         5
     3 2022-01-01 C         3
     4 2022-01-01 D         7
     5 2022-01-01 E         8
     6 2022-01-01 X        12
     7 2022-01-02 A         1
     8 2022-01-02 B         9
     9 2022-01-02 C         8
    10 2022-01-02 D         2
    11 2022-01-02 E         5
    12 2022-01-02 X        11
    

    Or using the filter approach

    df %>%
       filter(type %in% c("B", "D")) %>% 
       group_by(date) %>% 
       summarise(type = 'X', cnt = sum(cnt), .groups = 'drop') %>% 
       bind_rows(df, .) %>% 
       arrange(date)