Search code examples
rdplyrapplysummarize

Calculations with dplyr based on specific factors and dates and summaries of values


I have a data frame of counts of different classifications of ship on specific dates at certain distances off shore (DOS), e.g. 0-12nm and 0-100nm - I would like to subtract the ships within the 0-12nm DOS from 0-100nm, so that I can calculate how many e.g. "passenger" ships were only in 12-100nm on each date. Once that is complete i would like to how many total passenger, cargo etc ships were counted within each DOS for the total time period... I can work out a really laborious ways to do this, but I am pretty sure with the mutate and summarize functions in dplyr there is a more efficient way to run this...

here is an dummy data frame:

df<- structure(list(date = structure(c(17622, 17623, 17624, 17625, 
17626, 17627, 17622, 17623, 17624, 17625, 17626, 17627), class = "Date"), 
    `Passenger(6X)` = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), 
    `Cargo(7X)` = c(2, 0, 2, 2, 2, 3, 5, 4, 7, 6, 7, 4), `Tanker(8X)` = c(0, 
    0, 0, 0, 0, 0, 0, 3, 1, 0, 1, 0), Otherb = c(`5` = 0, `6` = 0, 
    `7` = 0, `8` = 0, `9` = 0, `10` = 0, `144` = 0, `154` = 0, 
    `164` = 0, `174` = 0, `184` = 0, `194` = 0), DOS = c("0-12nm", 
    "0-12nm", "0-12nm", "0-12nm", "0-12nm", "0-12nm", "0-100nm", 
    "0-100nm", "0-100nm", "0-100nm", "0-100nm", "0-100nm")), class = "data.frame", row.names = c(1L, 
2L, 3L, 4L, 5L, 6L, 1454L, 1455L, 1456L, 1457L, 1458L, 1459L))

In this example on the 1st of April 2018 cargo ships in 12-100nm should be 3 - the output could be in the form of new columns etc. ... within my real data set i actually have 4 different distances offshore and over a year of dates.... so I think dplyr is the best way to go for this - any help would be appreciated.


Solution

  • Option 1:

    df %>% 
           group_by(date) %>% 
           summarise_at(
              vars(`Cargo(7X)`, `Tanker(8x)`), 
              funs(.[DOS == '0-100nm'] - .[DOS == '0-12nm'])
        )
    
    #   date       `Cargo(7X)`  `Tanker(8x)`
    # 1 2018-04-01           3            0
    # 2 2018-04-02           4            3
    # 3 2018-04-03           5            1
    # 4 2018-04-04           4            0
    # 5 2018-04-05           5            1
    # 6 2018-04-06           1            0
    

    Option 2 :

    df %>% 
       group_by(date, DOS) %>% 
       summarise_at(vars(`Cargo(7X)`, `Tanker(8x)`), funs(sum)) %>% 
       gather(-(date:DOS), key = Ship, value = Value) %>%
       spread(key = DOS, value = Value) %>% 
       mutate('12-100nm' = `0-100nm`- `0-12nm`)
    
    #    date       Ship       `0-100nm` `0-12nm` `12-100nm`
    #  1 2018-04-01 Cargo(7X)          5        2          3
    #  2 2018-04-01 Tanker(8X)         0        0          0
    #  3 2018-04-02 Cargo(7X)          4        0          4
    #  4 2018-04-02 Tanker(8X)         3        0          3
    #  5 2018-04-03 Cargo(7X)          7        2          5
    #  6 2018-04-03 Tanker(8X)         1        0          1
    #  7 2018-04-04 Cargo(7X)          6        2          4
    #  8 2018-04-04 Tanker(8X)         0        0          0
    #  9 2018-04-05 Cargo(7X)          7        2          5
    # 10 2018-04-05 Tanker(8X)         1        0          1
    # 11 2018-04-06 Cargo(7X)          4        3          1
    # 12 2018-04-06 Tanker(8X)         0        0          0