I have a data frame of counts of different classifications of ship on specific dates at certain distances off shore (DOS), e.g. 0-12nm and 0-100nm - I would like to subtract the ships within the 0-12nm DOS from 0-100nm, so that I can calculate how many e.g. "passenger" ships were only in 12-100nm on each date. Once that is complete i would like to how many total passenger, cargo etc ships were counted within each DOS for the total time period... I can work out a really laborious ways to do this, but I am pretty sure with the mutate and summarize functions in dplyr there is a more efficient way to run this...
here is an dummy data frame:
df<- structure(list(date = structure(c(17622, 17623, 17624, 17625,
17626, 17627, 17622, 17623, 17624, 17625, 17626, 17627), class = "Date"),
`Passenger(6X)` = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
`Cargo(7X)` = c(2, 0, 2, 2, 2, 3, 5, 4, 7, 6, 7, 4), `Tanker(8X)` = c(0,
0, 0, 0, 0, 0, 0, 3, 1, 0, 1, 0), Otherb = c(`5` = 0, `6` = 0,
`7` = 0, `8` = 0, `9` = 0, `10` = 0, `144` = 0, `154` = 0,
`164` = 0, `174` = 0, `184` = 0, `194` = 0), DOS = c("0-12nm",
"0-12nm", "0-12nm", "0-12nm", "0-12nm", "0-12nm", "0-100nm",
"0-100nm", "0-100nm", "0-100nm", "0-100nm", "0-100nm")), class = "data.frame", row.names = c(1L,
2L, 3L, 4L, 5L, 6L, 1454L, 1455L, 1456L, 1457L, 1458L, 1459L))
In this example on the 1st of April 2018 cargo ships in 12-100nm should be 3 - the output could be in the form of new columns etc. ... within my real data set i actually have 4 different distances offshore and over a year of dates.... so I think dplyr is the best way to go for this - any help would be appreciated.
Option 1:
df %>%
group_by(date) %>%
summarise_at(
vars(`Cargo(7X)`, `Tanker(8x)`),
funs(.[DOS == '0-100nm'] - .[DOS == '0-12nm'])
)
# date `Cargo(7X)` `Tanker(8x)`
# 1 2018-04-01 3 0
# 2 2018-04-02 4 3
# 3 2018-04-03 5 1
# 4 2018-04-04 4 0
# 5 2018-04-05 5 1
# 6 2018-04-06 1 0
Option 2 :
df %>%
group_by(date, DOS) %>%
summarise_at(vars(`Cargo(7X)`, `Tanker(8x)`), funs(sum)) %>%
gather(-(date:DOS), key = Ship, value = Value) %>%
spread(key = DOS, value = Value) %>%
mutate('12-100nm' = `0-100nm`- `0-12nm`)
# date Ship `0-100nm` `0-12nm` `12-100nm`
# 1 2018-04-01 Cargo(7X) 5 2 3
# 2 2018-04-01 Tanker(8X) 0 0 0
# 3 2018-04-02 Cargo(7X) 4 0 4
# 4 2018-04-02 Tanker(8X) 3 0 3
# 5 2018-04-03 Cargo(7X) 7 2 5
# 6 2018-04-03 Tanker(8X) 1 0 1
# 7 2018-04-04 Cargo(7X) 6 2 4
# 8 2018-04-04 Tanker(8X) 0 0 0
# 9 2018-04-05 Cargo(7X) 7 2 5
# 10 2018-04-05 Tanker(8X) 1 0 1
# 11 2018-04-06 Cargo(7X) 4 3 1
# 12 2018-04-06 Tanker(8X) 0 0 0