I'm completely stumped and baffled trying to create a lagged variable in order to calculate the miles travelled between two observations.
> fuel_sheet %>% dplyr::select(vehicle, date, mileage)
+
# A tibble: 64 x 3
# Groups: vehicle [2]
vehicle date mileage
<fct> <date> <dbl>
1 VW T4 Campervan 2015-09-30 150631
2 VW T4 Campervan 2015-10-16 150866
3 VW T4 Campervan 2015-10-18 151395
4 VW T4 Campervan 2015-12-06 151731
5 VW T4 Campervan 2016-01-22 151968
6 VW T4 Campervan 2016-01-24 152360
7 VW T4 Campervan 2016-01-25 152731
8 VW T4 Campervan 2016-02-21 153066
9 VW T4 Campervan 2016-03-29 153500
10 VW T4 Campervan 2016-04-16 153751
# … with 54 more rows
> fuel_sheet <- fuel_sheet %>%
group_by(vehicle) %>%
mutate(miles_lag = lag(mileage, n=1, order_by=date)) %>%
mutate(miles = mileage - lag(mileage, n=1, order_by=date))
> fuel_sheet %>% dplyr::select(vehicle, date, mileage, miles_lag, miles)
# A tibble: 64 x 5
# Groups: vehicle [2]
vehicle date mileage miles_lag miles
<fct> <date> <dbl> <dbl> <dbl>
1 VW T4 Campervan 2015-09-30 150631 150631 0
2 VW T4 Campervan 2015-10-16 150866 150866 0
3 VW T4 Campervan 2015-10-18 151395 151395 0
4 VW T4 Campervan 2015-12-06 151731 151731 0
5 VW T4 Campervan 2016-01-22 151968 151968 0
6 VW T4 Campervan 2016-01-24 152360 152360 0
7 VW T4 Campervan 2016-01-25 152731 152731 0
8 VW T4 Campervan 2016-02-21 153066 153066 0
9 VW T4 Campervan 2016-03-29 153500 153500 0
10 VW T4 Campervan 2016-04-16 153751 153751 0
# … with 54 more rows
If I try using diff()
in place of lag to directly calculate the difference I get an array that is short by one element.
Found many threads here on SO on this sort of thing, one example is r - diff operation within a group, after a dplyr::group_by() - Stack Overflow, but can not see why my code isn't behaving as expected.
There is no conflict with plyr
as its not loaded and I can not see any other conflicts that would be affecting things here.
> sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Gentoo/Linux
Matrix products: default
BLAS: /usr/lib64/libblas.so.3.8.0
LAPACK: /usr/lib64/liblapack.so.3.8.0
locale:
[1] LC_CTYPE=en_GB.utf8 LC_NUMERIC=C LC_TIME=en_GB.utf8 LC_COLLATE=en_GB.utf8
[5] LC_MONETARY=en_GB.utf8 LC_MESSAGES=en_GB.utf8 LC_PAPER=en_GB.utf8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_GB.utf8 LC_IDENTIFICATION=C
attached base packages:
[1] graphics grDevices utils datasets stats methods base
other attached packages:
[1] reprex_0.3.0 xtable_1.8-4 orgutils_0.4-1 lubridate_1.7.9 googlesheets4_0.2.0
[6] chron_2.3-56 ascii_2.3 spelling_2.1 RSQLite_2.2.0 rmsfact_0.0.3
[11] rmarkdown_2.3 reticulate_1.16 magrittr_1.5 kableExtra_1.2.1 knitr_1.29
[16] Hmisc_4.4-1 Formula_1.2-3 survival_3.2-3 lattice_0.20-41 googlesheets_0.3.0
[21] foreign_0.8-80 fcuk_0.1.21 devtools_2.3.1 usethis_1.6.1 forcats_0.5.0
[26] stringr_1.4.0 dplyr_1.0.2 purrr_0.3.4 readr_1.3.1 tidyr_1.1.2
[31] tibble_3.0.3 ggplot2_3.3.2 tidyverse_1.3.0
loaded via a namespace (and not attached):
[1] googledrive_1.0.1 colorspace_1.4-1 ellipsis_0.3.1 rprojroot_1.3-2 htmlTable_2.0.1
[6] base64enc_0.1-3 fs_1.5.0 rstudioapi_0.11 remotes_2.2.0 bit64_4.0.5
[11] fansi_0.4.1 xml2_1.3.2 splines_4.0.2 pkgload_1.1.0 jsonlite_1.7.1
[16] broom_0.7.0 cluster_2.1.0 dbplyr_1.4.4 png_0.1-7 clipr_0.7.0
[21] compiler_4.0.2 httr_1.4.2 backports_1.1.9 assertthat_0.2.1 Matrix_1.2-18
[26] gargle_0.5.0 cli_2.0.2 later_1.1.0.1 htmltools_0.5.0 prettyunits_1.1.1
[31] tools_4.0.2 gtable_0.3.0 glue_1.4.2 Rcpp_1.0.5 cellranger_1.1.0
[36] vctrs_0.3.4 xfun_0.17 ps_1.3.4 testthat_2.3.2 rvest_0.3.6
[41] lifecycle_0.2.0 textutils_0.2-0 stringdist_0.9.6 scales_1.1.1 promises_1.1.1
[46] hms_0.5.3 parallel_4.0.2 RColorBrewer_1.1-2 curl_4.3 memoise_1.1.0
[51] gridExtra_2.3 rpart_4.1-15 latticeExtra_0.6-29 stringi_1.5.3 desc_1.2.0
[56] checkmate_2.0.0 pkgbuild_1.1.0 rlang_0.4.7 pkgconfig_2.0.3 evaluate_0.14
[61] htmlwidgets_1.5.1 bit_4.0.4 processx_3.4.4 tidyselect_1.1.0 R6_2.4.1
[66] generics_0.0.2 DBI_1.1.0 whisker_0.4 pillar_1.4.6 haven_2.3.1
[71] withr_2.2.0 nnet_7.3-14 modelr_0.1.8 crayon_1.3.4 utf8_1.1.4
[76] jpeg_0.1-8.1 grid_4.0.2 readxl_1.3.1 data.table_1.13.0 blob_1.2.1
[81] callr_3.4.4 digest_0.6.25 webshot_0.5.2 httpuv_1.5.4 openssl_1.4.2
[86] munsell_0.5.0 viridisLite_0.3.0 askpass_1.1 sessioninfo_1.1.1
> tidyverse_conflicts()
── Conflicts ───────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
✖ lubridate::as.difftime() masks base::as.difftime()
✖ lubridate::date() masks base::date()
✖ lubridate::days() masks chron::days()
✖ ascii::expand() masks tidyr::expand()
✖ magrittr::extract() masks tidyr::extract()
✖ stats::filter() masks dplyr::filter()
✖ kableExtra::group_rows() masks dplyr::group_rows()
✖ lubridate::hours() masks chron::hours()
✖ lubridate::intersect() masks base::intersect()
✖ stats::lag() masks dplyr::lag()
✖ lubridate::minutes() masks chron::minutes()
✖ lubridate::seconds() masks chron::seconds()
✖ magrittr::set_names() masks purrr::set_names()
✖ lubridate::setdiff() masks base::setdiff()
✖ Hmisc::src() masks dplyr::src()
✖ Hmisc::summarize() masks dplyr::summarize()
✖ lubridate::union() masks base::union()
✖ lubridate::years() masks chron::years()
Any suggestions or pointers would be gratefully received, and if I need to provide more information please let me know.
Thanks in advance.
EDIT : As requested dput()
of the data, I knew there was an easy way to dump out the data for others to use but couldn't for the life of me remember what it was, had been looking at the reprex
package to help without success...
> dput(fuel_sheet %>% dplyr::select(vehicle, date, mileage))
structure(list(vehicle = structure(c(3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L, 3L,
1L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), .Label = c("Ford Focus",
"Honda Jazz", "VW T4 Campervan"), class = "factor"), date = structure(c(16708,
16724, 16726, 16775, 16822, 16824, 16825, 16852, 16889, 16907,
16914, 16915, 16915, 16973, 17009, 17011, 17015, 17025, 17047,
17092, 17117, 17117, 17144, 17146, 17173, 17237, 17284, 17295,
17319, 17333, 17361, 17423, 17434, 17453, 17466, 17558, 17579,
17634, 17635, 17641, 17642, 17710, 17743, 17807, 17807, 17929,
17995, 17999, 18004, 18042, 18074, 18077, 18106, 18129, 18139,
18223, 18227, 18337, 18469, 18470, 18493, 18501, 18505, 18509
), class = "Date"), mileage = c(150631, 150866, 151395, 151731,
151968, 152360, 152731, 153066, 153500, 153751, 154363, 154619,
154866, 155473, 156008, 156199, 156637, 156997, 157157, 61476,
61774, 157458, 62175, 62409, 63209, 157769, 158325, 158803, 158956,
159314, 159706, 160126, 160550, 160986, 161363, 162000, 162449,
162879, 163215, 163636, 164005, 164613, 164802, 165352, 165651,
165946, 166488, 166987, 167131, 167749, 168013, 168361, 168853,
169371, 169996, 170636, 170640, 170954, 171573, 171763, 172286,
172616, 173105, 173266)), row.names = c(NA, -64L), groups = structure(list(
vehicle = structure(c(1L, 3L), .Label = c("Ford Focus", "Honda Jazz",
"VW T4 Campervan"), class = "factor"), .rows = structure(list(
c(20L, 21L, 23L, 24L, 25L), c(1L, 2L, 3L, 4L, 5L, 6L,
7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L,
19L, 22L, 26L, 27L, 28L, 29L, 30L, 31L, 32L, 33L, 34L,
35L, 36L, 37L, 38L, 39L, 40L, 41L, 42L, 43L, 44L, 45L,
46L, 47L, 48L, 49L, 50L, 51L, 52L, 53L, 54L, 55L, 56L,
57L, 58L, 59L, 60L, 61L, 62L, 63L, 64L)), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), row.names = 1:2, class = c("tbl_df",
"tbl", "data.frame"), .drop = TRUE), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"))
EDIT 2 : Omitting order_by
doesn't help...
> fuel_sheet %>% group_by(vehicle) %>% mutate(miles = mileage - lag(mileage, n=1))
# A tibble: 64 x 10
# Groups: vehicle [2]
vehicle mileage litre gbp station date month year miles_lag miles
<fct> <dbl> <dbl> <dbl> <fct> <date> <dbl> <dbl> <dbl> <dbl>
1 VW T4 Campervan 150631 78.0 81.9 Tescos 2015-09-30 9 2015 150631 0
2 VW T4 Campervan 150866 33.6 36.9 Nisa 2015-10-16 10 2015 150866 0
3 VW T4 Campervan 151395 66 72.5 Tescos 2015-10-18 10 2015 151395 0
4 VW T4 Campervan 151731 44 44.4 Morrisons 2015-12-06 12 2015 151731 0
5 VW T4 Campervan 151968 34.1 34 Other 2016-01-22 1 2016 151968 0
6 VW T4 Campervan 152360 49.1 48.1 Morrisons 2016-01-24 1 2016 152360 0
7 VW T4 Campervan 152731 53.2 52.0 Tescos 2016-01-25 1 2016 152731 0
8 VW T4 Campervan 153066 44.0 43 Tescos 2016-02-21 2 2016 153066 0
9 VW T4 Campervan 153500 56.9 58 Tescos 2016-03-29 3 2016 153500 0
10 VW T4 Campervan 153751 27.6 29 Tescos 2016-04-16 4 2016 153751 0
# … with 54 more rows
In the conflicts you have
stats::lag() masks dplyr::lag()
If you specify dplyr::lag()
inside both mutate
's it should work. (I checked that specifying stats::lag()
I get the same output than yours.)