Inserting a column with values by searching within the same dataframe, and applying conditional statements, in R


I want to create a data frame named POI_gps with the following columns:

  • id (numeric)
  • lat (numeric)
  • long (numeric)
  • Timestamp (POSIXct)
  • 3min (POSIXct), computed by adding 60*3 seconds to Timestamp
  • isStay (logical, TRUE/FALSE)

There are about 100,000 observations inside the dataframe.

Sample of locations_gps:

Timestamp            id  lat      long
2014-01-06 06:28:01  35  36.0762  24.8747
2014-01-06 06:28:01  35  36.0762  24.8746
2014-01-06 06:28:03  1   36.0661  24.8826
structure(list(
  Timestamp = structure(c(1388960881, 1388960881, 1388960883, 1388960885, 1388960886, 1388960887),
                        tzone = "", class = c("POSIXct", "POSIXt")),
  id = c(35, 35, 35, 35, 35, 35),
  lat = c(36.0762, 36.0762, 36.0762, 36.0762, 36.0762, 36.0762),
  long = c(24.8747, 24.8746, 24.8744, 24.8743, 24.8742, 24.8741)),
  row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"))

My idea is to check whether the same vehicle is at the same lat/long 3 minutes later. If it is, isStay should be TRUE; otherwise it should be FALSE.

I have written a function for this:

searchGPS <- function(gpsTime, vehID, current.lat, current.long){
  x <- POI_gps %>%
    filter(id == vehID &
             Timestamp == (gpsTime + 60*3))
  
  ifelse(dim(x), return(FALSE),
         ifelse((x$lat[1] == current.lat & x$long[1] == current.long), return(TRUE), return(FALSE)))
  
}

I tried calling it as follows, but it doesn't work (I am new to R):

POI_gps <- locations_gps %>%
  group_by(id) %>%
  mutate("3min" = Timestamp + 60*3) %>%
  mutate("stay" = searchGPS("3min", id, lat, long))

This is my error:

Error: Problem with mutate() column stay.
i stay = searchGPS("3min", id, lat, long).
x Problem with filter() input ..1.
i Input ..1 is id == vehID & Timestamp == (gpsTime + 60 * 5).
x non-numeric argument to binary operator
i The error occurred in group 1: id = 1.
i The error occurred in group 1: id = 1.


Solution

  • If I understand what you are attempting to do correctly, you can just do:

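    locations_gps |>
      # 3 minutes (180 seconds) after each fix
      mutate(min3 = Timestamp + 180) |>
      group_by(id) |>
      # compare each row with the previous row of the same vehicle;
      # lag() is NA on a group's first row, so stay is NA there
      mutate(stay = if_else(min3 <= lag(min3) & lat == lag(lat) & long == lag(long), TRUE, FALSE)) |>
      ungroup()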
    
    # A tibble: 6 x 6
      Timestamp              id   lat  long min3                stay 
      <dttm>              <dbl> <dbl> <dbl> <dttm>              <lgl>
    1 2014-01-05 14:28:01    35  36.1  24.9 2014-01-05 14:31:01 NA   
    2 2014-01-05 14:28:01    35  36.1  24.9 2014-01-05 14:31:01 FALSE
    3 2014-01-05 14:28:03    35  36.1  24.9 2014-01-05 14:31:03 FALSE
    4 2014-01-05 14:28:05    35  36.1  24.9 2014-01-05 14:31:05 FALSE
    5 2014-01-05 14:28:06    35  36.1  24.9 2014-01-05 14:31:06 FALSE
    6 2014-01-05 14:28:07    35  36.1  24.9 2014-01-05 14:31:07 FALSE
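
The pipeline above compares each row with the previous row of the same vehicle. If the goal is instead the check described in the question (same vehicle, same lat/long exactly 3 minutes later), a self-join is one possible sketch. The toy data and the helper names `later`, `lat_later` and `long_later` are made up for illustration; the sketch assumes a record exists at exactly Timestamp + 180 seconds, and it treats a missing record as "not a stay". With GPS fixes that arrive at irregular intervals, a tolerance window would probably be needed instead of an exact match.

```r
library(dplyr)

# Hypothetical toy data: vehicle 35 stays put for 3 minutes, vehicle 1 moves.
t0 <- as.POSIXct("2014-01-06 06:28:01", tz = "UTC")
locations_gps <- data.frame(
  Timestamp = t0 + c(0, 180, 0, 180),
  id        = c(35, 35, 1, 1),
  lat       = c(36.0762, 36.0762, 36.0661, 36.0670),
  long      = c(24.8747, 24.8747, 24.8826, 24.8826)
)

# Lookup table: every fix, keyed by the time 3 minutes before it was recorded.
# distinct() keeps one position per (id, Timestamp) so the join cannot
# multiply rows when a timestamp is repeated.
later <- locations_gps |>
  distinct(id, Timestamp, .keep_all = TRUE) |>
  mutate(Timestamp = Timestamp - 180) |>
  select(id, Timestamp, lat_later = lat, long_later = long)

POI_gps <- locations_gps |>
  mutate(min3 = Timestamp + 180) |>
  left_join(later, by = c("id", "Timestamp")) |>
  # no record 3 minutes later -> lat_later is NA -> isStay is FALSE
  mutate(isStay = !is.na(lat_later) & lat == lat_later & long == long_later) |>
  select(-lat_later, -long_later)

print(POI_gps)
```

Because the join is vectorised, this avoids calling a search function once per row, which matters with around 100,000 observations.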
    

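    Note that syntactic variable names in R cannot start with a number, which is why the column is called min3 here rather than 3min. In the original call, "3min" in quotes is a character string rather than the column, so gpsTime + 60*3 fails with "non-numeric argument to binary operator".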
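
As a small illustration of the naming issue, the base R sketch below reproduces the arithmetic failure on a quoted string and shows that a name starting with a digit only works when it is wrapped in backticks. The data frame `df` and the variables `msg` and `diff_secs` are made up for this example.

```r
# A quoted "3min" is just a character string, so arithmetic on it fails.
msg <- tryCatch("3min" + 60 * 3, error = function(e) conditionMessage(e))
print(msg)

# A column whose name starts with a digit can be created, but every later
# reference to it needs backticks.
df <- data.frame(Timestamp = as.POSIXct("2014-01-06 06:28:01", tz = "UTC"))
df[["3min"]] <- df$Timestamp + 60 * 3
diff_secs <- as.numeric(difftime(df$`3min`, df$Timestamp, units = "secs"))
print(diff_secs)

# make.names() shows what R would turn the name into to make it syntactic.
print(make.names("3min"))
```

Using a syntactic name such as min3 from the start avoids the backticks entirely.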