Search code examples
rdynamicconditional-statementssumifs

How to setup two dynamic conditions in SUMIFS like problem in R?


I already tried my best but am still pretty much a newbie to R.

Based on like 500mb of input data that currently looks like this:

TOTALLISTINGS
  listing_id calc.latitude calc.longitude reviews_last30days
1       2818       5829821       335511.0                  1
2      20168       5829746       335265.2                  3
3      25428       5830640       331534.6                  0
4      27886       5832156       332003.1                  3
5      28658       5830888       329727.2                  3
6      28871       5829980       332071.3                  7

I need to calculate the conditional sum of reviews_last30days - the conditions being a specific and changing area range for each respective record, i.e. R should sum only those reviews for which the calc.latitude and calc.longitude do not deviate more than +/-500 from the longitude and latitude values in each row.

EXAMPLE:

  • ROW 1 has a calc.latitude 5829821 and a calc.longitude 335511.0, so R should take the sum of all reviews_last30days for which the following ranges apply: calc.latitude 5829321‬ to 5830321‬ (value of Row 1 latitude +/-500) calc.longitude 335011.0 to 336011.0 (value of Row 1 longitude +/-500)

So my intended output would look somewhat like this in column 5:

TOTALLISTINGS
 listing_id calc.latitude calc.longitude reviews_last30days  reviewsper1000
1       2818       5829821       335511.0                  1             4
2      20168       5829746       335265.2                  3             4
3      25428       5830640       331534.6                  0            10
4      27886       5832156       332003.1                  3             3
5      28658       5830888       331727.2                  3            10
6      28871       5829980       332071.3                  7            10

Hope I calculated correctly in my head, but you get the idea..

Until now I particularly struggle with the fact that my sum conditions are dynamic and "newly assigned" since the latitude and longitude conditions have to be adjusted for each record.

My current code looks like this but it obviously doesn't work that way:

review1000 <- function(TOTALLISTINGS = NULL){
      # tibble to return
      to_return <- TOTALLISTINGS %>% 
        group_by(listing_id) %>% 
        summarise(
          reviews1000 = sum(reviews_last30days[(calc.latitude>=(calc.latitude-500) | calc.latitude<=(calc.latitude+500))]))
      return(to_return)
    }

REVIEWPERAREA <- review1000(TOTALLISTINGS)

I know I also would have to add something for longitude in the code above

Does anyone have an idea how to fix this? Any help or hints highly appreciated & thanks in advance! :)


Solution

  • See whether the below code will help.

    TOTALLISTINGS$reviews1000 <- sapply(1:nrow(TOTALLISTINGS), function(r) {
      currentLATI <- TOTALLISTINGS$calc.latitude[r]
      currentLONG <- TOTALLISTINGS$calc.longitude[r]
      sum(TOTALLISTINGS$reviews_last30days[between(TOTALLISTINGS$calc.latitude,currentLATI - 500, currentLATI + 500) & between(TOTALLISTINGS$calc.longitude,currentLONG - 500, currentLONG + 500)])
    })