I already tried my best but am still pretty much a newbie to R.
Based on like 500mb of input data that currently looks like this:
TOTALLISTINGS
listing_id calc.latitude calc.longitude reviews_last30days
1 2818 5829821 335511.0 1
2 20168 5829746 335265.2 3
3 25428 5830640 331534.6 0
4 27886 5832156 332003.1 3
5 28658 5830888 329727.2 3
6 28871 5829980 332071.3 7
I need to calculate the conditional sum of reviews_last30days
- the conditions being a specific and changing area range for each respective record, i.e. R should sum only those reviews for which the calc.latitude
and calc.longitude
do not deviate more than +/-500 from the longitude and latitude values in each row.
EXAMPLE:
calc.latitude
5829821 and a calc.longitude
335511.0, so R should take the sum of all reviews_last30days
for which the following ranges apply: calc.latitude
5829321 to 5830321 (value of Row 1 latitude +/-500)
calc.longitude
335011.0 to 336011.0 (value of Row 1 longitude +/-500)So my intended output would look somewhat like this in column 5:
TOTALLISTINGS
listing_id calc.latitude calc.longitude reviews_last30days reviewsper1000
1 2818 5829821 335511.0 1 4
2 20168 5829746 335265.2 3 4
3 25428 5830640 331534.6 0 10
4 27886 5832156 332003.1 3 3
5 28658 5830888 331727.2 3 10
6 28871 5829980 332071.3 7 10
Hope I calculated correctly in my head, but you get the idea..
Until now I particularly struggle with the fact that my sum conditions are dynamic and "newly assigned" since the latitude and longitude conditions have to be adjusted for each record.
My current code looks like this but it obviously doesn't work that way:
review1000 <- function(TOTALLISTINGS = NULL){
# tibble to return
to_return <- TOTALLISTINGS %>%
group_by(listing_id) %>%
summarise(
reviews1000 = sum(reviews_last30days[(calc.latitude>=(calc.latitude-500) | calc.latitude<=(calc.latitude+500))]))
return(to_return)
}
REVIEWPERAREA <- review1000(TOTALLISTINGS)
I know I also would have to add something for longitude in the code above
Does anyone have an idea how to fix this? Any help or hints highly appreciated & thanks in advance! :)
See whether the below code will help.
TOTALLISTINGS$reviews1000 <- sapply(1:nrow(TOTALLISTINGS), function(r) {
currentLATI <- TOTALLISTINGS$calc.latitude[r]
currentLONG <- TOTALLISTINGS$calc.longitude[r]
sum(TOTALLISTINGS$reviews_last30days[between(TOTALLISTINGS$calc.latitude,currentLATI - 500, currentLATI + 500) & between(TOTALLISTINGS$calc.longitude,currentLONG - 500, currentLONG + 500)])
})