I have a dataset of rental advertisements. For each advert I have its precise latitude (and longitude but let's take just latitude now for simplicity). For each latitude I have calculated boundaries approx. 2 kms north- and southwards (latmin and latmax):
advert_id locality_public_lat latmin latmax
1 10083631 49.5794 49.56143 49.59737
2 10247827 50.0546 50.03663 50.07257
3 10647798 50.0910 50.07303 50.10897
4 10584515 50.0906 50.07263 50.10857
I would like to know now how many of all the other adverts fall in these boundaries for each advert (take all values in locality_public_lat column and compare it with latmin and latmax in first row and yield a count for the first row; then repeat for all other rows).
So far I tried to aggregate the data using summary function but the problem is that it compares only latitudes within each row and therefore the result is not valid (just counts all rows).
summary <- data_prodej_3 %>%
group_by(advert_id, month_stats) %>%
summarise(pocet_inz = sum(locality_public_lat > latmin & locality_public_lat < latmax))
Is there a way to summarise the data the way I described?
Guess that you need to work by using a function and some loops.
With your data saved in a file named lat.txt:
dvert_id,locality_public_lat,latmin,latmax
10083631,49.5794,49.56143,49.59737
10247827,50.0546,50.03663,50.07257
10647798,50.0910,50.07303,50.10897
10584515,50.0906,50.07263,50.10857
We can use:
library (tidyverse)
#load the data
data_prodej_3 <- read.csv(file='lat.txt')
#save the dvert_id list
dvert_id_vector <- data_prodej_3[,c(1)]
#initialize the final recap table
recap_table <- data.frame(cbind(dvert_id=character(),locality_public_lat=numeric(),occurrences=integer()))
for (i in dvert_id_vector) {
counter <- 0
#save the locality_public_lat we're interested to compare
locality_public_lat_value <- data_prodej_3[data_prodej_3$dvert_id==i,c(2)]
#remove the dvert_id we're examining
data_prodej_3_filtered <- data_prodej_3[data_prodej_3$dvert_id!=i,]
for (j in seq(1,length(data_prodej_3_filtered)-1)) {
if (locality_public_lat_value > data_prodej_3_filtered[j,c(3)] & locality_public_lat_value < data_prodej_3_filtered[j,c(4)]) {
counter <- counter + 1
}
}
#save an entry in the recap table with the resulting count for the particular dvert_id
recap_table <- rbind(recap_table, c(i,locality_public_lat_value,counter))
}
colnames(recap_table) <- c('dvert_id','locality_public_lat','occurrences')
recap_table
To obtain:
dvert_id locality_public_lat occurrences
1 10083631 49.5794 0
2 10247827 50.0546 0
3 10647798 50.0910 1
4 10584515 50.0906 1