Search code examples
rgeolocation

R Compare latitudes and summarise results for each row


I have a dataset of rental advertisements. For each advert I have its precise latitude (and longitude but let's take just latitude now for simplicity). For each latitude I have calculated boundaries approx. 2 kms north- and southwards (latmin and latmax):

  advert_id locality_public_lat   latmin   latmax
1  10083631             49.5794 49.56143 49.59737
2  10247827             50.0546 50.03663 50.07257
3  10647798             50.0910 50.07303 50.10897
4  10584515             50.0906 50.07263 50.10857

I would like to know now how many of all the other adverts fall in these boundaries for each advert (take all values in locality_public_lat column and compare it with latmin and latmax in first row and yield a count for the first row; then repeat for all other rows).

So far I tried to aggregate the data using summary function but the problem is that it compares only latitudes within each row and therefore the result is not valid (just counts all rows).

summary <- data_prodej_3 %>%
  group_by(advert_id, month_stats) %>%
  summarise(pocet_inz = sum(locality_public_lat > latmin & locality_public_lat < latmax))

Is there a way to summarise the data the way I described?


Solution

  • Guess that you need to work by using a function and some loops.

    With your data saved in a file named lat.txt:

    dvert_id,locality_public_lat,latmin,latmax
    10083631,49.5794,49.56143,49.59737
    10247827,50.0546,50.03663,50.07257
    10647798,50.0910,50.07303,50.10897
    10584515,50.0906,50.07263,50.10857
    

    We can use:

    library (tidyverse)
    
    #load the data
    data_prodej_3 <- read.csv(file='lat.txt')
    #save the dvert_id list
    dvert_id_vector  <- data_prodej_3[,c(1)]
    #initialize the final recap table
    recap_table <- data.frame(cbind(dvert_id=character(),locality_public_lat=numeric(),occurrences=integer()))
    
    for (i in dvert_id_vector) {
      
      counter <- 0
        
      #save the locality_public_lat we're interested to compare
      locality_public_lat_value  <- data_prodej_3[data_prodej_3$dvert_id==i,c(2)]
      
      #remove the dvert_id we're examining
      data_prodej_3_filtered <- data_prodej_3[data_prodej_3$dvert_id!=i,]
      
      for (j in seq(1,length(data_prodej_3_filtered)-1)) {
          
        if (locality_public_lat_value > data_prodej_3_filtered[j,c(3)] & locality_public_lat_value < data_prodej_3_filtered[j,c(4)]) {
          counter <- counter + 1
        }
      
      }
      
      #save an entry in the recap table with the resulting count for the particular dvert_id
      recap_table <- rbind(recap_table, c(i,locality_public_lat_value,counter))
      
    }
    
    colnames(recap_table) <- c('dvert_id','locality_public_lat','occurrences')
    recap_table
    

    To obtain:

      dvert_id locality_public_lat occurrences
    1 10083631             49.5794           0
    2 10247827             50.0546           0
    3 10647798             50.0910           1
    4 10584515             50.0906           1