
Calculate distance from a border with R


I need to calculate the distance from each municipality in the state of Minas Gerais to the border of a given set of municipalities. The idea is to conduct a regression discontinuity design (RDD), with the border as my cutoff.

Here is a sample of the data:

      code latitude longitude munic                cerrado mantiqueira  mata
     <dbl> <chr>        <dbl> <chr>                  <dbl>       <dbl> <dbl>
 1 3170057 -196351    -421059 Ubaporanga                 0           0     1
 2 3170107 -197472    -479381 Uberaba                    1           0     0
 3 3170206 -189141    -482749 Uberlândia                 1           0     0
 4 3170404 -163592    -469022 Unaí                       1           0     0
 5 3170503 -203521     -42737 Urucânia                   0           0     0
 6 3170529 -161244    -457352 Urucuia                    0           0     0
 7 3170602 -203333    -463688 Vargem Bonita              0           0     0
 8 3170651 -153987    -423085 Vargem Grande do Ri~       0           0     0
 9 3170701 -215556    -454364 Varginha                   0           0     0
10 3170750 -183741    -460313 Varjão de Minas            1           0     0
11 3170800 -175944    -447226 Várzea da Palma            0           0     0
12 3171030 -155845    -436121 Verdelândia                0           0     0
13 3171071 -173974    -427307 Veredinha                  0           0     0
14 3171154 -200406    -422688 Vermelho Novo              0           0     1
15 3171303 -207559    -428742 Viçosa                     0           0     1
16 3171402 -20867     -422401 Vieiras                    0           0     1
17 3171709 -223264    -450965 Virgínia                   0           0     0
18 3171808 -188154    -427015 Virginópolis               0           0     0
19 3171907 -184738    -423067 Virgolândia                0           0     0
20 3172004 -210127    -428361 Visconde do Rio Bra~       0           0     0

You can download the full database here.

"cerrado", "mantiqueira" and "mata" are regions of Minas Gerais; a value of 1 means that the municipality lies inside the region. I also have the latitude and longitude of each municipality. The idea is:

1 - Create a border for each group of regions.

2 - Calculate the distance from that border for municipalities both inside and outside the group.
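
These two steps can be sketched with the sf package, assuming municipality polygons are available as an sf object (for Minas Gerais they could probably come from geobr::read_municipality). Everything below is illustrative: the four unit squares, the `make_square` helper, and the column `dist_border` are made up, and no CRS is set, so distances are plain planar units rather than metres.

```r
library(sf)

# Hypothetical toy geometry: four unit squares in a row standing in for
# municipality polygons; the first two belong to the region.
make_square <- function(x0) {
  st_polygon(list(rbind(
    c(x0, 0), c(x0 + 1, 0), c(x0 + 1, 1), c(x0, 1), c(x0, 0)
  )))
}

munis <- st_sf(
  name_muni = c("A", "B", "C", "D"),
  cerrado   = c(1, 1, 0, 0),
  geometry  = st_sfc(lapply(0:3, make_square))
)

# Step 1: dissolve the region's municipalities into one polygon and
# take its outline as the border.
region <- st_union(munis[munis$cerrado == 1, ])
border <- st_boundary(region)

# Step 2: distance from each municipality's centroid to that border,
# signed so that municipalities outside the region are negative.
centroids <- st_centroid(st_geometry(munis))
d <- as.numeric(st_distance(centroids, border))
munis$dist_border <- ifelse(munis$cerrado == 1, d, -d)
```

The signed `dist_border` column could then serve as the running variable. Note that the outline of a region also includes any stretch where the region touches the outer edge of the state, so with real polygons it may be necessary to keep only the part of the border shared with other municipalities.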

Here is an example of the strategy that I have in mind:

[Image: example of the intended strategy]

The package geobr is very popular in Brazil for spatial analysis. However, I could not find a way to conduct the analysis that I have in mind.


Solution

  • You have too many NA values to do a true discontinuity design like the one you showed above.

    [Image: map]

    Because of that, the best thing to do might be to just calculate nearest neighbors by category:

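    library(tidyverse)

    # Read the data, rename the identifier columns, and make sure the
    # coordinates are numeric (latitude is read in as character)
    teste <-
      readxl::read_xlsx("teste.xlsx") %>%
      rename(name_muni = munic,
             code_muni = code) %>%
      mutate(latitude = as.numeric(latitude),
             longitude = as.numeric(longitude))

    # Split into municipalities inside and outside the cerrado region
    cerrado <- filter(teste, cerrado == 1)
    other <- filter(teste, cerrado != 1)

    # For each cerrado municipality (query), find the nearest non-cerrado
    # municipality (data); distances are Euclidean in coordinate units
    nn <- FNN::get.knnx(
      data = select(other, longitude, latitude),
      query = select(cerrado, longitude, latitude),
      k = 1
    )

    cerrado <- mutate(cerrado, dist_other = nn$nn.dist[, 1])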
    

    Now each cerrado municipality has its distance to the nearest municipality outside the region. You can repeat this for the other regions and use rbind (or dplyr::bind_rows) to combine the results into one data frame.
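
Repeating the calculation for every region might look like the sketch below. The toy tibble stands in for `teste`, with made-up names and coordinates, and the `regions` vector, the `reg` argument, and the `region` column are introduced here only for illustration.

```r
library(dplyr)

# Hypothetical toy data standing in for `teste`; coordinates are made up.
teste <- tibble(
  name_muni   = c("A", "B", "C", "D", "E"),
  longitude   = c(-47.9, -48.3, -42.1, -42.3, -45.4),
  latitude    = c(-19.7, -18.9, -19.6, -20.0, -21.6),
  cerrado     = c(1, 1, 0, 0, 0),
  mantiqueira = c(0, 0, 0, 0, 1),
  mata        = c(0, 0, 1, 1, 0)
)

regions <- c("cerrado", "mantiqueira", "mata")

# For each region, find the nearest municipality outside that region
# for every municipality inside it.
dist_by_region <- lapply(regions, function(reg) {
  inside  <- filter(teste, .data[[reg]] == 1)
  outside <- filter(teste, .data[[reg]] != 1)
  nn <- FNN::get.knnx(
    data  = select(outside, longitude, latitude),
    query = select(inside, longitude, latitude),
    k = 1
  )
  mutate(inside, region = reg, dist_other = nn$nn.dist[, 1])
})

# Stack the per-region results into one data frame.
result <- bind_rows(dist_by_region)
```

Here `dist_other` is a Euclidean distance in the units of the coordinates (degrees in this toy data), not kilometres. Municipalities that belong to none of the regions only ever appear on the "outside" side and so do not get a row in `result`.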