I need to calculate the distance from the border of a set of municipalities to the other municipalities in the state of Minas Gerais. The idea is to run a regression discontinuity design (RDD), with the border as my cutoff.
Here is an example of the database:
code latitude longitude munic cerrado mantiqueira mata
<dbl> <chr> <dbl> <chr> <dbl> <dbl> <dbl>
1 3170057 -196351 -421059 Ubaporanga 0 0 1
2 3170107 -197472 -479381 Uberaba 1 0 0
3 3170206 -189141 -482749 Uberlândia 1 0 0
4 3170404 -163592 -469022 Unaí 1 0 0
5 3170503 -203521 -42737 Urucânia 0 0 0
6 3170529 -161244 -457352 Urucuia 0 0 0
7 3170602 -203333 -463688 Vargem Bonita 0 0 0
8 3170651 -153987 -423085 Vargem Grande do Ri~ 0 0 0
9 3170701 -215556 -454364 Varginha 0 0 0
10 3170750 -183741 -460313 Varjão de Minas 1 0 0
11 3170800 -175944 -447226 Várzea da Palma 0 0 0
12 3171030 -155845 -436121 Verdelândia 0 0 0
13 3171071 -173974 -427307 Veredinha 0 0 0
14 3171154 -200406 -422688 Vermelho Novo 0 0 1
15 3171303 -207559 -428742 Viçosa 0 0 1
16 3171402 -20867 -422401 Vieiras 0 0 1
17 3171709 -223264 -450965 Virgínia 0 0 0
18 3171808 -188154 -427015 Virginópolis 0 0 0
19 3171907 -184738 -423067 Virgolândia 0 0 0
20 3172004 -210127 -428361 Visconde do Rio Bra~ 0 0 0
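The coordinates printed above cannot be decimal degrees as they stand: Minas Gerais lies roughly between 14°S and 23°S and between 39°W and 51°W. They look like degrees whose decimal separator was lost, e.g. -196351 for -19.6351 and -20867 for -20.867. If that is what happened (an assumption; check it against the downloaded file), a hypothetical helper such as `fix_coord()` below could restore them. It relies on every Minas Gerais coordinate having exactly two integer digits, which would not hold for other states.

```r
# Hypothetical helper: re-insert the decimal point in coordinates stored as
# digit strings, ASSUMING exactly two integer digits (true for Minas Gerais).
fix_coord <- function(x) {
  x <- trimws(as.character(x))
  out <- rep(NA_real_, length(x))
  # only touch values that are an optional minus sign followed by digits
  ok <- !is.na(x) & grepl("^-?[0-9]{3,}$", x)
  sign <- ifelse(startsWith(x[ok], "-"), -1, 1)
  digits <- sub("^-", "", x[ok])
  out[ok] <- sign * as.numeric(paste0(substr(digits, 1, 2), ".", substring(digits, 3)))
  out
}

fix_coord(c("-196351", "-20867", "-42737"))  # gives -19.6351, -20.867, -42.737
```

In the answer's code this would replace `as.numeric()` in the `mutate()` call, e.g. `mutate(latitude = fix_coord(latitude), longitude = fix_coord(longitude))`.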
You can download the full database here.
"cerrado", "mantiqueira" and "mata" are regions of Minas Gerais; 1 means that the municipality is inside the region. I also have the latitude and longitude of each municipality. The idea is:
1 - Create a border for each group of regions.
2 - Calculate the distance from that border for municipalities both inside and outside the group.
Here is an example of the strategy that I have in mind:
The geobr package is very popular in Brazil for spatial analysis. However, I could not find a way to use it for the analysis I have in mind.
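If municipality polygons are available (geobr provides them), the border itself can be built instead of being approximated with points. A minimal sketch, assuming the sf package and a 0/1 flag column: dissolve the region's polygons with `st_union()`, keep only the part of its outline that touches municipalities outside the region (so the state boundary is not treated as a cutoff), and measure each centroid's distance to that line, positive inside and negative outside. `signed_border_distance()` is a hypothetical helper, and the toy squares stand in for real municipalities.

```r
library(sf)

# Hypothetical helper: signed distance from each polygon's centroid to the
# border between polygons flagged 1 and polygons flagged 0 in column `flag`.
# Positive = inside the region, negative = outside. Units follow the CRS
# (metres after projecting; plain coordinate units when the CRS is NA).
signed_border_distance <- function(polys, flag) {
  inside  <- st_union(polys[which(polys[[flag]] == 1), ])
  outside <- st_union(polys[which(polys[[flag]] == 0), ])
  # Keep only the stretch of the region's outline that touches outside
  # polygons, so the state boundary is not part of the cutoff.
  border <- st_intersection(st_boundary(inside), outside)
  d <- as.numeric(st_distance(st_centroid(st_geometry(polys)), border))
  ifelse(polys[[flag]] == 1, d, -d)
}

# Toy example: three unit squares in a row, the first two "inside".
sq <- function(x0) {
  st_polygon(list(rbind(c(x0, 0), c(x0 + 1, 0), c(x0 + 1, 1), c(x0, 1), c(x0, 0))))
}
toy <- st_sf(cerrado = c(1, 1, 0), geometry = st_sfc(sq(0), sq(1), sq(2)))
signed_border_distance(toy, "cerrado")  # 1.5, 0.5, -0.5

# With real data (needs the geobr package and an internet connection):
# mg <- geobr::read_municipality(code_muni = "MG", year = 2010)
# mg <- merge(mg, teste[, c("code_muni", "cerrado")], by = "code_muni")
# mg <- st_transform(mg, 5880)  # SIRGAS 2000 / Brazil Polyconic, metres
# mg$dist_cerrado_m <- signed_border_distance(mg, "cerrado")
```

Using centroids is a simplification; `st_point_on_surface()` is an alternative for oddly shaped municipalities.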
Your data have too many NA values to build a true discontinuity design like the one you showed above.
Because of that, the best option might be to calculate nearest-neighbor distances by category:
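library(tidyverse)
# Read the data; rename to geobr-style column names (code_muni, name_muni)
teste <-
  readxl::read_xlsx("teste.xlsx") %>%
  rename(name_muni = munic,
         code_muni = code) %>%
  mutate(latitude = as.numeric(latitude),
         longitude = as.numeric(longitude)) %>%
  # the nearest-neighbour search needs complete coordinates
  drop_na(latitude, longitude)
# Split into cerrado municipalities and all the others
cerrado <- filter(teste, cerrado == 1)
other <- filter(teste, cerrado != 1)
# Nearest non-cerrado municipality (k = 1) for each cerrado municipality;
# nn.dist is the Euclidean distance in lon/lat units (degrees)
nn <- FNN::get.knnx(select(other, longitude, latitude),
                    select(cerrado, longitude, latitude), k = 1)
cerrado <- mutate(cerrado, dist_other = nn$nn.dist[, 1])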
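Because `get.knnx()` works on raw longitude/latitude, `dist_other` above is a Euclidean distance in degrees, and a degree of longitude is shorter than a degree of latitude away from the equator. If kilometres are needed, one option (a swapped-in technique, not part of this answer) is the haversine formula with a brute-force nearest search, which is cheap for the roughly 850 municipalities of Minas Gerais. `haversine_km()` and `nearest_km()` are hypothetical helpers; 6371 km is the usual mean Earth radius approximation.

```r
# Great-circle (haversine) distance in km between points given in decimal
# degrees; vectorised over its arguments.
haversine_km <- function(lon1, lat1, lon2, lat2, r = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 + cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))  # pmin guards against rounding above 1
}

# Distance (km) from each row of `from` to its nearest row in `to`,
# by brute force over all pairs.
nearest_km <- function(from, to) {
  vapply(seq_len(nrow(from)), function(i) {
    min(haversine_km(from$longitude[i], from$latitude[i], to$longitude, to$latitude))
  }, numeric(1))
}
```

With the objects above this would be `cerrado <- mutate(cerrado, dist_other_km = nearest_km(cerrado, other))`.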
Now each cerrado municipality has its distance to the nearest municipality outside the cerrado (in degrees, since the coordinates are unprojected). You can repeat this for the other regions and use rbind() if you want to pull them back into one data frame.
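The code above covers only the inside direction. A minimal sketch of both directions, assuming the column names used above (`longitude`, `latitude` and 0/1 region flags): each municipality gets the distance to the nearest municipality on the other side, positive inside and negative outside, which could serve as a rough running variable. `nn_signed()` is a hypothetical helper; distances stay in degrees as with `get.knnx()` above, and `bind_rows()` plays the role of `rbind()`.

```r
library(dplyr)

# Hypothetical helper: signed nearest-neighbour distance for one 0/1 region
# column. Inside rows get +d (distance to the nearest outside municipality),
# outside rows get -d (distance to the nearest inside municipality).
nn_signed <- function(df, region) {
  df <- filter(df, !is.na(.data[[region]]), !is.na(longitude), !is.na(latitude))
  inside  <- filter(df, .data[[region]] == 1)
  outside <- filter(df, .data[[region]] == 0)
  coords <- function(d) as.matrix(select(d, longitude, latitude))
  d_in  <- FNN::get.knnx(coords(outside), coords(inside),  k = 1)$nn.dist[, 1]
  d_out <- FNN::get.knnx(coords(inside),  coords(outside), k = 1)$nn.dist[, 1]
  bind_rows(
    mutate(inside,  region = region, dist =  d_in),
    mutate(outside, region = region, dist = -d_out)
  )
}

# Usage with the data read above, stacking the three regions:
# all_regions <- bind_rows(lapply(c("cerrado", "mantiqueira", "mata"),
#                                 nn_signed, df = teste))
```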