I am trying to build a mapping function with R, using the dplyr, sp and rgdal packages. To start with, I'll describe the data. I have downloaded the shapefile of the entire world from opendatasoft.com. I extracted the country names from the dataframe embedded in this sp object, and then made a dataframe with some values that are distributed across the countries. Finally, I wrote the following UDF to merge the data with the sp object.
createDataMap <- function(shapeFileName = "world-administrative-boundaries", countryValueData){
SpatialInformation <- readOGR(shapeFileName, paste(shapeFileName,"areas", sep = "-")) #I am appending the term "-areas" only because I renamed the files in the folder
ValueInformation <- countryValueData
namesOfCountriesInTheWorldMap <- SpatialInformation@data$name
namesOfCountriesInValueList <- ValueInformation$Country
namesInValuesButNotInTheWorld <- setdiff(namesOfCountriesInValueList, namesOfCountriesInTheWorldMap)
namesInTheWorldButNotInTheDataFile <- setdiff(namesOfCountriesInTheWorldMap, namesOfCountriesInValueList)
if(length(namesInValuesButNotInTheWorld)>0){
return("You seem to have Values outside this world. Please wait until we achieve Type 1 Civilization Status.")
}
if(length(namesInTheWorldButNotInTheDataFile)>0){
return("Please add all the country rows in the data file, even if you do not have Values everywhere.")
}
IDtoCountry <- as.data.frame(cbind(namesOfCountriesInTheWorldMap, sapply(SpatialInformation@polygons, function(x) x@ID)))
namesOfCountriesInValueList <- left_join(namesOfCountriesInValueList,IDtoCountry, by = c("Country" = "namesOfCountriesInTheWorldMap"))
locationInfo <- SpatialInformation@data
namesOfCountriesInValueList <- left_join(countryValueData,locationInfo, by = c("Country" = "name"))
rownames(namesOfCountriesInValueList) <- namesOfCountriesInValueList$V2
namesOfCountriesInValueList$V2 <- NULL
return(SpatialPolygonsDataFrame(SpatialInformation, namesOfCountriesInValueList, match.ID = TRUE))
}
As for the countryValueData, It is just a dataframe of random values and NAs associated with each countries, which I created with random number generation and the list of countries I got from the sp object.
This merge works fine. Now I have a dataframe embedded in the sp object with the following structure.
'data.frame': 256 obs. of 12 variables:
$ Country : chr "Uganda" "Uzbekistan" "Ireland" "Eritrea" ...
$ Value1 : num NA NA 1660 NA NA ...
$ Value2 : num 2727 734 734 2727 734 ...
$ Value3 : num 2574 4383 3024 4293 NA ...
$ Value4 : num 2727 734 734 2727 1404 ...
$ iso3 : chr "UGA" "UZB" "IRL" "ERI" ...
$ status : chr "Member State" "Member State" "Member State" "Member State" ...
$ color_code : chr "UGA" "UZB" "IRL" "ERI" ...
$ continent : chr "Africa" "Asia" "Europe" "Africa" ...
$ region : chr "Eastern Africa" "Central Asia" "Northern Europe" "Eastern Africa" ...
$ iso_3166_1_: chr "UG" "UZ" "IE" "ER" ...
$ french_shor: chr "Ouganda" "Ouzbékistan" "Irlande" "Érythrée" ...
where value1, value2, value3 and value4 are the random values I mentioned before. What I am now trying to do is to somehow aggregate this data on a continent level. In theory, since all the polygons in the sp object have their borders well defined, and there is a mapping from countries to continent in the data frame, aggregation should not impossible. I want to have a mechanism with which I can change the nature of my bubble plot from country level to continent level, but for that, first, I need to do this aggregation. Has anyone ever tried to do this?
With, e.g., {sf} it's straightforward to join spatial dataframes with supplementary data and "slice'n'dice" the non-geometry columns ("attributes") as one would do with other dataframes. When aggregating across rows, the spatial features are merged by default.
example:
library(sf)
the_countries <- read_sf('/path/to/my/shapefile.shp')
> the_countries
Simple feature collection with 9 features and 8 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: -4.79028 ymin: 41.36492 xmax: 17.16639 ymax: 55.05653
Geodetic CRS: WGS 84
# A tibble: 9 x 9
iso3 status color_code name continent region iso_3166_1_ french_shor
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 CHE Member State CHE Switze~ Europe Weste~ CH Suisse
2 AUT Member State AUT Austria Europe Weste~ AT Autriche
## ...
country_value_data <- data.frame(iso3 = c('FRA', 'BEL'), random_indicator = rnorm(2))
library(dplyr)
the_countries |>
left_join(country_value_data, by = 'iso3') |>
group_by(continent) |>
summarise(random_indicator = mean(random_indicator, na.rm = TRUE))
+ Simple feature collection with 1 feature and 2 fields
Geometry type: MULTIPOLYGON
Dimension: XY
Bounding box: xmin: -4.79028 ymin: 41.36492 xmax: 17.16639 ymax: 55.05653
Geodetic CRS: WGS 84
# A tibble: 1 x 3
continent random_indicator geometry
<chr> <dbl> <MULTIPOLYGON [°]>
1 Europe -0.968 (((9.455 42.71861, 9.4675 42.76555, 9.4884 42.8070~