Search code examples
rgeospatialspatialr-sp

R Spatial Information Aggregation


I am trying to build a mapping function with R, using the dplyr, sp and rgdal packages. To start with, I'll describe the data. I have downloaded the shapefile of the entire world from opendatasoft.com. I extracted the country names from the dataframe embedded in this sp object, and then made a dataframe with some values that are distributed across the countries. Finally, I wrote the following UDF to merge the data with the sp object.

createDataMap <- function(shapeFileName = "world-administrative-boundaries", countryValueData){
  SpatialInformation <- readOGR(shapeFileName, paste(shapeFileName,"areas", sep = "-")) #I am appending the term "-areas" only because I renamed the files in the folder
  ValueInformation <- countryValueData
  
  namesOfCountriesInTheWorldMap <- SpatialInformation@data$name
  namesOfCountriesInValueList <- ValueInformation$Country
  namesInValuesButNotInTheWorld <- setdiff(namesOfCountriesInValueList, namesOfCountriesInTheWorldMap)
  namesInTheWorldButNotInTheDataFile <- setdiff(namesOfCountriesInTheWorldMap, namesOfCountriesInValueList)
  
  if(length(namesInValuesButNotInTheWorld)>0){
    return("You seem to have Values outside this world. Please wait until we achieve Type 1 Civilization Status.")
  }
  
  
  
  if(length(namesInTheWorldButNotInTheDataFile)>0){
    return("Please add all the country rows in the data file, even if you do not have Values everywhere.")
  }
  
  IDtoCountry <- as.data.frame(cbind(namesOfCountriesInTheWorldMap, sapply(SpatialInformation@polygons, function(x) x@ID)))
  
  namesOfCountriesInValueList <- left_join(namesOfCountriesInValueList,IDtoCountry, by = c("Country" = "namesOfCountriesInTheWorldMap"))
  locationInfo <- SpatialInformation@data
  namesOfCountriesInValueList <- left_join(countryValueData,locationInfo, by = c("Country" = "name"))
  rownames(namesOfCountriesInValueList) <- namesOfCountriesInValueList$V2
  
  namesOfCountriesInValueList$V2 <- NULL
  
  return(SpatialPolygonsDataFrame(SpatialInformation, namesOfCountriesInValueList, match.ID = TRUE))
  
   
}

As for the countryValueData, It is just a dataframe of random values and NAs associated with each countries, which I created with random number generation and the list of countries I got from the sp object.

This merge works fine. Now I have a dataframe embedded in the sp object with the following structure.

'data.frame': 256 obs. of 12 variables:

$ Country : chr "Uganda" "Uzbekistan" "Ireland" "Eritrea" ...

$ Value1 : num NA NA 1660 NA NA ...

$ Value2 : num 2727 734 734 2727 734 ...

$ Value3 : num 2574 4383 3024 4293 NA ...

$ Value4 : num 2727 734 734 2727 1404 ...

$ iso3 : chr "UGA" "UZB" "IRL" "ERI" ...

$ status : chr "Member State" "Member State" "Member State" "Member State" ...

$ color_code : chr "UGA" "UZB" "IRL" "ERI" ...

$ continent : chr "Africa" "Asia" "Europe" "Africa" ...

$ region : chr "Eastern Africa" "Central Asia" "Northern Europe" "Eastern Africa" ...

$ iso_3166_1_: chr "UG" "UZ" "IE" "ER" ...

$ french_shor: chr "Ouganda" "Ouzbékistan" "Irlande" "Érythrée" ...

where value1, value2, value3 and value4 are the random values I mentioned before. What I am now trying to do is to somehow aggregate this data on a continent level. In theory, since all the polygons in the sp object have their borders well defined, and there is a mapping from countries to continent in the data frame, aggregation should not impossible. I want to have a mechanism with which I can change the nature of my bubble plot from country level to continent level, but for that, first, I need to do this aggregation. Has anyone ever tried to do this?


Solution

  • With, e.g., {sf} it's straightforward to join spatial dataframes with supplementary data and "slice'n'dice" the non-geometry columns ("attributes") as one would do with other dataframes. When aggregating across rows, the spatial features are merged by default.

    example:

    • read in shapefile as spatial dataframe:
    library(sf)
    
    the_countries <- read_sf('/path/to/my/shapefile.shp')
    
    > the_countries
    Simple feature collection with 9 features and 8 fields
    Geometry type: MULTIPOLYGON
    Dimension:     XY
    Bounding box:  xmin: -4.79028 ymin: 41.36492 xmax: 17.16639 ymax: 55.05653
    Geodetic CRS:  WGS 84
    # A tibble: 9 x 9
      iso3  status       color_code name    continent region iso_3166_1_ french_shor
      <chr> <chr>        <chr>      <chr>   <chr>     <chr>  <chr>       <chr>      
    1 CHE   Member State CHE        Switze~ Europe    Weste~ CH          Suisse     
    2 AUT   Member State AUT        Austria Europe    Weste~ AT          Autriche
    ## ...
    
    • generate some additional toy data:
    country_value_data <- data.frame(iso3 = c('FRA', 'BEL'), random_indicator = rnorm(2))
    
    • join toy data to existing attributes and aggregate, e. g. across continent:
    library(dplyr)
    
    the_countries |>
      left_join(country_value_data, by = 'iso3') |>
      group_by(continent) |>
      summarise(random_indicator = mean(random_indicator, na.rm = TRUE))
    
    + Simple feature collection with 1 feature and 2 fields
    Geometry type: MULTIPOLYGON
    Dimension:     XY
    Bounding box:  xmin: -4.79028 ymin: 41.36492 xmax: 17.16639 ymax: 55.05653
    Geodetic CRS:  WGS 84
    # A tibble: 1 x 3
      continent random_indicator                                            geometry
      <chr>                <dbl>                                  <MULTIPOLYGON [°]>
    1 Europe              -0.968 (((9.455 42.71861, 9.4675 42.76555, 9.4884 42.8070~