Search code examples
rmergegisr-sp

How to merge spatial data from two multi-polygon layers with identical geography, but different ID's?


So I have two shapefile layers of U.S. Census tracts with data I need to merge, but unfortunately one of the files uses a peculiar ID system that does not match up with the other file. I would like to merge these data into one layer, but I'm struggling to figure out how to do this in R (in QGIS, I can do this with the "Data Management Tools > Join Attributes by Location" function. Below is some code used to create two polygon layers, now I'd like to determine how to combine these two layers into one which retains all underlying data from both layers.

library(sp)

# Create FIRST polygon layer
square <- rbind(c(255842.4, 4111578, 255862.4, 4111578, 255862.4, 4111558, 
                  255842.4, 4111558, 255842.4, 4111578, 255842.4, 4111578),
                c(257397.0, 4111309, 257417.0, 4111309, 257417.0, 4111289, 
                  257397.0, 4111289, 257397.0, 4111309, 257397.0, 4111309))

ID <- c("C1", "C2")
people <- c(5000, 3000)

polys <- SpatialPolygons(list(
  Polygons(list(Polygon(matrix(square[1, ], ncol=2, byrow=TRUE))), ID[1]),
  Polygons(list(Polygon(matrix(square[2, ], ncol=2, byrow=TRUE))), ID[2])
))

# Create a dataframe and display default rownames
( p.df <- data.frame( ID=1:length(polys)) ) 
rownames(p.df)

# Try to coerce to SpatialPolygonsDataFrame (will throw error)
polys <- SpatialPolygonsDataFrame(polys, p.df) 

# Extract polygon ID's
( pid <- sapply(slot(polys, "polygons"), function(x) slot(x, "ID")) )

# Create dataframe with correct rownames
( p.df <- data.frame( ID=1:length(polys), row.names = pid) )    

# Try coersion again and check class
p <- SpatialPolygonsDataFrame(polys, p.df)
class(p) 

# Now we can add a column
p@data$people <- c(7500,3000) 


##################### NOW, creating the SECOND polygon layer
square2 <- rbind(c(255842.4, 4111578, 255862.4, 4111578, 255862.4, 4111558, 
                  255842.4, 4111558, 255842.4, 4111578, 255842.4, 4111578),
                c(257397.0, 4111309, 257417.0, 4111309, 257417.0, 4111289, 
                  257397.0, 4111289, 257397.0, 4111309, 257397.0, 4111309))

ID2 <- c("30067893", "30794385")

polys2 <- SpatialPolygons(list(
  Polygons(list(Polygon(matrix(square[1, ], ncol=2, byrow=TRUE))), ID2[1]),
  Polygons(list(Polygon(matrix(square[2, ], ncol=2, byrow=TRUE))), ID2[2])
))

# Create a dataframe and display default rownames
( p2.df <- data.frame( ID=1:length(polys2)) ) 
rownames(p2.df)

# Try to coerce to SpatialPolygonsDataFrame (will throw error)
polys2 <- SpatialPolygonsDataFrame(polys2, p2.df) 

# Extract polygon ID's
( pid2 <- sapply(slot(polys2, "polygons"), function(x) slot(x, "ID")) )

# Create dataframe with correct rownames
( p2.df <- data.frame( ID=1:length(polys2), row.names = pid2) )    

# Try coersion again and check class
p2 <- SpatialPolygonsDataFrame(polys2, p2.df)
class(p2) 

# Now we can add a column
p2@data$homes <- c(500,250) 

Now, I'd like to merge the data together to have one SpatialPolygonsDataFrame which has both "homes" and "people" as columns. This post is asking something similar, however, imagine that the ID column does not match up between the two layers. Anyone have a good notion for how I could go about doing that?


Solution

  • Let's first use a simpler way to create some example data. Here with "terra"

    library(terra)
    sq1a <- matrix(c(255842.4, 4111578, 255862.4, 4111578, 255862.4, 4111558, 
                  255842.4, 4111558, 255842.4, 4111578), ncol=2, byrow=TRUE)
    sq1b <- matrix(c(257397.0, 4111309, 257417.0, 4111309, 257417.0, 4111289, 
                  257397.0, 4111289, 257397.0, 4111309), ncol=2, byrow=TRUE)
    square <- rbind(cbind(1,1,sq1a,0), cbind(2,1,sq1b,0))
    p1 <- vect(square, type="polygons", atts=data.frame(ID=1:2, people=c(3000,7500)))
    
    p2 <- p1
    values(p2) <- data.frame(ID=1:2, homes=c(250,500))
    

    Since the IDs match, you can do

    m <- merge(p1, values(p2), by="ID")
    m
    # class       : SpatVector 
    # geometry    : polygons 
    # dimensions  : 2, 3  (geometries, attributes)
    # extent      : 255842.4, 257417, 4111289, 4111578  (xmin, xmax, ymin, ymax)
    # coord. ref. :  
    # names       :    ID people homes
    # type        : <int>  <num> <num>
    # values      :     1   3000   250
    #                   2   7500   500
    

    Now, we make the IDs non-matching

     values(p2) <- data.frame(ID=c("01","02"), homes=c(250,500))
    

    The obvious solution should be to fix the IDs, but if that is not possible you could try

    i <- intersect(p1, p2)
    i
    # class       : SpatVector 
    # geometry    : polygons 
    # dimensions  : 2, 4  (geometries, attributes)
    # extent      : 255842.4, 257417, 4111289, 4111578  (xmin, xmax, ymin, ymax)
    # coord. ref. :  
    # names       :    ID people    ID homes
    # type        : <int>  <num> <chr> <num>
    # values      :     1   3000    01   250
    #                   2   7500    02   500
    

    That works well for the example data, but the problem with this approach is that if the polygons are not exactly the same you can end up with a lot of cleaning work to get rid of small polygons that you do not want.

    Below is a computationally much cheaper solution that does not have this problem.

    e <- extract(p2, centroids(p1, inside=TRUE))
    p <- cbind(p1, e[,-1])
    p
    # class       : SpatVector 
    # geometry    : polygons 
    # dimensions  : 2, 4  (geometries, attributes)
    # extent      : 255842.4, 257417, 4111289, 4111578  (xmin, xmax, ymin, ymax)
    # coord. ref. :  
    # names       :    ID people  ID.1 homes
    # type        : <int>  <num> <chr> <num>
    # values      :     1   3000    01   250
    #                   2   7500    02   500
    

    If there is no a one-to-one relationship like in this example, then you would need to use merge instead of cbind

    x <- p1
    x$id.y <- e$id.y
    x <- merge(x, e, by="id.y")
    x$id.y <- NULL
    

    m is a SpatVector. You can coerce it to an sf object with sf::st_as_sf(m) or to a Spatial object with library(raster); as(m, "Spatial")