Tags: r, csv, ggplot2, spatial, shapefile

How to map shapefile polygons to CSV data


I downloaded a polygon shapefile (england_ct_1991.shp) and a CSV file (england_ct_1991.csv), each from a zip archive of public data, and I want to join them. I added a new column called 'price' to the CSV file so that each county has its own price, like this:

name,label,x,y,price
Avon,9,360567.1834,171823.554,11
Avon,9,322865.922,160665.4829,11
Bedfordshire,10,506219.5005,242767.306,20
Berkshire,11,464403.02,172809.5331,23....

I joined the shapefile and the CSV on the county name. The problem is that the map does not pick up the price: the counties are not filled with a color gradient based on price. The YouTube tutorials I checked say the join is the important part, and it worked for them, so I am unsure what I did wrong.

library(ggplot2)
library(sf)
library(tidyverse)

# map of england counties
map7 <- read_sf("england_ct_1991.shp")
head(map7)

ggplot(map7) +
   geom_sf()

# get x (longitude) y (latitude) county names and prices
totalPrices <- read_csv("england_ct_1991.csv")
head(totalPrices)

# join map and csv data on county name
mappedData <- left_join(map7, totalPrices, by="name")
head(mappedData)

# print map
map1 <- ggplot(mappedData, aes( x=x, y=y, group=name)) +
   geom_polygon(aes(fill=price), color="black") +
   geom_sf()

map1

Solution

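  • The key point is the warning that left_join(map7, totalPrices, by="name") emits: Detected an unexpected many-to-many relationship between x and y. The CSV has several rows per county, so the join repeats each county's polygon once for every matching CSV row.

    Make your totalPrices data unique first, so that no county appears more than once in the name column. Then let geom_sf() draw the polygons and map fill to price; the geom_polygon() layer and the x/y aesthetics are not needed, because geom_sf() reads the shapes from the geometry column.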

    library(ggplot2)
    library(sf)
    library(tidyverse)
    
    map7 <- read_sf("england_ct_1991.shp")
    
    totalPrices <- read_csv("england_ct_1991.csv")
    
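    # One row per county. rnorm(1) draws a placeholder price for each county;
    # if the CSV already holds one price per county, distinct(name, price)
    # on its own is enough.
    new <- totalPrices %>%
      group_by(name) %>%
      mutate(price = rnorm(1)) %>% 
      distinct(name, price)
    new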
    
    ## A tibble: 47 × 2
    ## Groups:   name [47]
    #   name                           price
    #   <chr>                          <dbl>
    # 1 Lincolnshire                 -2.45  
    # 2 Cumbria                      -0.413 
    # 3 North Yorkshire               0.566 
    # 4 Northumberland                0.179 
    # 5 Cornwall and Isles of Scilly  1.05  
    # 6 Devon                        -0.493 
    # 7 Somerset                      0.324 
    # 8 Dorset                        0.704 
    # 9 East Sussex                   1.32  
    #10 Wiltshire                     0.0161
    ## ℹ 37 more rows
    ## ℹ Use `print(n = ...)` to see more rows
    
    
    map7 %>%
      left_join(new, by = "name") %>%
      ggplot() +
      geom_sf(aes(fill = price), color = "black")
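
To see why the join warns, it can help to count the rows per county before joining. The following is a minimal sketch, not the exact data: the small data frame is a hypothetical stand-in built from the sample rows in the question, and the names dupes and prices are made up for illustration. It assumes each county carries a single price, as the question describes.

```r
library(dplyr)

# Hypothetical stand-in for england_ct_1991.csv, using the sample rows
# from the question: several rows per county, each repeating the price.
totalPrices <- data.frame(
  name  = c("Avon", "Avon", "Bedfordshire", "Berkshire"),
  label = c(9, 9, 10, 11),
  x     = c(360567.1834, 322865.922, 506219.5005, 464403.02),
  y     = c(171823.554, 160665.4829, 242767.306, 172809.5331),
  price = c(11, 11, 20, 23)
)

# Counties that appear more than once; each of these would multiply
# the matching polygon rows in a join on name.
dupes <- totalPrices %>%
  count(name) %>%
  filter(n > 1)

# Keep one row per county, using the price column that is already there.
prices <- totalPrices %>%
  distinct(name, price)

# Guard: stop early if any county still has more than one price.
stopifnot(!anyDuplicated(prices$name))
```

With the real CSV, prices could then take the place of new in the join above, for example left_join(map7, prices, by = "name"). If the guard fails, some county has conflicting prices and needs an explicit summary (a mean, say) before joining.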