Tags: r, ggplot2, maps, ggmap

Avoiding crazy lines while mapping in R


Hello, I am trying to create a map of Europe in R that shows average Gini scores. I'm pretty sure my data are all sorted out, but I'm still getting a bunch of crazy lines and holes. Is it an error in how I'm making the map?

I made sure that all the country names line up in the two data sets, so I'm pretty sure the issue is with the mapping. I've tried changing the group aesthetic to subgroup, but that only got rid of the gradient colors. I have also tried restricting the x and y coordinates as suggested in "Avoiding horizontal lines and crazy shapes when plotting maps in ggplot2", but that did not solve the problem. I also tried using geom_polypath, which helped but did not fix the problem entirely.

Here is the image generated: [image: Crazy map]

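# Packages used below (geom_polypath() comes from the ggpolypath package)
library(dplyr)
library(ggplot2)
library(ggpolypath)

wiid_filtered <- wiid %>%
  filter(year >= 2000 & country != "Russia")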

wiid_filtered <- wiid_filtered %>%
  filter(region_un == "Europe")

# Calculate the average Gini score for each country
wiid_avg <- wiid_filtered %>%
  group_by(country) %>%
  summarise(avg_gini = mean(gini, na.rm = TRUE))

# Load world map data
world_map <- map_data("world")

# Filter world map data for Europe
europe_map <- subset(world_map, region %in% c("Albania", "Andorra", "Austria", "Belarus", "Belgium", "Bosnia and Herzegovina",
                                                      "Bulgaria", "Croatia", "Czech Republic", "Denmark", "Estonia", "Finland", "France",
                                                      "Germany", "Greece", "Hungary", "Iceland", "Ireland", "Italy", "Kosovo", 
                                                      "Latvia", "Lithuania", "Luxembourg", "Malta", "Moldova", "Montenegro", 
                                                      "Netherlands", "North Macedonia", "Norway", "Poland", "Portugal", "Romania", 
                                                      "San Marino", "Serbia", "Serbia and Montenegro", "Slovakia", "Slovenia", 
                                                      "Spain", "Sweden", "Switzerland", "Ukraine", "UK"))

wiid_avg <- wiid_avg %>% 
  mutate(country = if_else(country == "United Kingdom", "UK", country))

wiid_avg <- wiid_avg %>% 
  mutate(country = if_else(country == "Czechia", "Czech Republic", country))


map_data <- merge(europe_map, wiid_avg, by.x = "region", by.y = "country", all.x = TRUE)

# Plot the choropleth map
ggplot(map_data, aes(x = long, y = lat, group = group, fill = avg_gini)) +
  geom_polypath(color = "black", size = 0.2) +
  coord_cartesian(xlim = c(-30, 50), ylim = c(30, 90)) +
  scale_fill_gradient(name = "Average Gini Score",
                      low = "lightblue", high = "darkblue",
                      guide = "legend") +
  labs(title = "Average Gini Score in Europe since 2000",
       x = "", y = "") +
  theme_minimal() +
  theme(legend.position = "bottom")

Solution

  • The problem you are encountering is due to how the ggplot2 map data are structured. I don't know anything about geom_polypath(), but I do know that if you are working with spatial data, it is almost always easier to convert them to simple features (sf) objects.

    That is because ggplot2 has some very handy functions to manage sf objects such as geom_sf(), coord_sf() etc. Plotting sf objects using geom_sf() also has the advantage that the coordinates are automatically derived from the sf objects.

    To that end, here is how to solve your issue using the sf package. You haven't provided your Gini data, so if you have trouble joining your data, update your question with the relevant data and I'll update this answer.

    library(sf)
    library(dplyr)
    library(ggplot2)
    
    # Load world map data
    world_map <- map_data("world")
    
    # Filter world map data for Europe
    europe_map <- subset(world_map, region %in% c("Albania", "Andorra", "Austria", "Belarus", "Belgium", "Bosnia and Herzegovina",
                                                  "Bulgaria", "Croatia", "Czech Republic", "Denmark", "Estonia", "Finland", "France",
                                                  "Germany", "Greece", "Hungary", "Iceland", "Ireland", "Italy", "Kosovo", 
                                                  "Latvia", "Lithuania", "Luxembourg", "Malta", "Moldova", "Montenegro", 
                                                  "Netherlands", "North Macedonia", "Norway", "Poland", "Portugal", "Romania", 
                                                  "San Marino", "Serbia", "Serbia and Montenegro", "Slovakia", "Slovenia", 
                                                  "Spain", "Sweden", "Switzerland", "Ukraine", "UK"))
    
    # Define lon/lat, group data, combine by group, define as polygons, and set CRS
    europe_map <- europe_map %>%
      st_as_sf(coords = c("long","lat")) %>%
      group_by(region, group) %>%
      summarise(geometry = st_combine(geometry), .groups = "drop") %>% 
      st_cast("POLYGON") %>%
      st_set_crs(4326) %>%
      ungroup()
    
    ggplot(europe_map) +
      geom_sf(color = "black", linewidth = 0.2) +
      coord_sf(xlim = c(-30, 50), ylim = c(30, 90)) +
      # scale_fill_gradient(name = "Average Gini Score",
      #                     low = "lightblue", high = "darkblue",
      #                     guide = "legend") +
      labs(title = "Average Gini Score in Europe since 2000",
           x = "", y = "") +
      theme_minimal() +
      theme(legend.position = "bottom")
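
The fill scale is commented out above because the Gini data were not available. As a rough sketch of how the join and fill might look once they are, the example below rebuilds a three-country sf map the same way and attaches a stand-in `wiid_avg` table. The `avg_gini` values are placeholders, not real scores, and the names `europe_sf` and `europe_gini` are made up for this illustration.

```r
library(sf)
library(dplyr)
library(ggplot2)

# Rebuild a small sf map the same way as above (three countries keep it short)
europe_sf <- map_data("world") %>%
  filter(region %in% c("France", "Spain", "Portugal")) %>%
  st_as_sf(coords = c("long", "lat"), crs = 4326) %>%
  group_by(region, group) %>%
  summarise(geometry = st_combine(geometry), .groups = "drop") %>%
  st_cast("POLYGON")

# PLACEHOLDER values, not real Gini scores: a stand-in for the asker's wiid_avg
wiid_avg <- tibble(
  country  = c("France", "Spain", "Portugal"),
  avg_gini = c(1, 2, 3)
)

# Joining onto the sf object keeps its geometry column and sf class
europe_gini <- left_join(europe_sf, wiid_avg, by = c("region" = "country"))

# Map the joined column to fill inside geom_sf()
ggplot(europe_gini) +
  geom_sf(aes(fill = avg_gini), color = "black", linewidth = 0.2) +
  scale_fill_gradient(name = "Average Gini Score",
                      low = "lightblue", high = "darkblue") +
  theme_minimal() +
  theme(legend.position = "bottom")
```

Countries with no match in `wiid_avg` get `NA` for `avg_gini` after the join, and the fill scale draws them in its `na.value` colour (grey by default).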
    

    [image: resulting map]
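
For reference, a likely cause of the stray lines in the original code is the `merge()` call, though this cannot be confirmed without the data. `map_data()` returns polygon vertices in drawing order (recorded in its `order` column), and `merge()` sorts its result on the key column by default, so the vertices can end up connected in the wrong sequence. The sketch below uses a made-up toy data frame in place of the real map to show the reordering and one way to undo it.

```r
# Toy stand-in for map_data("world"): two squares, vertices listed in drawing order.
# All values here are made up for illustration.
toy_map <- data.frame(
  long   = c(2, 3, 3, 2, 0, 1, 1, 0),
  lat    = c(0, 0, 1, 1, 0, 0, 1, 1),
  group  = c(1, 1, 1, 1, 2, 2, 2, 2),
  order  = 1:8,
  region = rep(c("B", "A"), each = 4)
)
toy_gini <- data.frame(country = c("A", "B"), avg_gini = c(30, 35))

# merge() sorts the result on the key column by default, so the rows
# no longer follow the original drawing order
merged <- merge(toy_map, toy_gini, by.x = "region", by.y = "country", all.x = TRUE)

# Restore the drawing order before handing the data to ggplot()
fixed <- merged[order(merged$order), ]
```

Applied to the question's code, this would mean adding `map_data <- map_data[order(map_data$order), ]` after the `merge()` line. Alternatively, `dplyr::left_join(europe_map, wiid_avg, by = c("region" = "country"))` keeps the rows of the left table in their original order, so it should avoid the problem in the first place.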