
Merging Dataset in R returns unexpected NA values


For this week's TidyTuesday challenge, I was inspecting the drought data. I wanted to make a map showing the drought level of each US county. To do this, I tried to merge the county geospatial data from the maps package with the FIPS codes from tidycensus. Here is my code:

library(tidyverse)
library(tidycensus)
library(lubridate)
library(maps)

drought_fips <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-06-14/drought-fips.csv')
counties_geodata <- map_data("county")
data("fips_codes")

fips_codes$county <- gsub("County", "", as.character(fips_codes$county))
fips_codes$county <- str_to_lower(fips_codes$county)
fips_codes$state_name <- str_to_lower(fips_codes$state_name)

counties <- counties_geodata %>%
  left_join(fips_codes, by = c("region" = "state_name", "subregion" = "county"))

As the code shows, after some data wrangling on the FIPS codes, I merged them with the US counties' geospatial data using left_join. The resulting counties data contains NA values in the state, state_code, and county_code columns, and I do not understand why. Could you advise me on what causes these NA values? Thank you for your attention.


Solution

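  • If you print the table as a tibble, you can see that the county values have a trailing space inside the quotation marks. Removing "County" with gsub leaves behind the space that preceded it, so the join keys no longer match: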

    head(tibble(fips_codes))

    Trimming the whitespace with fips_codes$county <- str_trim(fips_codes$county) solves the issue.

    You don't actually need an extra line of code, though. You can change your gsub call so that it also removes the character before "County" (the . in the pattern matches that space):

    fips_codes$county <- gsub(".County", "", as.character(fips_codes$county))
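To make the cause easier to see, here is a minimal sketch on toy data. The fips_toy and geo_toy tables are hypothetical stand-ins for fips_codes and map_data("county"), not the real datasets, and their values are illustrative only. It shows the trailing space left by gsub, the effect of str_trim on the join, and an anti_join check you could run on the real data to list any rows that still fail to match.

```r
library(dplyr)
library(stringr)

# Hypothetical stand-ins for fips_codes and map_data("county");
# the values are illustrative only.
fips_toy <- tibble(
  state_name  = c("Alabama", "Alabama"),
  county      = c("Autauga County", "Baldwin County"),
  county_code = c("001", "003")
)
geo_toy <- tibble(
  region    = c("alabama", "alabama"),
  subregion = c("autauga", "baldwin")
)

# Removing only "County" leaves a trailing space, e.g. "autauga ".
untrimmed <- fips_toy %>%
  mutate(
    state_name = str_to_lower(state_name),
    county     = str_to_lower(gsub("County", "", county))
  )

# Trimming the whitespace makes the keys comparable again.
trimmed <- untrimmed %>%
  mutate(county = str_trim(county))

join_keys <- c("region" = "state_name", "subregion" = "county")

joined_bad  <- left_join(geo_toy, untrimmed, by = join_keys)  # county_code is NA
joined_good <- left_join(geo_toy, trimmed, by = join_keys)    # county_code is filled

# Rows of the map data that still have no partner after the fix.
unmatched <- anti_join(geo_toy, trimmed, by = join_keys)
```

On the real data, running the same anti_join between counties_geodata and the cleaned fips_codes is a quick way to check whether any county names differ for reasons other than whitespace.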