
How to load a txt file that uses indentation to mark observations into R


I am running analyses with data by county, and I would like to include variables with data from adjacent counties. Before that, I need a file listing each county's adjacent counties.

From the Census, I have such a txt file, but the format is... unique. While the columns are tab delimited, each source county is named only on its first row, and its adjacent counties are listed on indented rows beneath it.

Example:

"Autauga County, AL"    01001   "Autauga County, AL"    01001
        "Chilton County, AL"    01021
        "Dallas County, AL" 01047
        "Elmore County, AL" 01051
        "Lowndes County, AL"    01085
        "Montgomery County, AL" 01101
"Baldwin County, AL"    01003   "Baldwin County, AL"    01003
        "Clarke County, AL" 01025
        "Escambia County, AL"   01053
        "Mobile County, AL" 01097
        "Monroe County, AL" 01099
        "Washington County, AL" 01129
        "Escambia County, FL"   12033  

I have no idea how to load this in. And there are too many counties in my study area to do it manually.

Would greatly appreciate any help!


Solution

  • The page describing the layout of the file (County Adjacency File Record Layout) specifies that the file is tab delimited, so you can read it with read_tsv. On the indented rows the first two fields are empty and are read as NA; fill then carries each main county's name and GEOID down to all of its adjacent-county rows.

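        library(tidyverse)
    
        # Read all columns as character (keeps leading zeros in the GEOIDs), then
        # fill each main county's name and GEOID down into its indented rows.
        read_tsv("county_adjacency.txt", col_names = c("county", "geoid", "adj_county", "adj_geoid"),
                 col_types = cols(.default = col_character())) %>% 
           fill(county:geoid, .direction = "down")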
    

    Result:

       county             geoid adj_county            adj_geoid
       <chr>              <chr> <chr>                 <chr>    
     1 Autauga County, AL 01001 Autauga County, AL    01001    
     2 Autauga County, AL 01001 Chilton County, AL    01021    
     3 Autauga County, AL 01001 Dallas County, AL     01047    
     4 Autauga County, AL 01001 Elmore County, AL     01051    
     5 Autauga County, AL 01001 Lowndes County, AL    01085    
     6 Autauga County, AL 01001 Montgomery County, AL 01101    
     7 Baldwin County, AL 01003 Baldwin County, AL    01003    
     8 Baldwin County, AL 01003 Clarke County, AL     01025    
     9 Baldwin County, AL 01003 Escambia County, AL   01053    
    10 Baldwin County, AL 01003 Mobile County, AL     01097   
    # … with 22,190 more rows
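
    Once the table is in this long format, it can be joined back to your county-level data. Below is a minimal sketch of that step, not a definitive recipe. It assumes the indented rows in the file start with two empty tab-separated fields, it rebuilds a few rows from the question in a temporary file so it runs on its own, and `county_data` with its `value` column is a hypothetical stand-in (with made-up numbers) for whatever variable you have by county.

```r
library(readr)
library(tidyr)
library(dplyr)

# A few rows from the question, written to a temp file so the sketch is
# self-contained. Assumption: indented rows begin with two empty fields.
sample_lines <- c(
  "\"Autauga County, AL\"\t01001\t\"Autauga County, AL\"\t01001",
  "\t\t\"Chilton County, AL\"\t01021",
  "\t\t\"Dallas County, AL\"\t01047",
  "\"Baldwin County, AL\"\t01003\t\"Baldwin County, AL\"\t01003",
  "\t\t\"Clarke County, AL\"\t01025",
  "\t\t\"Escambia County, FL\"\t12033"
)
tmp <- tempfile(fileext = ".txt")
writeLines(sample_lines, tmp)

adjacency <- read_tsv(
  tmp,
  col_names = c("county", "geoid", "adj_county", "adj_geoid"),
  col_types = cols(.default = col_character())
) %>%
  fill(county, geoid, .direction = "down") %>%
  # Each county is listed as adjacent to itself; drop those rows
  filter(geoid != adj_geoid)

# Hypothetical county-level variable (illustrative values only)
county_data <- tibble(
  geoid = c("01021", "01047", "01025", "12033"),
  value = c(10, 20, 30, 50)
)

# Attach each neighbour's value, then average over the neighbours of each county
neighbor_means <- adjacency %>%
  left_join(county_data, by = c("adj_geoid" = "geoid")) %>%
  group_by(county, geoid) %>%
  summarise(adj_mean = mean(value, na.rm = TRUE), .groups = "drop")
```

    Joining on `adj_geoid` rather than on the county name avoids mismatches from spelling or formatting differences in names. Whether to keep the self-adjacency rows depends on whether a county's own value should count toward its neighbourhood summary.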