Tags: regex, r, abbreviation

R: Abbreviate state names in strings


I have strings that contain US state names. How can I efficiently replace each name with its abbreviation? I am aware of state.abb[grep("New York", state.name)], but this works only when "New York" is the whole string. I have, for example, "Walmart, New York". Thanks in advance!

Let's assume this input:

x = c("Walmart, New York", "Hobby Lobby (California)", "Sold in Sears in Illinois")

Edit: the desired output is along the lines of "Walmart, NY", "Hobby Lobby (CA)", "Sold in Sears in IL". As these examples show, the state name can appear in many different places and contexts within a string.


Solution

  • Here's a base R way, using gregexpr(), regmatches(), and regmatches<-():

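    abbreviateStateNames <- function(x) {
        # One alternation pattern matching any full state name
        pat <- paste(state.name, collapse="|")
        # Locate every match in every element of x
        m <- gregexpr(pat, x)
        # Look up the abbreviation for each matched name
        ff <- function(x) state.abb[match(x, state.name)]
        # Replace the matches in place with their abbreviations
        regmatches(x, m) <- lapply(regmatches(x, m), ff)
        x
    }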
    
    x <- c("Hobby Lobby (California)", 
           "Hello New York City, here I come (from Greensboro North Carolina)!")
    
    abbreviateStateNames(x)
    # [1] "Hobby Lobby (CA)"                                
    # [2] "Hello NY City, here I come (from Greensboro NC)!"
    

    Alternatively -- and quite a bit more naturally -- you can accomplish the same thing using the gsubfn package:

    library(gsubfn)
    
    pat <- paste(state.name, collapse="|")
    gsubfn(pat, function(x) state.abb[match(x, state.name)], x)
    # [1] "Hobby Lobby (CA)"                                
    # [2] "Hello NY City, here I come (from Greensboro NC)!"
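
One caveat with a bare alternation of state names is that it also matches inside longer words: "Indianapolis" contains "Indiana" and would come out as "INpolis". The sketch below is one possible variation on the base R approach, not part of the original answer. It wraps the pattern in word boundaries (`\\b`) and applies it to the input from the question. The function name `abbreviate_states` is a hypothetical name chosen for illustration, and `perl = TRUE` is an assumption made so that the non-capturing group `(?:...)` is available.

```r
# Hypothetical helper: abbreviate whole-word state names only.
abbreviate_states <- function(x) {
  # \\b on both sides restricts matches to whole words, so "Indiana"
  # is not matched inside "Indianapolis".
  pat <- paste0("\\b(?:", paste(state.name, collapse = "|"), ")\\b")
  m <- gregexpr(pat, x, perl = TRUE)
  # Swap each matched name for its abbreviation, element by element.
  regmatches(x, m) <- lapply(regmatches(x, m),
                             function(s) state.abb[match(s, state.name)])
  x
}

x <- c("Walmart, New York", "Hobby Lobby (California)",
       "Sold in Sears in Illinois", "Flights to Indianapolis")
abbreviate_states(x)
# [1] "Walmart, NY"             "Hobby Lobby (CA)"
# [3] "Sold in Sears in IL"     "Flights to Indianapolis"
```

Matching is still case-sensitive, so "new york" would be left unchanged. Whether that is desirable depends on the data.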