
Splitting numerals from string in data frame


I have a data frame in R with a column that looks like this:

Venue
AAA 2001
BBB 2016
CCC 1996
... ....
ZZZ 2007

To make the data frame easier to work with, I want to split the Venue column into two columns, Location and Year, like so:

Location Year
AAA      2001
BBB      2016
CCC      1996
...      ....
ZZZ      2007

I have tried several variations of the cSplit() function to achieve this:

df = cSplit(df, "Venue", " ") #worked somewhat, however issues with places with multiple words (e.g. Los Angeles, Rio de Janeiro)
df = cSplit(df, "Venue", "[:digit:]")
df = cSplit(df, "Venue,", "[0-9]+")

None of these have worked so far. I'd appreciate it if anyone could point me in the right direction.


Solution

  • How about this?

    d <- data.frame(Venue = c("AAA 2001", "BBB 2016", "CCC 1996", "cc d 2001"),
                    stringsAsFactors = FALSE)

    # Remove every digit, then trim the space left behind before the year
    d$Location <- trimws(gsub("[[:digit:]]", "", d$Venue))
    # Remove everything that is not a digit, leaving only the year
    d$Year <- gsub("[^[:digit:]]", "", d$Venue)
    d
    #       Venue Location Year
    # 1  AAA 2001      AAA 2001
    # 2  BBB 2016      BBB 2016
    # 3  CCC 1996      CCC 1996
    # 4 cc d 2001     cc d 2001
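
If a venue name could itself contain digits, stripping every digit would mangle it. A possible alternative is to anchor the pattern on the trailing year and capture both parts in one go. This is only a sketch in base R: the sample venues and the `venues` and `pattern` names are made up for illustration, and it assumes every value ends in whitespace followed by digits.

```r
# Hypothetical sample data; the multi-word venues are for illustration only.
venues <- data.frame(
  Venue = c("AAA 2001", "Los Angeles 1984", "Rio de Janeiro 2016"),
  stringsAsFactors = FALSE
)

# Group 1: everything before the last run of whitespace (lazy match).
# Group 2: the run of digits that ends the string.
pattern <- "^(.*?)\\s+([[:digit:]]+)$"

# perl = TRUE is needed for the lazy quantifier "*?".
venues$Location <- sub(pattern, "\\1", venues$Venue, perl = TRUE)
venues$Year <- as.integer(sub(pattern, "\\2", venues$Venue, perl = TRUE))

venues
```

Note that `sub()` returns the input unchanged when the pattern does not match, so a value without a trailing year would end up copied into Location and become `NA` (with a warning) in Year.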