Tags: r, trim, gsub, text-extraction

Trimming data in R


I have a data frame in R. One of its columns contains data like this:

    "828/km (2,140/sq mi)"     "365/km (950/sq mi)"       "1,102/km (2,850/sq mi)"  
    "1,029/km (2,670/sq mi)"   "236/km (610/sq mi)"       "555/km (1,440/sq mi)"   

I want to trim this data so that only the per-km figure is left, as a number:

    828    365    1102    1029    236    555

How can I do it?

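I have tried the gsub and grepl functions with this code:

    grepl("[0-9]+/km", Population.Density.a.)
    as.numeric(gsub("[^0-9]", "", Population.Density.a.))

The code did not work: grepl only reports whether the pattern matches (TRUE/FALSE) and extracts nothing, and gsub("[^0-9]", "", ...) deletes every non-digit character, so both figures are run together (for example, "828/km (2,140/sq mi)" becomes 8282140).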

I also tried the str_extract function from stringr:

    population_density <- str_extract(states_data$Population.Density.a., "[0-9]+/km")

and got this result:

    "828/km" "365/km" "102/km" "029/km" "236/km" "555/km" "201/km" "319/km" "308/km" "303/km" 

It should have been:

    "828/km" "365/km" "1,102/km" "1,029/km" "236/km" "555/km" "201/km" "319/km" "308/km" "303/km" 

I also tried:

    population_density <- str_extract(states_data$Population.Density.a., "\\d+\\,\\d+")

and got this result:

    "2,140"  NA       "1,102"  "1,029"  NA       "1,440"  NA       NA       NA       NA

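Both str_extract attempts miss for the same reason: [0-9] and \\d do not match the comma. In "1,102/km" the first pattern can therefore only match "102/km", while the second pattern matches any number containing a comma, which is the sq mi figure in rows such as "828/km (2,140/sq mi)" and nothing at all in rows such as "365/km (950/sq mi)". A minimal sketch of one fix, in base R so no package is needed, allows commas in the character class, anchors the match at the start of the string, and strips the commas before converting. The vector x just holds the sample values from the question, and density_km is a name chosen here for illustration.

```r
# Sample values copied from the question
x <- c("828/km (2,140/sq mi)", "365/km (950/sq mi)", "1,102/km (2,850/sq mi)",
       "1,029/km (2,670/sq mi)", "236/km (610/sq mi)", "555/km (1,440/sq mi)")

# "^[0-9,]+" matches the run of digits and commas at the very start of each
# string, i.e. the per-km figure with its comma kept (e.g. "1,102")
per_km <- regmatches(x, regexpr("^[0-9,]+", x))

# Remove the thousands separators, then convert to numbers
density_km <- as.numeric(gsub(",", "", per_km))
density_km
```

One caveat: regmatches() silently drops elements with no match, so if some rows might lack a leading number, stringr's str_extract(x, "^[0-9,]+"), which returns NA for those rows instead, keeps the result aligned with the data frame.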
Solution

  • You could do something like this:

    x <- c("828/km (2,140/sq mi)", "365/km (950/sq mi)", "1,102/km (2,850/sq mi)",
           "1,029/km (2,670/sq mi)", "236/km (610/sq mi)", "555/km (1,440/sq mi)")
    
    as.numeric(gsub("(\\d+).+", "\\1", gsub(",", "", x)))
    
    [1]  828  365 1102 1029  236  555
    

    It is easiest to remove the commas first. The capture group (\\d+) then keeps only the first run of digits (the per-km figure), and .+ discards everything after it.
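An alternative sketch, assuming every value follows the "<number>/km (...)" layout shown in the question, cuts each string at "/km" with sub() and stores the result as a new numeric column. The small states_data below only mimics the question's data frame (same column name), and density_km is a hypothetical column name.

```r
# Small stand-in for the question's data frame, using the same column name
states_data <- data.frame(
  Population.Density.a. = c("828/km (2,140/sq mi)", "365/km (950/sq mi)",
                            "1,102/km (2,850/sq mi)", "1,029/km (2,670/sq mi)",
                            "236/km (610/sq mi)", "555/km (1,440/sq mi)"),
  stringsAsFactors = FALSE
)

# Drop "/km" and everything after it, leaving e.g. "1,102"
per_km <- sub("/km.*", "", states_data$Population.Density.a.)

# Remove thousands separators, convert, and store as a new numeric column
states_data$density_km <- as.numeric(gsub(",", "", per_km))
states_data$density_km
```

sub() is enough here because only the first "/km" matters. As with the gsub answer, the commas must be removed before as.numeric(), which otherwise returns NA (with a warning) for values such as "1,102".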