Tags: r, regex, split, extract, sapply

Extract just the number from string


How can I extract just the number from each string in the following data frame?

# Name the column explicitly so that last_run$last_run exists
last_run <- data.frame(
  last_run = c('Last run 15 days ago', '1st up after 126 days', 'Last run 21 days ago',
               'Last run 22 days ago', '1st up after 177 days', '1st up after 364 days')
)

The desired output (an image in the original post) contains only the number from each string.

My attempt is:

new_df<-sapply(str_split(last_run$last_run," run"|"after"),'[',2)%>%
  as.data.frame()

Solution

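  • sapply(strsplit(last_run, " "), function(x) na.omit(as.numeric(x)))

    Here last_run stands for the character vector of strings. strsplit only accepts a character vector, so if the strings are a column of a data frame, pass that column instead.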
    

    strsplit

    It splits each string in last_run at the spaces and returns a list in which each element is a character vector holding the words of one sentence.

    > strsplit(last_run, " ")
    [[1]]
    [1] "Last" "run"  "15"   "days" "ago" 
    
    [[2]]
    [1] "1st"   "up"    "after" "126"   "days" 
    
    [[3]]
    [1] "Last" "run"  "21"   "days" "ago" 
    
    [[4]]
    [1] "Last" "run"  "22"   "days" "ago" 
    
    [[5]]
    [1] "1st"   "up"    "after" "177"   "days" 
    
    [[6]]
    [1] "1st"   "up"    "after" "364"   "days" 
    

    as.numeric

    It tries to convert each word to a number and returns NA, with a coercion warning, wherever that is not possible.

    > as.numeric(strsplit(last_run, " ")[[1]])
    [1] NA NA 15 NA NA
    

    na.omit

    It removes the NA values from a vector.

    > na.omit(as.numeric(strsplit(last_run, " ")[[1]]))[[1]]
    [1] 15
    

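    For a plain vector, na.omit does not return a list. It returns the vector without the NA values, plus an na.action attribute that records the positions of the values it removed. Indexing with [[1]] picks the first remaining element as a plain number and drops that attribute.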
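
    To check what na.omit actually returns for a plain numeric vector, here is a minimal sketch. The names nums, kept and first are illustrative and not part of the original answer.

    ```r
    # The converted words of the first sentence
    nums <- c(NA, NA, 15, NA, NA)
    kept <- na.omit(nums)

    is.list(kept)             # FALSE: the result is still a numeric vector
    attr(kept, "na.action")   # positions of the NA values that were removed
    first <- kept[[1]]        # first remaining element as a plain number
    attributes(first)         # NULL: [[1]] drops the na.action attribute
    ```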


    sapply

    sapply applies a function to each element of a list and, because every result here has length one, simplifies the results into a vector.
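
    Putting the steps together on a data frame could look like the sketch below. It assumes the strings live in a column called last_run, as in the question's attempt. The names df, extract_days and days are hypothetical, and suppressWarnings is added only to hide the expected coercion warnings.

    ```r
    # Example strings from the question, stored in a named column
    df <- data.frame(
      last_run = c("Last run 15 days ago", "1st up after 126 days",
                   "Last run 21 days ago", "Last run 22 days ago",
                   "1st up after 177 days", "1st up after 364 days"),
      stringsAsFactors = FALSE
    )

    # Hypothetical helper: split into words, convert to numbers,
    # and keep the first word that converted successfully
    extract_days <- function(x) {
      words <- strsplit(x, " ")
      sapply(words, function(w) {
        nums <- suppressWarnings(as.numeric(w))  # non-numeric words become NA
        na.omit(nums)[[1]]                       # first non-NA value
      })
    }

    df$days <- extract_days(df$last_run)
    print(df$days)
    ```

    With the data above, df$days should hold 15, 126, 21, 22, 177 and 364. Note that "1st" is not picked up, because as.numeric("1st") is NA. The sketch assumes every string contains at least one standalone number; for a string without one, na.omit(nums) is empty and [[1]] fails.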