r dataframe lapply

Data frame rows separated by semicolons: how do I subtract numbers and add the result to a new column?


I have a data frame in which one column holds strings of numbers separated by semicolons. For each row, how do I subtract the first number from the last number and store the result in a new column? Say my data frame looks like this:

df <- data.frame(x=c("1;2;3","1;1;1;1"),y=c(5,5))

How do I add a new column z holding the last number minus the first number of each x, so that the new data frame looks like this:

df <- data.frame(x=c("1;2;3","1;1;1;1"),y=c(5,5),z=c(2,0))

Solution

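  • Try this with stringr. The first regular expression, "\\d+$", is anchored to the end of the string, so it extracts the last number. The second, "[0-9]+", extracts the first number, because str_extract returns only the first match. Subtracting the second result from the first gives last minus first.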

    df <- data.frame(x=c("1;2;3","1;1;1;1"),
                     y=c(5,5),z=c(2,0))
    
    library(stringr)
    # last number in each string minus first number in each string
    df$eval <- as.numeric(str_extract(df$x,"\\d+$")) -
               as.numeric(str_extract(df$x,"[0-9]+"))
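
If you would rather avoid the stringr dependency, the same idea can be sketched in base R by splitting each string on the semicolon and taking the last element minus the first. This is only a sketch under assumptions: the names parts and v are introduced here for illustration, and every piece of x is assumed to parse as a number.

```r
df <- data.frame(x = c("1;2;3", "1;1;1;1"), y = c(5, 5))

# Split each string on ";" and convert the pieces to numeric vectors.
# as.character() guards against x being stored as a factor.
parts <- lapply(strsplit(as.character(df$x), ";", fixed = TRUE), as.numeric)

# For each row: last element minus first element.
df$z <- vapply(parts, function(v) v[length(v)] - v[1], numeric(1))
```

For the example data this gives 2 and 0 in z, matching the expected output in the question. vapply is used instead of sapply so that the result is guaranteed to be a numeric vector with one value per row.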