Deleting specific characters in a data frame in R


I have a data frame like the following:

>sample_df
dd_mav2_6541_0_10
dd_mav2_12567_0_2
dd_mav2_43_1_341
dd_mav2_19865_2_13
dd_mav2_1_0_1

I need to remove everything after the fourth "_", including that underscore. I would like the output to look like the following:

>sample_df
    dd_mav2_6541_0
    dd_mav2_12567_0
    dd_mav2_43_1
    dd_mav2_19865_2
    dd_mav2_1_0

I tried the following code, but it only works with a fixed number of characters and does not give the output shown above.

substr(sample_df,nchar(sample_df)-2,nchar(sample_df))

How can I get this output?


Solution

  • You can try this:

    gsub("_\\d+$","",sample_df)
    

    It removes the final underscore together with the digits (one or more) that follow it at the end of each string.

    With your data:

    sample_df <- c("dd_mav2_6541_0_10","dd_mav2_12567_0_2","dd_mav2_43_1_341","dd_mav2_19865_2_13","dd_mav2_1_0_1")
    
    gsub("_\\d+$","",sample_df)
    #[1] "dd_mav2_6541_0"  "dd_mav2_12567_0" "dd_mav2_43_1"    "dd_mav2_19865_2" "dd_mav2_1_0"
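
The answer above treats `sample_df` as a character vector. If the values actually sit in a data frame column, the same pattern can be applied to that column rather than to the whole data frame. The following is only a sketch: the column name `id` and the new columns `id_trimmed` and `id_first4` are assumptions made for illustration, not names from the question. Because the pattern is anchored with `$` it can match at most once per string, so `sub` is enough here. The second variant is an alternative that keeps the first four "_"-separated fields whatever follows them, which may be closer to the "after the fourth underscore" wording if the tail is not always digits.

```r
# Hypothetical data frame; the column name `id` is an assumption.
sample_df <- data.frame(
  id = c("dd_mav2_6541_0_10", "dd_mav2_12567_0_2", "dd_mav2_43_1_341",
         "dd_mav2_19865_2_13", "dd_mav2_1_0_1"),
  stringsAsFactors = FALSE
)

# Strip the trailing "_<digits>" from the column, not from the whole data frame.
sample_df$id_trimmed <- sub("_\\d+$", "", sample_df$id)

# Alternative: keep exactly the first four "_"-separated fields
# and drop the fourth underscore plus everything after it.
sample_df$id_first4 <- sub("^(([^_]+_){3}[^_]+)_.*$", "\\1", sample_df$id)

print(sample_df)
```

For the sample data both columns should hold the same values; they would differ only for strings whose tail after the fourth underscore is not purely digits.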