Tags: r, string, tidyverse, str-replace, gsub

Remove part of a string based on another column in R


I have a large dataset that looks like this. I want to remove a certain number of characters from the start of each string in the fruits column, with the number given by the remove_strings column.

library(tidyverse)

df <- tibble(fruits=c("apple","banana","ananas"), 
             remove_strings=c(1,4,2))

df
#> # A tibble: 3 × 2
#>   fruits remove_strings
#>   <chr>           <dbl>
#> 1 apple               1
#> 2 banana              4
#> 3 ananas              2

Created on 2022-03-09 by the reprex package (v2.0.1)

From apple I want to remove the first character, from banana the first 4, and from ananas the first 2. I want my data to look like this:


#>   fruits remove_strings new_fruits
#>   <chr>           <dbl> <chr>
#> 1 apple               1 pple
#> 2 banana              4 na
#> 3 ananas              2 anas

Solution

  • Using substr:

    with(df, substr(fruits, remove_strings + 1, nchar(fruits)))
    # [1] "pple" "na"   "anas"
    

    Or, using str_sub:

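    library(dplyr)    # mutate() and %>% (already attached if tidyverse is loaded)
    library(stringr)
    # str_sub() is vectorised over `start`, so each row uses its own offset
    df %>% 
      mutate(removed = str_sub(fruits, remove_strings + 1))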
    
    # A tibble: 3 x 3
      fruits remove_strings removed
      <chr>           <dbl> <chr>  
    1 apple               1 pple   
    2 banana              4 na     
    3 ananas              2 anas
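
If you want the result stored in a new_fruits column as in the question, without any package dependency, the same idea can be wrapped in a small helper. This is only a sketch: drop_prefix is a hypothetical name, not part of the answer above, and the extra "kiwi" row is made-up data added to show what happens when remove_strings is larger than the string length.

```r
# Hypothetical helper (name is an assumption, not from the original answer):
# drop the first `n` characters from each element of `x`.
drop_prefix <- function(x, n) {
  stopifnot(is.character(x), is.numeric(n), all(n >= 0))
  # substring() is vectorised over its start and end arguments;
  # when the start lies past the end of a string it returns "".
  substring(x, n + 1, nchar(x))
}

# Same data as the question, plus one made-up row ("kiwi", 10)
# where the count exceeds the string length.
df <- data.frame(
  fruits = c("apple", "banana", "ananas", "kiwi"),
  remove_strings = c(1, 4, 2, 10),
  stringsAsFactors = FALSE
)

df$new_fruits <- drop_prefix(df$fruits, df$remove_strings)
df$new_fruits
# gives "pple", "na", "anas" and an empty string for the "kiwi" row
```

The all(n >= 0) check makes the helper stop on negative or NA counts rather than silently returning something unexpected; whether that is the behaviour you want depends on your data.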