r string character

In R, how do I append everything from the underscore onward in the next row's string to the current row's string, so that both rows share the same suffix?


Say I have a column that contains the following:

c("Q9NZ", "Q69Z_1036_1037_1_1_S1036", "A3K8", "P123_567_789_1_1_T567")

For each row that has no underscore, I want to find the first underscore in the row beneath it and append that underscore plus the remainder of that string to the current row. The desired output is:

c("Q9NZ_1036_1037_1_1_S1036", "Q69Z_1036_1037_1_1_S1036", "A3K8_567_789_1_1_T567", "P123_567_789_1_1_T567")

Solution

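  • Here's a dplyr + tidyr method (`separate_wider_delim()` needs tidyr 1.3.0 or later), where `x` is the vector from the question: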

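    library(dplyr)
    x <- c("Q9NZ", "Q69Z_1036_1037_1_1_S1036", "A3K8", "P123_567_789_1_1_T567")
    data.frame(x) %>%
      # split at the first underscore only; rows without one get suffix = NA
      tidyr::separate_wider_delim(x, delim = "_", names = c("prefix", "suffix"), too_many = "merge", too_few = "align_start") %>%
      # fill each missing suffix from the next non-missing row below it
      tidyr::fill(suffix, .direction = "up") %>%
      transmute(value = paste(prefix, suffix, sep = "_"))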
    

    which returns

      value                   
      <chr>                   
    1 Q9NZ_1036_1037_1_1_S1036
    2 Q69Z_1036_1037_1_1_S1036
    3 A3K8_567_789_1_1_T567   
    4 P123_567_789_1_1_T567   
    

    This is convenient if your data already lives in a data.frame, since the steps operate on a column directly.
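
If the values are in a plain character vector rather than a data.frame, the same idea can be sketched in base R. This is only an illustration, not part of the answer above: the function name `append_next_suffix` is made up, and it assumes the suffix should come only from the row immediately below. `tidyr::fill()`, by contrast, would carry a suffix up across several consecutive rows that lack one.

```r
# Hypothetical helper: append the "_..." suffix of the next element to any
# element that has no underscore of its own.
append_next_suffix <- function(x) {
  has_suffix <- grepl("_", x, fixed = TRUE)

  # Everything from the first underscore onward; NA where there is none.
  suffix <- ifelse(has_suffix, sub("^[^_]*", "", x), NA_character_)

  # Suffix of the element directly below; the last element has no successor.
  next_suffix <- c(suffix[-1], NA_character_)

  # Leave an element unchanged if it already has a suffix,
  # or if the element below has none to offer.
  ifelse(has_suffix | is.na(next_suffix), x, paste0(x, next_suffix))
}

x <- c("Q9NZ", "Q69Z_1036_1037_1_1_S1036", "A3K8", "P123_567_789_1_1_T567")
result <- append_next_suffix(x)
print(result)
```

For the vector in the question this should reproduce the desired output. An element with no underscore whose next element also has none is returned as-is rather than being pasted with `NA`.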