r regex sqldf

Extract values between specific strings and colons in R


I have a table like this:

No, Memo
  1, Date: 2020/10/22 City: UA Note: True mastery of any skill takes a lifetime.
  2, Date: 2022/11/01 City: CH Note: Sweat is the lubricant of success.
  3, Date: 2022y11m1d City: UA Note: Every noble work is at first impossible.
  4, Date: 2022y2m15d City: AA Note: Live beautifully, dream passionately, love completely.

I want to extract the strings that follow Date:, City:, and Note:. For example, in row No. 1, I need to extract "2020/10/22", which is between Date: and City:; "UA", which is between City: and Note:; and "True mastery of any skill takes a lifetime.", which comes after Note:.

Desired output:

 No Date       City Note
  1 2020/10/22 UA   True mastery of any skill takes a lifetime.
  2 2022/11/01 CH   Sweat is the lubricant of success.
  3 2022y11m1d UA   Every noble work is at first impossible.
  4 2022y2m15d AA   Live beautifully, dream passionately, love completely.

Does anyone know how to do this? Any help would be great. Thank you.


Solution

  • My solution, using regular expressions with stringr and dplyr:

    library(stringr)
    library(dplyr)
    
    # Read the sample data; ";" is the separator because the notes contain commas
    df <- read.table(
      text = "No; Memo
      1; Date: 2020/10/22 City: UA Note: True mastery of any skill takes a lifetime.
      2; Date: 2022/11/01 City: CH Note: Sweat is the lubricant of success.
      3; Date: 2022y11m1d City: UA Note: Every noble work is at first impossible.
      4; Date: 2022y2m15d City: AA Note: Live beautifully, dream passionately, love completely.",
      sep = ";",
      header = TRUE,
      strip.white = TRUE  # drop the spaces around each field
    )
    
    # A lookbehind (?<=Label: ) and a lookahead (?= NextLabel:) bound each value;
    # the lazy .*? stops each match at the first following label
    df_test <- df %>%
      mutate(
        date = str_extract(Memo, "(?<=Date: ).*?(?= City:)"),
        city = str_extract(Memo, "(?<=City: ).*?(?= Note:)"),
        note = str_extract(Memo, "(?<=Note: ).*")
      ) %>%
      select(-Memo)
    
    > df_test
      No       date city                                                   note
    1  1 2020/10/22   UA            True mastery of any skill takes a lifetime.
    2  2 2022/11/01   CH                     Sweat is the lubricant of success.
    3  3 2022y11m1d   UA               Every noble work is at first impossible.
    4  4 2022y2m15d   AA Live beautifully, dream passionately, love completely.
    

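    Each pattern returns the text between two labels. The positive lookbehind (?<=...) requires the preceding label (e.g. "Date: "), and the positive lookahead (?=...) requires the following one (e.g. " City"). Neither label is included in the match.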
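Without stringr, the same lookaround idea can be tried in base R. This is a sketch, not part of the original answer. It applies regexpr() with perl = TRUE to one Memo string, because base R needs the PCRE engine for lookbehind. The helper first_match() is hypothetical and introduced only for this illustration. It uses the lazy .*? so that each match ends at the first following label rather than the last one.

```r
# One Memo string copied from the question
memo <- "Date: 2020/10/22 City: UA Note: True mastery of any skill takes a lifetime."

# Hypothetical helper: first match of a PCRE pattern (lookarounds need perl = TRUE)
first_match <- function(x, pattern) {
  regmatches(x, regexpr(pattern, x, perl = TRUE))
}

# (?<=Label: ) must precede the value and (?= NextLabel:) must follow it;
# neither label ends up in the returned text
date_value <- first_match(memo, "(?<=Date: ).*?(?= City:)")
city_value <- first_match(memo, "(?<=City: ).*?(?= Note:)")
note_value <- first_match(memo, "(?<=Note: ).*")

print(c(date_value, city_value, note_value))
```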
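Another option drops the lookarounds and uses one pattern with three capture groups. Base R's regexec() and regmatches() then pull out all three fields in one pass. This is an alternative sketch rather than the answer's method. The data frame is rebuilt inline from the question. The pattern assumes every Memo contains the labels in the order Date:, City:, Note:.

```r
# Sample data rebuilt from the question
df <- data.frame(
  No = 1:4,
  Memo = c(
    "Date: 2020/10/22 City: UA Note: True mastery of any skill takes a lifetime.",
    "Date: 2022/11/01 City: CH Note: Sweat is the lubricant of success.",
    "Date: 2022y11m1d City: UA Note: Every noble work is at first impossible.",
    "Date: 2022y2m15d City: AA Note: Live beautifully, dream passionately, love completely."
  ),
  stringsAsFactors = FALSE
)

# One pattern, three capture groups; the labels anchor each group
pattern <- "Date: (.*?) City: (.*?) Note: (.*)"

# regmatches() on regexec() gives, per row: full match, group 1, group 2, group 3
parts <- regmatches(df$Memo, regexec(pattern, df$Memo, perl = TRUE))

# Keep only the three groups and stack the rows into a character matrix
mat <- do.call(rbind, lapply(parts, `[`, 2:4))

out <- data.frame(
  No = df$No,
  Date = mat[, 1],
  City = mat[, 2],
  Note = mat[, 3],
  stringsAsFactors = FALSE
)
print(out)
```

A row that does not match the pattern comes back as NA in all three columns rather than raising an error, because indexing an empty match with 2:4 yields NA.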