Tags: r, string, string-matching, text-extraction

Is there any way to do partial string matching in R?


I have two data frames. The first has more rows, an ID column (e.g. "ALP23456") and other related columns. The second has fewer rows, and the ID appears only inside a free-text comment such as "ALP23456 done on 26th March". The comment follows no discernible pattern.

Problem: I want to look up each ID from data frame 1 in the Text column of data frame 2 so I can pull in the related information from data frame 2. A normal join fails because the ID and the text are not an exact match.
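A minimal reproducible sketch of the setup described above. Only "ALP23456" and its comment come from the question; the other IDs, the `Owner` and `Status` columns and the remaining comments are hypothetical, since the original screenshots are not available. It shows why an exact merge finds nothing: the key only occurs inside the free text.

```r
# Hypothetical data: df1 holds clean IDs, df2 holds the IDs buried in comments.
df1 <- data.frame(
  ID = c("ALP23456", "ALP23457", "QRT10001", "ALP23458"),
  Owner = c("Ann", "Bob", "Cid", "Dee"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  Text = c("ALP23456 done on 26th March", "Checked QRT10001 twice"),
  Status = c("Closed", "Open"),
  stringsAsFactors = FALSE
)

# An exact-match join returns no rows: "ALP23456" != "ALP23456 done on 26th March".
exact <- merge(df1, df2, by.x = "ID", by.y = "Text")
nrow(exact)  # 0
```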


Desired output: [image not available]


Solution

  • I used a regular expression to pull the ID out of the free-text comment, then merged the two data frames on that ID:

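    library(stringr)
    library(dplyr)
    
    # Extract the first word starting with "A" or "Q". \\b anchors the match at a
    # word start; [[:alnum:]]* stops before trailing punctuation such as "," or ".".
    df2$ID <- str_extract(df2$Text, "\\b[AQ][[:alnum:]]*")
    # Keep every row of df1 and add the matching columns from df2 (NA if no match)
    df <- left_join(df1, df2, by = "ID")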
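If the IDs do not reliably start with "A" or "Q", one way to avoid guessing a pattern is to test each known ID from `df1` as a literal substring of every comment with `grepl(..., fixed = TRUE)`, then join on the ID that was found. This is a sketch in base R (`merge(..., all.x = TRUE)` plays the role of `left_join`). The data is hypothetical, and `find_id` is an illustrative helper, not part of any package. It assumes each comment mentions at most one known ID (the first hit wins) and that no ID is a prefix of another (e.g. "ALP2345" would also match inside "ALP23456").

```r
# Hypothetical example data (the original screenshots are not available).
df1 <- data.frame(
  ID = c("ALP23456", "ALP23457", "QRT10001"),
  Owner = c("Ann", "Bob", "Cid"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  Text = c("ALP23456 done on 26th March", "checked twice: QRT10001.", "no id here"),
  Status = c("Closed", "Open", "Pending"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the first ID from `ids` that occurs literally
# inside `text`, or NA if none does. fixed = TRUE treats each ID as plain text.
find_id <- function(text, ids) {
  found <- vapply(ids, function(id) grepl(id, text, fixed = TRUE), logical(1))
  hits <- ids[found]
  if (length(hits) > 0) hits[1] else NA_character_
}

# Tag every comment with the ID it mentions.
df2$ID <- vapply(df2$Text, find_id, character(1), ids = df1$ID, USE.NAMES = FALSE)

# Keep all rows of df1 and attach df2's columns where an ID was found.
df <- merge(df1, df2, by = "ID", all.x = TRUE)
df
```

Because the IDs come from `df1` itself, this works even when the comment text has no usable pattern, at the cost of one `grepl` call per ID per comment.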