r string substring quotations

R - return string within first set of quotation marks


I have a data frame made up of thousands of records that I imported from a .csv file. One variable within the data frame is a free-text field derived from a lexicon. The rows of data are in the format below.

Please note that these are not vectors but rows of character data within a variable 'date' (they just happen to look exactly like vector definitions):

c("9th november 2018", "27th october 2018"),

c("three months", "6 months"),

c("24th december ", "2th january 2019", "25th january 2019")

Essentially, all I want to do is take the string from the first set of quotation marks and remove the rest, so:

c("9th november 2018", "27th october 2018") 
9th november 2018

I am using the following code but it is taking the string from the last set of quotation marks:

LexiDate3$finaldat3 <- sub('.*,"*(.*?) *" *', '\\1', LexiDate3$Date_new)

which returns:

27th october 2018")

Not ideal, and for the life of me I can't figure this one out. Any help would be greatly appreciated.

Thanks.


Solution

  • How does this look? Note the quotes around the output are put there by the print method and not embedded in the string.

    library(stringr)
    test <- 'c("9th november 2018", "27th october 2018"),'
    str_extract(test,'(?<=")(.*?)(?=")')
    #> [1] "9th november 2018"
    Created on 2019-02-21 by the reprex package (v0.2.1)
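
    The pattern in the question picks up the last quoted string because its leading greedy `.*,` runs all the way to the final comma before the capture group starts. As a rough sketch of how the look-around approach could be applied to the whole column rather than a single test string, the example below builds a small stand-in data frame (the name `LexiDate3` and the column `Date_new` are taken from the question's code; the three rows are the sample rows shown there, and `finaldat3_base` is a hypothetical column name used only for comparison). It also shows a base R alternative using `sub()` for anyone who prefers not to load stringr. Trimming whitespace is an assumption about what is wanted for values such as "24th december ".

    ```r
    library(stringr)

    # Hypothetical stand-in for the asker's data frame, using the sample rows
    # from the question.
    LexiDate3 <- data.frame(
      Date_new = c(
        'c("9th november 2018", "27th october 2018")',
        'c("three months", "6 months")',
        'c("24th december ", "2th january 2019", "25th january 2019")'
      ),
      stringsAsFactors = FALSE
    )

    # stringr: str_extract() is vectorised and returns the first match per row.
    # The look-behind and look-ahead require a quote on each side without
    # including it in the result.
    LexiDate3$finaldat3 <- str_trim(
      str_extract(LexiDate3$Date_new, '(?<=")(.*?)(?=")')
    )

    # Base R: skip everything up to the first quote, capture up to the next
    # quote, and discard the remainder of the string.
    LexiDate3$finaldat3_base <- trimws(
      sub('^[^"]*"([^"]*)".*$', '\\1', LexiDate3$Date_new)
    )

    LexiDate3$finaldat3
    #> [1] "9th november 2018" "three months"      "24th december"
    ```

    One difference to keep in mind: for a row that contains no quotation marks at all, `str_extract()` returns `NA`, whereas `sub()` returns the original string unchanged.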