regex, stringr, stringi

Extract the text before and after "/"


I'm trying to extract the text before and after "/", with no success. The sentences look like this:

XXXX YYY ZZZ - AV HAHEHRS, 3061 - SDDW ASDA DDSF - SAO JOSE DOS CAMPOS / SP - CEP: 00000-000

Output should be

SAO JOSE DOS CAMPOS / SP

I tried str_extract(str, "- [a-zA-Z]{1,} / [a-zA-Z]{1,}"), but it only returns

CAMPOS / SP

Solution

  • Your regex is missing the space inside the character classes, so it cannot match names made of several words. Try:

    str_extract(str, "- [a-zA-Z ]+ / [a-zA-Z ]+") 
    

    Note the space in the character class. Also, {1,} is the long form of +.

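    The match will be "- SAO JOSE DOS CAMPOS / SP " (including the leading "- " and a trailing space, because the second character class also allows spaces; the match stops at the next "-"). You can strip the extra characters in a second step, or avoid the leading "- " with a zero-width look-behind: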

    str_extract(str, "(?<=- )[a-zA-Z ]+ / [a-zA-Z ]+") 
    

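    Look-behinds are supported by stringr (through the ICU regex engine) and by base R's regexpr/gregexpr when called with perl = TRUE. Wrap the call in str_trim() to drop the trailing space.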


    For the sake of completeness, you could do this without regex: split the input on '-', find the part that contains '/', and trim it. This might be faster than regex, too.
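A minimal base R sketch of the last two suggestions, for comparison. The helper names extract_lookbehind and extract_split are hypothetical, not part of stringr or base R. Two assumptions are baked in: the part after "/" is a single word (so the look-behind pattern drops the space from its last character class and the match ends on a letter), and the city/state part itself contains no "-".

```r
# Example input taken from the question.
x <- "XXXX YYY ZZZ - AV HAHEHRS, 3061 - SDDW ASDA DDSF - SAO JOSE DOS CAMPOS / SP - CEP: 00000-000"

# Hypothetical helper: look-behind variant in base R.
# perl = TRUE is required; the default regex engine does not support look-behinds.
# Assumption: the part after "/" is one word, so no trailing space is captured.
extract_lookbehind <- function(s) {
  m <- regexpr("(?<=- )[A-Za-z ]+ / [A-Za-z]+", s, perl = TRUE)
  if (m[1] == -1) return(NA_character_)  # no match found
  regmatches(s, m)
}

# Hypothetical helper: no-regex variant.
# Split on the literal "-", keep the first piece containing "/", then trim.
# Assumption: the wanted piece itself contains no "-".
extract_split <- function(s) {
  parts <- strsplit(s, "-", fixed = TRUE)[[1]]
  hit <- parts[grepl("/", parts, fixed = TRUE)]
  if (length(hit) == 0) return(NA_character_)  # no piece contains "/"
  trimws(hit[1])
}

print(extract_lookbehind(x))  # "SAO JOSE DOS CAMPOS / SP"
print(extract_split(x))       # "SAO JOSE DOS CAMPOS / SP"
```

Both helpers return NA when no piece of the input contains a "/". Whether the split version is actually faster depends on the data, so it is worth benchmarking on real input before relying on that.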