regex r pcre icu

Regex (ICU) for matching between parentheses


Looking for some regex which will create a capture group for words occurring within parentheses, ignoring the parentheses themselves. The regex must be either PCRE or ICU.

Input: ( lakshd asd___ asa1123 Name : _____)

Desired Output: Name

What I've tried:

\\((Name|name|NAME)\\)

(?<=\\()name|Name|NAME(?=\\))

\\(name|Name|NAME\\)


Solution

  • What I've tried:

    \\((Name|name|NAME)\\)
    (?<=\\()name|Name|NAME(?=\\))
    \\(name|Name|NAME\\)

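    These patterns are all meant to find name, Name or NAME with a ( immediately before and a ) immediately after; they differ in what is captured or returned as the match. Only the first one actually does that: in the other two the alternation is not grouped, so the ( condition applies only to name and the ) condition only to NAME, while Name matches anywhere. None of them allows other text between the parentheses and the word. To match a word somewhere inside parentheses, put \([^()]* before the value you need and [^()]*\) after it (in an R string literal the backslashes are doubled: \\( and \\)).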

    Also, there is no point in extracting something you already know.

    So, if you plan to extract the last word from the parentheses, you may use

    > library(stringr)
    > s = "( lakshd  asd___ asa1123 Name : _____)"
    > res <- str_match(s, "(?i)\\([^()]*\\b([a-z]\\w*)\\b[^()]*\\)")
    > res[,2]
    [1] "Name"
    

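    Note that str_match returns a matrix: the first column holds the whole match and each following column holds a capture group, so res[,2] is the value of group 1.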

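    The (?i)\\([^()]*\\b([a-z]\\w*)\\b[^()]*\\) pattern matches a pair of parentheses with no nested parentheses and captures the last whole word inside them that starts with a letter. Here _____ is skipped because it does not begin with [a-z], and (?i) makes the match case-insensitive so that the N of Name is accepted.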
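
Since the question allows either PCRE or ICU, and stringr uses ICU, here is a rough sketch of the same pattern run through base R's PCRE engine (perl = TRUE). The helper name last_word_in_parens is made up for illustration and is not part of any package; the sketch assumes you only need the first parenthesized group in each string.

```r
# Sketch: same pattern as above, but via base R with PCRE (perl = TRUE)
# instead of stringr/ICU. `last_word_in_parens` is a hypothetical helper name.
last_word_in_parens <- function(x) {
  pattern <- "(?i)\\([^()]*\\b([a-z]\\w*)\\b[^()]*\\)"
  # regexec() records the positions of the whole match and of each capture group;
  # regmatches() turns them into c(whole match, group 1) per input string.
  m <- regmatches(x, regexec(pattern, x, perl = TRUE))
  # Strings without a match yield character(0); map those to NA.
  vapply(
    m,
    function(g) if (length(g) >= 2) g[2] else NA_character_,
    character(1),
    USE.NAMES = FALSE
  )
}

s <- "( lakshd  asd___ asa1123 Name : _____)"
last_word_in_parens(s)
# [1] "Name"
```

regexec only reports the first match per string, so an input such as "(foo bar) (baz qux)" gives "bar"; for every parenthesized group you would need a global matcher such as stringr's str_match_all.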