c# regex double-quotes single-quotes

Regex should match only one of two types of quoted strings


I need a regex that matches a string surrounded by double quotes. It should not match a double-quoted string if that string is itself enclosed in single quotes:

"string"
" 'xyz' "
"  `"    "
"  `" `"   "
"  `" `" `"  "
'  ' "should match" '  '
'   "should not match"   '

Now I have (https://regex101.com/r/z5PayV/1)

(?:"(([^"]*`")*[^"]*|[^"]*)") 

This matches all of the lines, but the last line should not be matched. Is there a solution?


Solution

  • You have to match, and so consume, the single-quoted strings first, so that anything inside them is excluded from the double-quote match.

    Update

    In C#, it can be done like this. One match loops over the whole input,
    and the CaptureCollection of group 1 holds all of the
    double-quoted matches.

    (?:'[^']*'|(?:"(([^"]*`")*[^"]*|[^"]*)")|[\S\s])+
    

    Expanded

     (?:
          ' [^']* '
    
       |  
          (?:
               "
               (                             # (1 start)
                    ( [^"]* `" )*                 # (2)
                    [^"]* 
                 |  [^"]* 
               )                             # (1 end)
               "
          )
       |  
          [\S\s] 
     )+
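The expanded layout above can also be handed to .NET directly. The following is a minimal sketch, assuming `RegexOptions.IgnorePatternWhitespace`, under which whitespace outside character classes is ignored and `#` starts a comment. The sample input and the inline comments are illustrative additions, not part of the original answer.

```csharp
using System;
using System.Text.RegularExpressions;

// The expanded pattern pasted into a verbatim string, where "" is one literal quote.
var rx = new Regex(@"
 (?:
      ' [^']* '               # single-quoted run: consumed, never captured
   |
      (?:
           ""
           (                  # (1 start)
                ( [^""]* `"" )*    # (2)
                [^""]*
             |  [^""]*
           )                  # (1 end)
           ""
      )
   |
      [\S\s]                  # any other character
 )+
", RegexOptions.IgnorePatternWhitespace);

// Illustrative input: the last two lines of the question.
var m = rx.Match("'  ' \"should match\" '  '\n'   \"should not match\"   '\n");
foreach (Capture c in m.Groups[1].Captures)
    Console.WriteLine(c.Value);   // prints: should match
```

Keeping the pattern in expanded form makes it easier to see which alternative consumes the single-quoted text before the double-quote branch is tried.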
    

    C# code

    using System;
    using System.Text.RegularExpressions;

    var str =
    "The two sentences are 'He said \"Hello there\"' and \"She said 'goodbye' and 'another sentence'\"\n" +
    "\"  `\"    \"\n" +
    "\"  `\"    \"\n" +
    "\"  `\" `\"   \"\n" +
    "\"  `\" `\" `\"  \"\n" +
    "'   \"   \"   '\n" +
    "\"string\"\n" +
    "\" 'xyz' \"\n" +
    "\"  `\"    \"\n" +
    "\"  `\" `\"   \"\n" +
    "\"  `\" `\" `\"  \"\n" +
    "'  ' \"should match\" '  '\n" +
    "'   \"should not match\"   '\n";
    
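    // Skip single-quoted runs, capture double-quoted contents in group 1,
    // and consume every other character so that one match spans the whole input.
    var rx = new Regex( "(?:'[^']*'|(?:\"(([^\"]*`\")*[^\"]*|[^\"]*)\")|[\\S\\s])+" );
    
    Match M = rx.Match( str );
    if (M.Success)
    {
        // Group 1 is captured once per double-quoted string.
        CaptureCollection cc = M.Groups[1].Captures;
        for (int i = 0; i < cc.Count; i++)
            Console.WriteLine("{0}", cc[i].Value);
    }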
    

    Output

    She said 'goodbye' and 'another sentence'
      `"
      `"
      `" `"
      `" `" `"
    string
     'xyz'
      `"
      `" `"
      `" `" `"
    should match
    

    For reference, this is how it is done in a PCRE engine, using (*SKIP)(*FAIL):

    '[^']*'(*SKIP)(*FAIL)|(?:"(([^"]*`")*[^"]*|[^"]*)")
    

    https://regex101.com/r/gMiVDU/1

    Expanded

       ' [^']* '
       (*SKIP) (*FAIL) 
    |  
       (?:
            "
            (                             # (1 start)
                 ( [^"]* `" )*                 # (2)
                 [^"]* 
              |  [^"]* 
            )                             # (1 end)
            "
       )
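The (*SKIP)(*FAIL) verbs are PCRE-specific and are not supported by .NET's Regex class. A rough C# analogue, sketched here as an assumption and not taken from the original answer, is to let the single-quoted alternative match without capturing, and to keep only the matches where group 1 took part. `DoubleQuoted` is a hypothetical helper name, and the sample input reuses lines from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Same two sub-patterns as the PCRE version, minus the verbs.
// Regex.Matches advances past unmatched characters by itself.
var rx = new Regex(@"'[^']*'|""(([^""]*`"")*[^""]*|[^""]*)""");

// Hypothetical helper: returns the contents of double-quoted strings
// that are not enclosed in single quotes.
List<string> DoubleQuoted(string input)
{
    var found = new List<string>();
    foreach (Match m in rx.Matches(input))
    {
        // Group 1 succeeds only when the double-quote branch matched;
        // single-quoted runs match the first branch and are dropped here.
        if (m.Groups[1].Success)
            found.Add(m.Groups[1].Value);
    }
    return found;
}

var sample =
    "\"string\"\n" +
    "\" 'xyz' \"\n" +
    "'  ' \"should match\" '  '\n" +
    "'   \"should not match\"   '\n";

foreach (var s in DoubleQuoted(sample))
    Console.WriteLine("[{0}]", s);
// prints:
// [string]
// [ 'xyz' ]
// [should match]
```

The single-quoted branch plays the role of the skipped text: it still consumes input, so a double quote inside it can never start a match of its own.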
    
