c# regex html-table non-greedy

Regex Non-Greedy (Lazy)


I'm attempting to non-greedily parse out TD tags. I'm starting with something like this:

<TD>stuff<TD align="right">More stuff<TD align="right>Other stuff<TD>things<TD>more things

I'm using the below as my regex:

Regex.Split(tempS, @"\<TD[.\s]*?\>");

The records return as below:

""
"stuff<TD align="right">More stuff<TD align="right>Other stuff"
"things"
"more things"

Why is it not splitting that first full result (the one starting with "stuff")? How can I adjust the regex to split on all instances of the TD tag with or without parameters?


Solution

  • The regex you want is <TD[^>]*>:

    <     # Match the opening angle bracket of the tag
    TD    # Followed by TD
    [^>]* # Followed by anything that is not a > (zero or more characters)
    >     # The closing angle bracket of the tag
    

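    Note: outside a character class, . matches any character except a newline (including other whitespace), so pairing it with \s is redundant. Inside a character class, . is a literal dot, so [.\s]*? matches only dots and whitespace. That is why <TD> matched but <TD align="right"> did not. If you prefer the lazy form, drop the brackets and use <TD.*?>.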
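
To see the difference, here is a minimal sketch that runs the question's pattern and the suggested one against the sample input. The class name TdSplitDemo and the helper SplitOnTd are made up for illustration; only Regex.Split and the patterns come from the question and answer above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TdSplitDemo
{
    // Hypothetical helper: splits on every <TD ...> tag, with or without attributes.
    public static string[] SplitOnTd(string input)
    {
        // [^>]* consumes everything up to the next '>', so attributes are included.
        return Regex.Split(input, @"<TD[^>]*>");
    }

    public static void Main()
    {
        // Sample input from the question, unbalanced quote included.
        string tempS = "<TD>stuff<TD align=\"right\">More stuff<TD align=\"right>Other stuff<TD>things<TD>more things";

        // Pattern from the question: [.\s] matches only a literal dot or whitespace,
        // so only the bare <TD> tags act as separators.
        string[] original = Regex.Split(tempS, @"\<TD[.\s]*?\>");
        Console.WriteLine(original.Length); // 4

        // Suggested pattern: every TD tag is a separator.
        string[] parts = SplitOnTd(tempS);
        Console.WriteLine(parts.Length); // 6

        foreach (string part in parts)
        {
            Console.WriteLine("[" + part + "]");
        }
    }
}
```

The first element is still an empty string, because Regex.Split includes one when a match sits at the very start of the input; filter it out if you don't need it.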