
C# WatiN Find.ByText with Regex


I'm trying to get an element from a web page using WatiN's Find.ByText, but I can't get it to work with a regex in C#.

This statement returns the desired element:

return this.Document.Element(Find.ByText("781|262"));

When I use a regex instead, I get back the whole page:

return this.Document.Element(Find.ByText(new Regex(@"781\|262")));

I am trying to get this element:

<td>781|262</td>

I also tried passing a predicate:

return this.Document.Element(Find.ByText(Predicate));

private bool Predicate(string s)
{
  return s.Equals("781|262");
}

The above works, while this does not:

private bool Predicate(string s)
{
  return new Regex(@"781\|262").IsMatch(s);
}

I have now realized that in the predicate, s is the text of the whole page. I suspect the issue is with Document.Element. Any help is appreciated, thank you.
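To see why the regex version returns the whole page, here is a small WatiN-free sketch. It assumes (consistent with the behavior described above, but not checked against WatiN's source) that Document.Element returns the first element in document order whose text satisfies the constraint, and that an element's text includes the text of all its descendants. The page contents and the FirstMatch/FirstMatchRegex helpers are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class FirstMatchDemo
{
    // Hypothetical page: each element's tag and full text, in document order.
    // Ancestors (html, body, table) contain the text of their descendants.
    public static readonly List<(string Tag, string Text)> Page = new List<(string Tag, string Text)>
    {
        ("html",  "Results 781|262 Total"),
        ("body",  "Results 781|262 Total"),
        ("table", "781|262 Total"),
        ("td",    "781|262"),
        ("td",    "Total"),
    };

    // Returns the tag of the first element whose text satisfies the predicate,
    // mimicking (as an assumption) a "first match in document order" lookup.
    public static string FirstMatch(Func<string, bool> predicate)
    {
        foreach (var (tag, text) in Page)
        {
            if (predicate(text))
                return tag;
        }
        return null;
    }

    // Same lookup, but the predicate is "the regex matches somewhere in the text".
    public static string FirstMatchRegex(string pattern) =>
        FirstMatch(text => Regex.IsMatch(text, pattern));
}
```

An exact string comparison (s == "781|262") only accepts the td, whereas the unanchored pattern 781\|262 already succeeds on html, the first element checked, because Regex.IsMatch looks for the pattern anywhere in the text. Anchoring it as ^781\|262$ brings back the whole-text behavior and selects the td.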


Solution

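  • Well, I did not realize that the regex also matches the body and html elements, because their text contains the pattern too: Regex.IsMatch succeeds if the pattern occurs anywhere in the text, whereas the plain string only matched the element whose whole text was 781|262. I had to require the text to begin and end with the pattern by using ^ and $, so that only the desired element matches: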

    ^781\u007c262$
    

    \u007c matches a literal |; I used it because the MSDN documentation does.

    The final code:

    <td>781|262</td>
    
    return Document.TableCell(Find.ByText(new Regex(@"^\d{3}\|\d{3}$")));
    

    Document.TableCell speeds up the search by testing the regex against td elements only.

    @ makes the string verbatim, so C# does not interpret the \ characters as escape sequences.

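    ^ anchors the match at the start of the element's text.

    \d{3} matches a digit three times (in .NET, \d matches any Unicode decimal digit, not only 0-9; use [0-9] to restrict it to ASCII digits).

    \| matches | literally.

    \d{3} matches another three digits.

    $ anchors the match at the end of the text (or just before a final newline), so the element's text must also end with this pattern.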
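A sketch of which strings the final pattern accepts, plus the \u007c spelling from the answer, using only System.Text.RegularExpressions (no browser needed). The class and method names are hypothetical. One caveat worth checking: \u007c only means a literal | when the regex engine sees the escape, i.e. inside a verbatim @"..." string. In a regular C# string the compiler converts it to a bare | first, which the regex then treats as alternation.

```csharp
using System.Text.RegularExpressions;

public static class CellPattern
{
    // The final answer's pattern: exactly three digits, a literal '|',
    // exactly three digits, and nothing else before or after.
    private static readonly Regex ThreeDigitPair = new Regex(@"^\d{3}\|\d{3}$");

    // Verbatim string: C# leaves \u007c untouched, so the regex engine reads it
    // as an escaped literal '|', equivalent to \|.
    private static readonly Regex ExactVerbatim = new Regex(@"^781\u007c262$");

    // Regular string: the C# compiler turns \u007c into a bare '|' first, so the
    // regex engine sees "^781|262$", an alternation meaning
    // "starts with 781" OR "ends with 262".
    private static readonly Regex ExactPitfall = new Regex("^781\u007c262$");

    // Hypothetical helpers; each could back a predicate over an element's text.
    public static bool IsThreeDigitPair(string text) =>
        text != null && ThreeDigitPair.IsMatch(text);

    public static bool IsExactVerbatim(string text) =>
        text != null && ExactVerbatim.IsMatch(text);

    public static bool IsExactPitfall(string text) =>
        text != null && ExactPitfall.IsMatch(text);
}
```

Keeping each Regex in a static readonly field also avoids constructing a new Regex on every predicate call, as the Predicate in the question does.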