
Regular expressions, capture group


Here is the sample text:

<option value="USD">American Samoa, United States Dollar (USD)</option>
<option value="EUR">Andorra, Euro (EUR)</option>
<option value="AOA">Angola, Kwanza (AOA)</option>
<option value="XCD">Anguilla, East Caribbean Dollar (XCD)</option>
<option value="XCD">Antigua and Barbuda, East Caribbean Dollar (XCD)</option>
<option value="ARS">Argentina, Peso (ARS)</option>

This is my try:

<option selected="selected" value="[A-Z]{3}">(?<Test>).+</option>

The problem is that it only matches the first occurrence it finds, while I want it to match all of them. What am I missing in my attempt?
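The question does not show the calling C# code, so what follows is a hedged guess at what is missing, not a confirmed diagnosis. Two things commonly produce a single result. First, Regex.Match returns only the first match, while Regex.Matches returns every match in the input. Second, if the options are on one line, a greedy .+ runs to the last </option> and the whole list comes back as one match. In the original pattern the named group (?<Test>) is also empty, so it always captures an empty string. The sketch below moves the text inside the group, replaces .+ with [^<]+, and drops selected="selected" because none of the sample options carry that attribute. The class and method names are hypothetical.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class MatchVsMatches
{
    // The named group now wraps the option text. [^<]+ stops at the next tag,
    // so it cannot run past one </option> into the next option on the same line.
    public const string Pattern =
        @"<option value=""[A-Z]{3}"">(?<Test>[^<]+)</option>";

    // Regex.Match returns only the first match (NextMatch() would fetch the next one).
    public static string FirstText(string html) =>
        Regex.Match(html, Pattern).Groups["Test"].Value;

    // Regex.Matches scans the whole input and returns every match.
    public static List<string> AllTexts(string html)
    {
        var texts = new List<string>();
        foreach (Match m in Regex.Matches(html, Pattern))
            texts.Add(m.Groups["Test"].Value);
        return texts;
    }
}
```

With the Andorra and Angola options on a single line, FirstText returns only "Andorra, Euro (EUR)", while AllTexts returns both option texts.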


Solution

  • Regular expressions are generally not recommended for parsing HTML.

    Why don't you use the HTML Agility Pack?

    http://htmlagilitypack.codeplex.com/

    Here is an example:

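     using HtmlAgilityPack;
     HtmlDocument doc = new HtmlDocument();
     doc.LoadHtml("YOUR HTML STRING");
     // "//option" selects every <option>; the sample has no <select> wrapper or selected attribute
     // SelectNodes returns null, not an empty collection, when nothing matches
     HtmlNodeCollection options = doc.DocumentNode.SelectNodes("//option");
     if (options != null)
     {
        foreach (HtmlNode node in options)
        {
           string text = node.InnerText;                       // "American Samoa, United States Dollar (USD)"
           string value = node.GetAttributeValue("value", ""); // "USD"
        }
     }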
    

    You can also install it via NuGet (package name: HtmlAgilityPack) =)

    If you like this solution, you can read a little bit more about XPath:

    http://www.w3schools.com/XPath/xpath_syntax.asp

    If you still want to use Regex, you can check this site:

    http://www.jslab.dk/tools.regex.php
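If you do stay with regex, here is a minimal sketch that collects both the currency code and the option text through named groups. It assumes the attribute order used in the question (an optional selected="selected", then value="XXX") and double-quoted attributes; single quotes, extra attributes, or a different order will not match, which is exactly why a real HTML parser is the safer choice. The class name CurrencyOptionParser and the group names code and name are hypothetical.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CurrencyOptionParser
{
    // Assumes: optional selected="selected" first, then value="XXX" (three capital letters).
    // [^<]+ keeps each match inside a single <option> element.
    private static readonly Regex OptionRegex = new Regex(
        @"<option(?:\s+selected=""selected"")?\s+value=""(?<code>[A-Z]{3})"">(?<name>[^<]+)</option>",
        RegexOptions.Compiled);

    public static List<(string Code, string Name)> Parse(string html)
    {
        var result = new List<(string Code, string Name)>();
        // Matches (plural) walks the whole input; Match would stop at the first option.
        foreach (Match m in OptionRegex.Matches(html))
        {
            result.Add((m.Groups["code"].Value, m.Groups["name"].Value));
        }
        return result;
    }
}
```

Reading values through named groups (m.Groups["code"]) rather than by index keeps the code readable if the pattern later gains more groups.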