I have data like this:
<td><a href="/New_York_City" title="New York City">New York</a></td>
And I would like to get New York out of it.
I don't have any skill in regex whatsoever. I have tried this, though:
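using System;
using System.IO;
using System.Text.RegularExpressions;

// First match a whole <td>...</td> cell, then a whole <a ...>...</a> element inside it.
string pattern = "<td>.*</td>";
Regex r = new Regex(pattern, RegexOptions.IgnoreCase);
Regex r1 = new Regex("<a .*>.*</a>", RegexOptions.IgnoreCase);

// The using block closes and disposes the reader, so no explicit Close()/Dispose() is needed.
using (StreamReader sr = new StreamReader("c:\\USAcityfile2.txt"))
{
    string read;
    while ((read = sr.ReadLine()) != null)
    {
        foreach (Match m in r.Matches(read))
        {
            foreach (Match m1 in r1.Matches(m.Value))
                Console.WriteLine(m1.Value); // prints the entire <a> element, tags included
        }
    }
}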
This gave me <a href="/New_York_City" title="New York City">New York</a>.
How can I get the data between <a .*> and </a>? Thanks.
If you insist on a regex for this particular case, then try this:
String pattern = @"(?<=<a[^>]*>).*?(?=</a>)";
(?<=<a[^>]*>) is a positive lookbehind assertion that ensures <a[^>]*> appears immediately before the wanted text.
(?=</a>) is a positive lookahead assertion that ensures </a> follows the wanted text.
.*? uses a lazy quantifier, matching as little as possible up to the first </a>.
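To see the pattern in action, here is a minimal sketch that applies it to the sample line from the question. The class name LinkText and the method name Extract are made up for this illustration and are not part of the original code. Note that <a[^>]*> would also accept other tags that start with "<a" (for example <abbr>), so this is only a rough fit for simple input like the line shown.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LinkText
{
    // Hypothetical helper: returns the text found between each <a ...> and </a> in one line.
    public static string[] Extract(string line)
    {
        // Lookbehind for the opening tag, lazy match, lookahead for the closing tag.
        string pattern = @"(?<=<a[^>]*>).*?(?=</a>)";
        MatchCollection matches = Regex.Matches(line, pattern, RegexOptions.IgnoreCase);

        string[] results = new string[matches.Count];
        for (int i = 0; i < matches.Count; i++)
        {
            results[i] = matches[i].Value;
        }
        return results;
    }

    public static void Main()
    {
        string line = "<td><a href=\"/New_York_City\" title=\"New York City\">New York</a></td>";
        foreach (string text in Extract(line))
        {
            Console.WriteLine(text);
        }
    }
}
```

For the sample line this prints New York. Because the lookbehind and lookahead only assert that the tags are present, the tags themselves are not part of m.Value, so no extra trimming is needed. The variable-length lookbehind used here works in .NET, but many other regex engines do not accept it.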
A good reference for regular expressions is regular-expressions.info