I am parsing an HTML email. I need to get the href attribute from the following HTML:
<a href="https://sample.com/us/en/suv-rental/united-states/orlando-fl/jeep/grand-cherokee/12345" target="_blank">
<img class="m_3371787045960899181vehicle-image" width="360" height="216" src="https://images.sample.com/media/vehicle/images/12345.620x372.jpg" alt="Jeep Grand Cherokee" title="Jeep Grand Cherokee">
</a>
The only way to select it is to find the a element that contains an img whose src includes 'https://images.sample.com'.
What I need is: https://sample.com/us/en/suv-rental/united-states/orlando-fl/jeep/grand-cherokee/12345
I am struggling to get this to work. This is what I have so far:
HtmlNode vehicleNode = document.DocumentNode.SelectNodes("//a").Where(x => x.DescendantNodes.Attributes["src"].Value.Contains("images.sample.com")).First();
But this does not compile, because x.DescendantNodes cannot be used this way, and I cannot find the correct way to do it.
So how do I select a node based on an attribute of a descendant node?
In XPath terms, you can use //a[img/@src[starts-with(., 'https://images.sample.com')]]. Note that this matches only when the img is a direct child of the a, as it is in the sample.
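As a minimal sketch of how the XPath expression above could be used from C#, assuming the HtmlAgilityPack NuGet package (which the HtmlNode and DocumentNode names in the question suggest): the html string is a trimmed copy of the sample markup, and the viaXPath and viaLinq variable names are only illustrative. A LINQ variant is included for comparison. It uses Descendants("img"), so unlike the XPath version it would also match an image nested deeper inside the link.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        // Sample markup from the question, trimmed to the relevant attributes.
        const string html = @"
<a href=""https://sample.com/us/en/suv-rental/united-states/orlando-fl/jeep/grand-cherokee/12345"" target=""_blank"">
<img width=""360"" height=""216"" src=""https://images.sample.com/media/vehicle/images/12345.620x372.jpg"" alt=""Jeep Grand Cherokee"">
</a>";

        var document = new HtmlDocument();
        document.LoadHtml(html);

        // XPath: an <a> with a child <img> whose src starts with the image host.
        // SelectSingleNode returns null when nothing matches, so no exception is thrown here.
        HtmlNode viaXPath = document.DocumentNode.SelectSingleNode(
            "//a[img/@src[starts-with(., 'https://images.sample.com')]]");

        // LINQ variant: the first <a> that has any descendant <img> with a matching src.
        // GetAttributeValue falls back to "" when the attribute is missing.
        HtmlNode viaLinq = document.DocumentNode
            .Descendants("a")
            .FirstOrDefault(a => a.Descendants("img")
                .Any(img => img.GetAttributeValue("src", "")
                    .StartsWith("https://images.sample.com", StringComparison.Ordinal)));

        // Read the href from whichever node was found (null-safe).
        Console.WriteLine(viaXPath?.GetAttributeValue("href", ""));
        Console.WriteLine(viaLinq?.GetAttributeValue("href", ""));
    }
}
```

Both lookups return null rather than throwing when the email contains no matching link, which is why the sketch uses SelectSingleNode and FirstOrDefault instead of SelectNodes(...).First(). SelectNodes itself returns null when there are no matches, so chaining LINQ onto it directly can fail on emails without any a elements.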