I have an HTML file like the following:
<h3>
<div id='type'>
Type 1
</div>
<div id='price'>
127.76;
</div>
</h3>
<h3>
<div id='type'>
Type 2
</div>
<div id='price'>
127.76;
</div>
</h3>
Now I want to use CsQuery to extract those types and prices into a List. Here is the code I'm working on:
var doc = CQ.Create(htmlfile);
var types= (from listR in doc["<h3>"] //get the h3 tag
select new TypeTest
{
Typename = listR.GetAttribute("#type"),
Price = listR.GetAttribute("#price")
}
).ToList();
return types;
However, I can't get the details I want, because I'm not sure what to pass to doc[] to select the h3 elements. The HTML file cannot be modified.
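The HTML you are parsing is invalid because it contains multiple identical ids (there are two id='type' and two id='price' elements). To work around this, take the following steps:
1. Create the CQ document from the HTML.
2. Select all the type divs and all the price divs.
3. Zip the two sequences together, projecting each pair into a TypeTest object.
Below is a working example: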
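// 1: parse the HTML
var doc = CQ.Create(html);
// 2: select the type and price divs under each h3
var typeDivs = doc["h3 > div#type"];
var priceDivs = doc["h3 > div#price"];
// 3: pair them up by position and build the TypeTest objects
var types = typeDivs.Zip(priceDivs, (k, v) => new { k, v })
    .Select(h => new TypeTest
    {
        Typename = h.k.InnerText.Trim(),
        Price = h.v.InnerText.Trim()
    })
    .ToList(); // the question asks for a List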
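Pairing by position assumes every h3 contains both a type and a price div. As an alternative, here is a minimal sketch that walks each h3 and looks up its own divs, so a type is never matched with a price from another block. The TypeTest shape and the TypeExtractor helper are assumptions, since the question does not show them. The attribute selectors div[id='type'] and div[id='price'] are used here instead of #type and #price so that the lookup does not depend on how duplicate ids are handled.

```csharp
using System.Collections.Generic;
using CsQuery;

// Assumed shape of the asker's TypeTest class (not shown in the question).
public class TypeTest
{
    public string Typename { get; set; }
    public string Price { get; set; }
}

// Hypothetical helper, named here only for illustration.
public static class TypeExtractor
{
    public static List<TypeTest> Extract(string html)
    {
        CQ doc = CQ.Create(html);
        var result = new List<TypeTest>();

        // Visit each <h3> and query only inside that element.
        foreach (IDomObject h3 in doc["h3"])
        {
            CQ block = h3.Cq(); // wrap the single element so it can be queried

            result.Add(new TypeTest
            {
                Typename = block.Find("div[id='type']").Text().Trim(),
                Price = block.Find("div[id='price']").Text().Trim()
            });
        }

        return result;
    }
}
```

Calling TypeExtractor.Extract(htmlfile) should give one TypeTest per h3. Note that the price text in the sample HTML ends with a semicolon, so strip it (for example with TrimEnd(';')) if you need to parse it as a number.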