I am using the HTML Agility Pack, and so far I have the following code:
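var linkWeb = new HtmlWeb();
var linkDoc = linkWeb.Load(link);   // link holds the page URL
int i = 1;
foreach (HtmlNode l in linkDoc.DocumentNode.SelectNodes("//p"))
{
    // Prints the text of each <p> on its own numbered line
    Console.WriteLine("text #" + i++ + ": " + l.InnerText);
}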
This reads the text of each paragraph just fine, but I want it to combine the text of consecutive paragraphs until the next paragraph containing an anchor (<a>) tag is reached. If you can think of a better method, I'm open to it. Sample HTML:
<p>
<a href="1.shtml#Top" target="_top">PART 1</a>
CONTENT1;
CONTENT2;
</p>
<p>CONTENT3.</p>
<p>
<a href="2.shtml#Top" target="_top">PART 2</a>
CONTENT1
CONTENT2
CONTENT3
CONTENT4
</p>
<p>CONTENT5.</p>
<p>CONTENT6.</p>
<p>CONTENT8.</p>
<p>
<a href="3.shtml#Top" target="_top">PART 3</a>
CONTENT1
CONTENT2
CONTENT3
CONTENT4.
</p>
With my current code, the text of each <p> is read separately:
TEXT #1 is CONTENT1 CONTENT2
TEXT #2 is CONTENT3.
I want it to read:
TEXT #1 is CONTENT1 CONTENT2 CONTENT3.
The page is dynamic, so the number of paragraphs between anchors changes. I need some kind of check that reads the InnerText of every paragraph before the next anchor is reached and knows they all belong to the same TEXT #.
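You could implement this by starting a new numbered section whenever a paragraph contains an <a> child, and otherwise appending the paragraph's text to the current section: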
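// Requires: using System.Linq; (for Any)
int i = 1;
foreach (HtmlNode l in linkDoc.DocumentNode.SelectNodes("//p"))
{
    // A <p> that contains an <a> child starts a new numbered section
    if (l.ChildNodes.Any(node => node.Name == "a"))
    {
        Console.WriteLine();
        Console.Write("text #" + i++ + ": ");
    }
    // Append this paragraph's text (including the anchor paragraph's own text) to the current section
    Console.Write(l.InnerText.Trim() + " ");
}
Console.WriteLine();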
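The loop above prints as it goes. If you need the combined text as values (for example to store or process each section later), the same rule can collect sections into a list instead. The sketch below is a minimal, hedged version of that idea: it assumes each <p> has already been reduced to a (HasAnchor, Text) pair, and GroupSections is a hypothetical helper name, not part of HtmlAgilityPack. It runs without the library so the grouping logic can be checked on its own. Dropping paragraphs that appear before the first anchor is also an assumption, since the question does not say what should happen to them.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: groups paragraphs into sections. A paragraph whose
// HasAnchor flag is true starts a new section; every other paragraph is
// appended to the section that is currently open.
List<string> GroupSections(IEnumerable<(bool HasAnchor, string Text)> paragraphs)
{
    var sections = new List<string>();
    StringBuilder current = null;

    foreach (var (hasAnchor, text) in paragraphs)
    {
        if (hasAnchor)
        {
            // Close the previous section (if any) and open a new one
            if (current != null)
                sections.Add(current.ToString().Trim());
            current = new StringBuilder();
        }

        // Assumption: paragraphs before the first anchor are ignored
        if (current != null)
            current.Append(text.Trim()).Append(' ');
    }

    // Flush the last open section
    if (current != null)
        sections.Add(current.ToString().Trim());

    return sections;
}

// Sample input mirroring the HTML above (anchor text "PART n" omitted for brevity)
var sample = new List<(bool HasAnchor, string Text)>
{
    (true, "CONTENT1 CONTENT2"),
    (false, "CONTENT3."),
    (true, "CONTENT1 CONTENT2 CONTENT3 CONTENT4"),
    (false, "CONTENT5."),
    (false, "CONTENT6."),
    (false, "CONTENT8."),
    (true, "CONTENT1 CONTENT2 CONTENT3 CONTENT4."),
};

var result = GroupSections(sample);
for (int n = 0; n < result.Count; n++)
    Console.WriteLine("text #" + (n + 1) + ": " + result[n]);
```

To feed it from the parsed page, you could project each node with something like linkDoc.DocumentNode.SelectNodes("//p").Select(p => (p.ChildNodes.Any(node => node.Name == "a"), p.InnerText)) (with using System.Linq;). Note that InnerText of an anchor paragraph also includes the link text, such as "PART 1", and that SelectNodes returns null when nothing matches, so check for that on pages without any <p>.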