c# html-parsing html-agility-pack webscarab

How to extract data from HtmlTable in C# and arrange in a row?


I want to extract data from an HTML table row by row, but I'm having trouble keeping the columns of each row together. The code below writes each cell on its own line. I want each row on one line, followed by the next row on the next line. How can I do that?

HtmlNode table = doc.DocumentNode.SelectSingleNode("//table[" + tableCounter + "]");
foreach (var cell in table.SelectNodes(".//tr/td"))
{

    string someVariable = cell.InnerText;
    ReportFileWriter(someVariable);

}
tableCounter++;

This is the output I get from this code:

The Current Output

The original table looks like this:

The Original Html Table

The output I want has each row on one line, with spaces between the columns:

The Desired Output


Solution

  • Since I don't know your specific website, I used the following code to parse the HTML table.

    You need to install the HtmlAgilityPack package from NuGet. Code:

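        // Requires: using System; using System.Collections.Generic;
        //           using System.Linq; using System.Net;
        WebClient webClient = new WebClient();
        string page = webClient.DownloadString("http://www.mufap.com.pk/payout-report.php?tab=01");

        HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
        doc.LoadHtml(page);

        // Build one list of trimmed cell texts per table row.
        List<List<string>> table = doc.DocumentNode.SelectSingleNode("//table[@class='mydata']")
            .Descendants("tr")
            .Skip(1)                                      // skip the first row (presumably the header)
            .Where(tr => tr.Elements("td").Count() > 1)   // ignore rows with fewer than two cells
            .Select(tr => tr.Elements("td").Select(td => td.InnerText.Trim()).ToList())
            .ToList();

        // Print the first row on a single line, with spaces before each column.
        string result = string.Empty;
        foreach (var item in table[0])
        {
            result = result + "        " + item;
        }
        Console.WriteLine(result);
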

    The first row on the website:

    [image: first table row as shown on the website]

    The result you will get:

    [image: program output]
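
The snippet above prints only the first row (`table[0]`). To get every row on its own line, as the question asks, a minimal sketch might look like the following. It assumes HtmlAgilityPack is installed; the `ExtractRows` helper, the inline sample HTML, and the four-space separator are hypothetical stand-ins and do not come from the original post or the real page.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class TableRows
{
    // Hypothetical helper: returns one string per table row,
    // with the trimmed cell texts joined by `separator`.
    public static List<string> ExtractRows(string html, string tableXPath, string separator)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        HtmlNode table = doc.DocumentNode.SelectSingleNode(tableXPath);
        if (table == null)
        {
            return new List<string>(); // no matching table in the markup
        }

        return table.Descendants("tr")
            // Group the cells by their parent row instead of flattening ".//tr/td".
            .Select(tr => tr.Elements("td").Select(td => td.InnerText.Trim()).ToList())
            .Where(cells => cells.Count > 0) // drops rows that contain no <td> (e.g. <th>-only headers)
            .Select(cells => string.Join(separator, cells))
            .ToList();
    }

    public static void Main()
    {
        // Sample markup standing in for the downloaded page (an assumption).
        string html = @"<table class='mydata'>
            <tr><th>Fund</th><th>Payout</th></tr>
            <tr><td>Fund A</td><td> 1.25 </td></tr>
            <tr><td>Fund B</td><td>0.80</td></tr>
        </table>";

        foreach (string line in ExtractRows(html, "//table[@class='mydata']", "    "))
        {
            Console.WriteLine(line); // one table row per output line
        }
    }
}
```

The key difference from the code in the question is that the cells are collected per `tr` and joined with `string.Join` before writing, so a call such as `ReportFileWriter(line)` would receive a whole row at a time rather than a single cell.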