
Parsing HTML Table <td></td> InnerText Strip Punctuation (Comma)


I am parsing an HTML table into a text file; my code sample is below. In cols6 (the 6th <td></td>), the InnerText is a number such as 70,430. I can't work out how to drop the comma when writing the InnerText to the text file: I would like it to write 70430 instead of 70,430. What should I do to cols6[j].InnerText to get rid of the , in the numbers? Any help would be much appreciated. Thank you! :)

        // Load HTML
        HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
        doc.Load(fileName);
        // Get all tables in the document
        HtmlNodeCollection tables = doc.DocumentNode.SelectNodes("//table");

        using (FileStream fs = new FileStream(@"..\..\bin\Debug\Pages\" + "Director.txt", FileMode.Append))
        using (StreamWriter sw = new StreamWriter(fs))
        {
            // Iterate all rows in the relevant table
            HtmlNodeCollection rows = tables[2].SelectNodes(".//tr[position() >2]");
            for (int i = 0; i < rows.Count; ++i)
            {
                // Iterate all columns in this row
                HtmlNodeCollection cols = rows[i].SelectNodes(".//td[1]");
                HtmlNodeCollection cols2 = rows[i].SelectNodes(".//td[2]");
                HtmlNodeCollection cols3 = rows[i].SelectNodes(".//td[3]");
                HtmlNodeCollection cols4 = rows[i].SelectNodes(".//td[4]");
                HtmlNodeCollection cols5 = rows[i].SelectNodes(".//td[5]");
                HtmlNodeCollection cols6 = rows[i].SelectNodes(".//td[6]");
                HtmlNodeCollection cols7 = rows[i].SelectNodes(".//td[7]");
                for (int j = 0; j < cols.Count; ++j)
                    // Get the value of the column and print it
                    sw.WriteLine(cols[j].InnerText + "," + cols2[j].InnerText + "," + cols3[j].InnerText + "," +
                                 cols4[j].InnerText + "," + cols5[j].InnerText + "," + cols6[j].InnerText + "," + cols7[j].InnerText + ",822");
            }
            // No explicit Flush()/Close() is needed here: the using blocks dispose sw and fs.
        }

Solution

  • You can strip the comma with string.Replace():

    cols6[j].InnerText = cols6[j].InnerText.Replace(",", "");
    

    Alternatively, call Replace() inline in the WriteLine() call, which leaves the node itself untouched:

    sw.WriteLine(cols[j].InnerText + "," + cols2[j].InnerText + "," + cols3[j].InnerText + "," +
                 cols4[j].InnerText + "," + cols5[j].InnerText + "," +
                 cols6[j].InnerText.Replace(",", "") + "," + cols7[j].InnerText + ",822");
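
If the same cleanup is needed for several columns, the Replace() call above could be wrapped in a small helper. The following is only a sketch under assumptions: the names CellText and StripThousandsSeparators are hypothetical (they are not part of HtmlAgilityPack), and it assumes the cell uses a comma as the thousands separator. It tries to parse the text as a whole number first and falls back to a plain Replace() for anything else.

```csharp
using System;
using System.Globalization;

// Hypothetical helper, not part of HtmlAgilityPack.
static class CellText
{
    // Returns the cell text with comma thousands separators removed,
    // e.g. "70,430" becomes "70430".
    public static string StripThousandsSeparators(string innerText)
    {
        string trimmed = innerText.Trim();

        // The invariant culture uses "," as the group separator, so
        // AllowThousands lets the parser accept "70,430" as 70430.
        long value;
        if (long.TryParse(trimmed,
                          NumberStyles.Integer | NumberStyles.AllowThousands,
                          CultureInfo.InvariantCulture,
                          out value))
        {
            return value.ToString(CultureInfo.InvariantCulture);
        }

        // Not a whole number (e.g. "N/A" or "70,430.5"):
        // fall back to simply removing the commas.
        return trimmed.Replace(",", "");
    }
}

static class Program
{
    static void Main()
    {
        Console.WriteLine(CellText.StripThousandsSeparators("70,430"));
    }
}
```

In the loop from the question, this would be used as CellText.StripThousandsSeparators(cols6[j].InnerText) in place of cols6[j].InnerText. Because the result is built from the string rather than assigned back to the node, the parsed document is left unchanged.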