c# html-parsing

How to remove special characters from a parsed string


I am parsing the names and prices of some items from a website. However, I am getting some irrelevant special characters before each string. How do I remove those characters and keep only the string that I want?

I am getting

\n                \n                    \n                    \n                \n\n                \n                    \n                    \n                        AMD YD2600BBAFBOX 3.9GHz Socket AM4 Processor

and    17,975.00

I have used the Replace method to remove the unwanted special characters from the strings:

itemName = itemNameNode.InnerText.Replace("\n", "");
itemPrice = itemPriceNode.InnerText.Replace("                      ", "Current price:");

Still, I am not getting the expected result.

I have linked an image of the result for reference, since I am not allowed to post images here (Seriously, Stack Overflow!)


Solution

  • Instead of replacing the newlines in your itemName string, you could simply use String.Trim. Trim removes all leading and trailing characters for which char.IsWhiteSpace returns true, and that includes the newline character. It leaves whitespace inside the string untouched.

    var x = "\n   Hello   \n";
    
    Console.WriteLine("-");
    Console.WriteLine(x);
    Console.WriteLine("-");
    /* Output:
    -
    
       Hello   
    
    -
    */
    
    Console.WriteLine("-");
    Console.WriteLine(x.Trim());
    Console.WriteLine("-");
    /* Output:
    -
    Hello
    -
    */
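
Applied to the values from the question, the same idea might look like the sketch below. The `ScrapedText.Clean` helper is a hypothetical name introduced here for illustration, and the sample strings are only modelled on the output shown in the question. In the real code they would come from `itemNameNode.InnerText` and `itemPriceNode.InnerText`.

```csharp
using System;

public static class ScrapedText
{
    // Hypothetical helper (not from the original post): removes leading and
    // trailing whitespace, including newlines, from a scraped value.
    public static string Clean(string raw)
    {
        return raw == null ? string.Empty : raw.Trim();
    }
}

public static class Program
{
    public static void Main()
    {
        // Sample values modelled on the output shown in the question.
        string rawName = "\n                \n                    \n   AMD YD2600BBAFBOX 3.9GHz Socket AM4 Processor";
        string rawPrice = "   17,975.00";

        // Brackets make any leftover whitespace visible.
        Console.WriteLine("[" + ScrapedText.Clean(rawName) + "]");
        Console.WriteLine("[" + ScrapedText.Clean(rawPrice) + "]");
    }
}
/* Output:
[AMD YD2600BBAFBOX 3.9GHz Socket AM4 Processor]
[17,975.00]
*/
```

Because Trim only touches the ends of the string, the spaces inside the product name and the comma in the price are preserved.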