Search code examples
c#web-scrapinghtml-agility-pack

C# HtmlAgilityPack behaves differently when running project in Visual Studio debugger or when running a built executable


I'm trying to scrap a web site with HtmlAgilityPack library for C#. More specifically, I'm pulling data from a table which contains crypto-exchanges, it's rates and other data. The problem is: when I launch the project in Visual Studio, all the data is always pulled correctly and there are no errors, however, when I build the project and run it as an executable, one of the table columns is 50% of the time empty (I suppose the Xpath just returns an empty node).

What is more, I have spotted such behavior in another project of mine with HtmlAgilityPack. It is similar web scraping, but with another web site where i first need to log in. Each hour I need to re-login to get fresh cookies from the site, but every 10-12 hours the project fails as it can't find the specified html element in the login page. I launch it 1 time in visual studio, it works just alright and i re-launch the executable and it continues normal behavior for another 10-12 hours until it stumbles again.

Below is the code snippet for the first case:

//Getting the web page
HtmlWeb web = new HtmlWeb();
htmlDoc = web.Load("https://www.bestchange.ru/bitcoin-to-bitcoin-bep20.html", proxies[p].ip, proxies[p].port, proxies[p].login, proxies[p].password);

// parse NAME
if (htmlDoc.DocumentNode != null)
{
    xpath = $"//body/div[3]/div[2]/div/div/div[1]/div[2]/div[6]/div[2]/table/tbody/tr{i}/td[2]/div/div/div";
    HtmlAgilityPack.HtmlNode bodyNode = htmlDoc.DocumentNode.SelectSingleNode(xpath);
    if (bodyNode != null)
    {
        exchange = bodyNode.InnerHtml.ToString();
        Console.WriteLine("Name: " + exchange);
    }
}
//parse PRICE
//This is what gets screwed
if (htmlDoc.DocumentNode != null)
{
    xpath = $"//body/div[3]/div[2]/div/div/div[1]/div[2]/div[6]/div[2]/table/tbody/tr{i}/td[3]/div[1]/text()";
    HtmlAgilityPack.HtmlNode bodyNode = htmlDoc.DocumentNode.SelectSingleNode(xpath);
    if (bodyNode != null)
    {
        Double.TryParse(bodyNode.InnerHtml, out price);
        Console.WriteLine("Price: " + price);
    }
}

So the price is what is usually pulled wrong (as 0).

I tried to run it on different OS - mac and windows, code it again on another platform, but the result is always the same.


Solution

  • I finally figured out the problem, it has nothing to do with HtmlAilityPack, instead it was a culture issue. Somehow my Double.TryParse Method was considering doubles only with "," instead of "." (and was outputting the same). I soled it by setting the default thread culture to "en-Us".