c# http web-scraping win-universal-app html-agility-pack

HttpClient doesn't get full website html source


I'm trying to scrape offers from http://olx.pl/ using HttpClient. The problem is that the page the client retrieves is very different from the HTML source I see in the browser, and it doesn't contain the offers list. Here's my code:

  string url = "http://olx.pl/oferty/q-diablo/?search%5Bdescription%5D=1";
  HttpClient client = new HttpClient();
  string result = await client.GetStringAsync(url);

Solution

  • HttpClient won't load content that is generated by JavaScript; it only downloads the raw response and never runs the page's scripts. Instead, you can use a WebView, which does run JavaScript. I ran both: the HttpClient result had a length of 235507 and the WebView result had a length of 464476.

        // Inside a method of your page: create the WebView, subscribe, then navigate.
        WebView wv = new WebView();
        wv.NavigationCompleted += Wv_NavigationCompleted;
        wv.Navigate(new Uri(url));

        // Runs once navigation has finished; reads the DOM as rendered by the WebView.
        private async void Wv_NavigationCompleted(WebView sender, WebViewNavigationCompletedEventArgs args)
        {
            string wvresult = await sender.InvokeScriptAsync("eval", new string[] { "document.documentElement.outerHTML;" });
        }
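If you would rather await the rendered HTML the same way the question awaits GetStringAsync, the event-based code above can be wrapped in a Task. The following is only a sketch under a few assumptions: the class name RenderedHtml and the method GetAsync are hypothetical helpers (not part of any library), it targets a UWP app, and it must be called on the UI thread. Note that NavigationCompleted may fire before scripts that load data asynchronously have finished, so the result can still be incomplete for some pages.

```csharp
using System;
using System.Threading.Tasks;
using Windows.Foundation;
using Windows.UI.Xaml.Controls;

// Hypothetical helper: wraps WebView navigation in an awaitable Task.
public static class RenderedHtml
{
    public static Task<string> GetAsync(WebView webView, Uri uri)
    {
        var tcs = new TaskCompletionSource<string>();
        TypedEventHandler<WebView, WebViewNavigationCompletedEventArgs> handler = null;

        handler = async (sender, args) =>
        {
            // Unsubscribe so the handler runs only for this navigation.
            sender.NavigationCompleted -= handler;

            if (!args.IsSuccess)
            {
                tcs.TrySetException(new InvalidOperationException(
                    "Navigation failed: " + args.WebErrorStatus));
                return;
            }

            try
            {
                // Same script as in the answer: serialize the rendered DOM.
                string html = await sender.InvokeScriptAsync(
                    "eval", new[] { "document.documentElement.outerHTML;" });
                tcs.TrySetResult(html);
            }
            catch (Exception ex)
            {
                tcs.TrySetException(ex);
            }
        };

        webView.NavigationCompleted += handler;
        webView.Navigate(uri);
        return tcs.Task;
    }
}
```

With this in place, a call such as `string html = await RenderedHtml.GetAsync(wv, new Uri(url));` would replace the GetStringAsync line from the question, and the returned string could then be handed to whatever parser you use.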