
Error 403 when trying to download an image, but not when displaying it


I get an HTTP 403 (Forbidden) error whenever I try to do anything with an image's URL (either get the file size or download the file), but I don't get any error when the image is simply displayed.

I hope that's clear enough, but if needed, here is an example of a URL that causes the problem:

Image URL / Site showing the image

I'm using this code to get the file size. It works fine in general, but not on this site, for example:

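public void getFileSize(string uri)
{
    try
    {
        waitGetSize = 0;
        HttpWebRequest req = (HttpWebRequest)WebRequest.Create(uri);
        req.Timeout = 5000;   // milliseconds
        req.Method = "HEAD";  // ask only for the headers, not the body
        // Dispose the response so the connection is released back to the pool
        using (HttpWebResponse resp = (HttpWebResponse)req.GetResponse())
        {
            imgSize = resp.ContentLength;  // -1 if the server sends no Content-Length
            imgSizeKb = imgSize / 1024;
        }
        waitGetSize = 1;
    }
    catch (Exception ex)
    {
        // A 403 Forbidden from the server surfaces here as a WebException
        MetroMessageBox.Show(this, ex.Message, "Exception :", MessageBoxButtons.OK, MessageBoxIcon.Error);
    }
}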

As cFrozenDeath pointed out, I was using a HEAD request, so I tried a GET request instead, with exactly the same result. Leaving out the request method entirely (it then defaults to GET) gives the same result too.
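One way to compare HEAD and GET more precisely is to look at the status code the server actually returns, rather than only the exception message. The sketch below relies on `HttpWebRequest` throwing a `WebException` for error statuses such as 403, with the server's response available on `ex.Response`. `StatusProbe` and `GetStatus` are hypothetical names introduced for illustration.

```csharp
using System;
using System.Net;

// Hypothetical diagnostic helper: returns the status code the server sent,
// whether or not HttpWebRequest treated it as an error.
static class StatusProbe
{
    public static HttpStatusCode GetStatus(string uri, string method)
    {
        HttpWebRequest req = (HttpWebRequest)WebRequest.Create(uri);
        req.Method = method;   // e.g. "HEAD" or "GET"
        req.Timeout = 5000;    // milliseconds
        try
        {
            using (HttpWebResponse resp = (HttpWebResponse)req.GetResponse())
            {
                return resp.StatusCode;
            }
        }
        catch (WebException ex) when (ex.Response is HttpWebResponse errorResp)
        {
            // Error statuses such as 403 Forbidden arrive as a WebException;
            // the server's actual response is attached to ex.Response.
            // Failures with no response (timeouts, DNS errors) are not caught here.
            using (errorResp)
            {
                return errorResp.StatusCode;
            }
        }
    }
}
```

Calling `StatusProbe.GetStatus(url, "HEAD")` and `StatusProbe.GetStatus(url, "GET")` on the problem URL would show whether both really come back as `Forbidden`.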

So is there a way to get the file size, or at least download the file, given that it displays fine when opened in a browser?


Solution

  • When you want to scrape content from websites, you have to mimic a web browser.

    Sometimes this means you need to keep the cookies you receive when you first land on a website and send them back with later requests; sometimes you have to tell the web server which page linked to the resource.

    In this case you need to provide the Referer header:

    public void getFileSize(string uri)
    {
        HttpWebRequest req = (HttpWebRequest)WebRequest.Create(uri);
        // the page we want the server to believe the request comes from
        req.Referer = "http://www.webtoons.com/";

        req.Timeout = 5000;
        req.Method = "GET";  // or do a HEAD
        using (HttpWebResponse resp = (HttpWebResponse)req.GetResponse())
        {
            // rest omitted
        }
    }
    

    That particular image has a length of 273073 bytes.

    Do note that scraping content may violate the website's terms of service. Make sure you don't end up doing anything illegal.
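Putting the points above together, a minimal download sketch might look like the following. It sets the Referer as in the answer and attaches a `CookieContainer` so cookies persist across requests. The User-Agent string and the names `ImageFetcher`, `CreateBrowserLikeRequest` and `Download` are illustrative assumptions, not something the site is known to require.

```csharp
using System;
using System.IO;
using System.Net;

// Hypothetical helper; a sketch, not a complete scraper.
static class ImageFetcher
{
    // Builds a request that looks like it was made by a browser displaying
    // the page 'referer'. The User-Agent value is an assumption: some servers
    // check it, many ignore it.
    public static HttpWebRequest CreateBrowserLikeRequest(
        string uri, string referer, CookieContainer cookies)
    {
        HttpWebRequest req = (HttpWebRequest)WebRequest.Create(uri);
        req.Referer = referer;          // the page that "linked" to the image
        req.UserAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)";
        req.CookieContainer = cookies;  // send and store cookies across requests
        req.Timeout = 5000;             // milliseconds
        return req;
    }

    // Downloads 'uri' to 'path' and returns the number of bytes written.
    public static long Download(string uri, string referer,
                                CookieContainer cookies, string path)
    {
        HttpWebRequest req = CreateBrowserLikeRequest(uri, referer, cookies);
        using (HttpWebResponse resp = (HttpWebResponse)req.GetResponse())
        using (Stream body = resp.GetResponseStream())
        using (FileStream file = File.Create(path))
        {
            body.CopyTo(file);
            file.Flush();
            return file.Length;
        }
    }
}
```

To carry session cookies over, the same `CookieContainer` would be passed first to a request for the page that shows the image and then to `Download`; the answer only establishes that the Referer is needed for this site. On .NET 6 and later, `WebRequest.Create` compiles with an obsolescence warning (SYSLIB0014), and `HttpClient` with `DefaultRequestHeaders.Referrer` set is the recommended replacement.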