Download CSV from Google Insights?


I did this successfully about 4-6 months ago, but the site has changed since then. I can still get the desired search result using HttpWebRequest; the problem is downloading the CSV file.

The download doesn't work. I reproduced the problem with WebClient: I pass along all the cookies, but it still fails.

When I try, the downloaded file contains this:

.....meta http-equiv="refresh" content="0; url='http://www.google.com/trends#content=1&geo=US-AL&q=snooker&cmpt=q&hl=en-AU'">

location.replace("http://www.google.com/trends#content\x3d1\x26geo\x3dUS-AL\x26q\x3dsnooker\x26cmpt\x3dq\x26hl\x3den-AU")

The code to download file is as follows:

public void downloadsheet(string url, string path)
    {
        try
        {
            using (WebClient client = new WebClient())
            {



                string tmpCookieString = string.Empty;

                string[] array = webBrowser1.Document.Cookie.Split(new char[]
                        {
                            ';'
                        });
                for (int i = 0; i < array.Length; i++)
                {
                    string cookie = array[i];
                    string name = cookie.Split(new char[]
                            {
                                '='
                            })[0];
                    string value = cookie.Substring(name.Length + 1);

                    //client.Headers.Add(name, value);
                    if (i < array.Length - 1)
                    {
                        tmpCookieString = tmpCookieString + name + "=" + value + ";";
                    }
                    else
                    {
                        tmpCookieString = tmpCookieString + name + "=" + value;
                    }
                }

                client.Headers.Add(HttpRequestHeader.Cookie, tmpCookieString);
                client.Headers.Add("Accept", "text/html, application/xhtml+xml, */*");
                client.Headers.Add("User-Agent", "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E; InfoPath.2)");
                client.Headers.Add("Accept-Language", "en-US");
                using (FileStream file = File.Create(path))
                {
                    byte[] bytes = client.DownloadData(url);
                    file.Write(bytes, 0, bytes.Length);
                }
            }
        }
        catch (Exception exp_DE)
        {
        }
    }

The URL I use is:

http://www.google.com/trends/trendsReport?hl=en-AU&q=snooker&geo=US-AL&cmpt=q&content=1&export=2

If I navigate to the link above with the WebBrowser control, it does open a dialog box.


Solution

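  • The problem is that the HttpOnly cookies (i.e. SID and HSID) are missing from WebBrowser.Document.Cookie. For security reasons, HttpOnly cookies are not exposed through the document, so the Cookie header you build never contains them.

    Here is a solution that reads them through WinINet instead: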

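    // Requires: using System; using System.Runtime.InteropServices; using System.Text;
    [DllImport("wininet.dll", CharSet = CharSet.Auto, SetLastError = true)]
    static extern bool InternetGetCookieEx(string pchURL, string pchCookieName, StringBuilder pchCookieData, ref uint pcchCookieData, int dwFlags, IntPtr lpReserved);

    // Flag that makes InternetGetCookieEx include HttpOnly cookies.
    const int INTERNET_COOKIE_HTTPONLY = 0x00002000;
    
    // Returns the cookie string for the given URI, including HttpOnly cookies,
    // or null if no cookies could be read.
    private static string GetGlobalCookies(string uri)
    {
        // Fixed-size buffer: if the cookie data does not fit, the call fails
        // and this method returns null.
        uint datasize = 2048;
        StringBuilder cookieData = new StringBuilder((int)datasize);
        if (InternetGetCookieEx(uri, null, cookieData, ref datasize, INTERNET_COOKIE_HTTPONLY, IntPtr.Zero)
            && cookieData.Length > 0)
        {
            return cookieData.ToString();
        }
        else
        {
            return null;
        }
    }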
    
    public void downloadsheet(string url, string path)
    {
        try
        {
            using (WebClient client = new WebClient())
            {
                // Read the cookies (including HttpOnly ones) for the page loaded in the browser control.
                string tmpCookieString = GetGlobalCookies(webBrowser1.Url.AbsoluteUri);
    
                // GetGlobalCookies returns null when no cookies could be read.
                if (tmpCookieString != null)
                {
                    client.Headers.Add(HttpRequestHeader.Cookie, tmpCookieString);
                }
    
                client.Headers.Add("Accept", "text/html, application/xhtml+xml, */*");
                client.Headers.Add("User-Agent", "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E; InfoPath.2)");
                client.Headers.Add("Accept-Language", "en-US");
                using (FileStream file = File.Create(path))
                {
                    byte[] bytes = client.DownloadData(url);
                    file.Write(bytes, 0, bytes.Length);
                }
            }
        }
        catch (Exception exp_DE)
        {
            // Errors are silently ignored here; log or rethrow them in real code.
        }
    }
    

    Of course, you must sign in to your account before calling InternetGetCookieEx.
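
One way to confirm that the cookies were accepted is to check that the response is not the HTML redirect page shown in the question. The following is only a minimal sketch: `DownloadCheck` and `LooksLikeHtmlRedirect` are hypothetical names, not part of any library, and the two markers it looks for are assumptions taken from the output quoted in the question, so the site may return other failure pages that this does not catch.

```csharp
using System;
using System.Text;

static class DownloadCheck
{
    // Hypothetical helper: returns true when the payload looks like the
    // redirect page from the question instead of CSV data.
    public static bool LooksLikeHtmlRedirect(byte[] bytes)
    {
        if (bytes == null || bytes.Length == 0)
        {
            return false;
        }

        // Only the beginning of the payload is inspected.
        int count = Math.Min(bytes.Length, 2048);
        string head = Encoding.UTF8.GetString(bytes, 0, count);

        // Markers assumed from the output quoted in the question.
        return head.IndexOf("http-equiv=\"refresh\"", StringComparison.OrdinalIgnoreCase) >= 0
            || head.IndexOf("location.replace(", StringComparison.OrdinalIgnoreCase) >= 0;
    }
}
```

In downloadsheet, this could be called on the result of client.DownloadData(url) before the file is written, so that a failed sign-in is reported instead of the redirect page being saved as a .csv file.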