Tags: c#, web-scraping, pagination, postback

Need guidance on creating a POST request and getting back a value


I am trying to build a web service that needs to crawl data from another site. The problem is that the data I need is in an ASP.NET GridView with paging. So what I need is to read the HTML, do a postback to the page so that it returns the next page of the GridView, and then get the new HTML (the response), which I can parse to extract the data I need.
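For context, a WebForms postback is an ordinary form POST: the page's `__doPostBack(eventTarget, eventArgument)` script writes its two arguments into the hidden `__EVENTTARGET` and `__EVENTARGUMENT` inputs and submits the whole form, hidden state fields such as `__VIEWSTATE` included. A minimal sketch of assembling that field set in C# follows; `PostBackForm.Build` is a hypothetical helper, and it assumes the hidden fields were already read from the previous response.

```csharp
using System.Collections.Generic;

// Hypothetical helper: builds the fields a browser would submit when
// __doPostBack(eventTarget, eventArgument) fires on a WebForms page.
public static class PostBackForm
{
    public static Dictionary<string, string> Build(
        IDictionary<string, string> hiddenFields,
        string eventTarget,
        string eventArgument,
        IDictionary<string, string> controlValues)
    {
        // Start from the hidden state of the last response (__VIEWSTATE, ...),
        // which has to be echoed back unchanged.
        var form = new Dictionary<string, string>(hiddenFields);

        // These two are what __doPostBack fills in before submitting.
        form["__EVENTTARGET"] = eventTarget;
        form["__EVENTARGUMENT"] = eventArgument;

        // Changed control values (e.g. a page-selector dropdown) overwrite
        // any field with the same name.
        foreach (var pair in controlValues)
        {
            form[pair.Key] = pair.Value;
        }
        return form;
    }
}
```

For the GridView here, `eventTarget` and the key in `controlValues` would both be the `ddlPageSelector` name that Fiddler shows, with the requested page number as its value.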

I have tried many ways to solve this problem, but without success. Could you tell me what I am doing wrong?

Code:

```csharp
// Requires: using System.IO; using System.Net; using System.Text; using System.Web.Services;
[WebMethod]
public string eNabavki2()
{
    // 1) GET the page and scrape the __VIEWSTATEKEY hidden field.
    //    getBetween is my own helper (not shown) that returns the text between two markers.
    WebClient client = new WebClient();
    client.Encoding = Encoding.UTF8;
    string htmlCode = client.DownloadString("https://site.com/Default.aspx");
    string vsk = getBetween(htmlCode, "id=\"__VIEWSTATEKEY\" value=\"", "\" />");

    // 2) Build a separate POST request to the same page.
    //    (It does not share any cookies with the WebClient above.)
    WebRequest request = WebRequest.Create("https://site.com/Default.aspx");
    request.ContentType = "application/x-www-form-urlencoded";
    request.Method = "POST";

    var webRequest = (HttpWebRequest)request;
    webRequest.UserAgent = "Mozilla/5.0 (Windows NT 6.2; WOW64; rv:20.0) Gecko/20100101 Firefox/20.0"; //Googlebot/2.1 (+http://www.googlebot.com/bot.html)

    // 3) Form data: the page-selector dropdown is the event target, and field {7} is the page to show.
    string postData = string.Format("__EVENTTARGET={0}" +
        "&__EVENTARGUMENT={1}" +
        "&__LASTFOCUS={2}" +
        "&__VIEWSTATEKEY={3}" +
        "&__VIEWSTATE={4}" +
        "&__SCROLLPOSITIONX={5}" +
        "&__SCROLLPOSITIONY={6}" +
        "&ctl00$ctl00$cphGlobal$cphPublicAccess$publicCFTenders$dgPublicCallForTender$ctl13$ddlPageSelector={7}",
        /*0*/System.Web.HttpUtility.UrlEncode("ctl00$ctl00$cphGlobal$cphPublicAccess$publicCFTenders$dgPublicCallForTender$ctl13$ddlPageSelector"),
        /*1*/string.Empty,
        /*2*/string.Empty,
        /*3*/string.Empty, // vsk (sending it made no difference)
        /*4*/string.Empty, // __VIEWSTATE is sent empty
        /*5*/"0",
        /*6*/"383",
        /*7*/"2");
    byte[] byteArray = Encoding.UTF8.GetBytes(postData);

    // 4) Write the form data to the request body.
    request.ContentLength = byteArray.Length;
    using (Stream dataStream = request.GetRequestStream())
    {
        dataStream.Write(byteArray, 0, byteArray.Length);
    }

    // 5) Read the HTML returned by the server; the using blocks close the streams and the response.
    using (WebResponse response = request.GetResponse())
    using (Stream responseStream = response.GetResponseStream())
    using (StreamReader reader = new StreamReader(responseStream))
    {
        return reader.ReadToEnd();
    }
}
```

OK, a few things. In the postData string I included everything I could find that the page sends. I used Fiddler for this, and those are all the arguments (26) it gave me. The one I really need is the page selector, since that is the value I have to change.
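Because ASP.NET control names such as `ctl00$ctl00$...$ddlPageSelector` contain `$`, and `__VIEWSTATE` values are Base64 (which can contain `+`, `/` and `=`), it is safer to percent-encode every key and every value rather than only the `__EVENTTARGET` value, as the code above does; an unencoded `+` in the view state would be decoded as a space on the server. A minimal sketch using `Uri.EscapeDataString`; `FormEncoder.Encode` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for building an application/x-www-form-urlencoded body.
public static class FormEncoder
{
    // Joins key=value pairs with '&', percent-encoding both key and value so
    // characters such as '$', '+', '/', '=' and '&' survive the round trip.
    public static string Encode(IEnumerable<KeyValuePair<string, string>> fields)
    {
        return string.Join("&", fields.Select(f =>
            Uri.EscapeDataString(f.Key) + "=" + Uri.EscapeDataString(f.Value ?? string.Empty)));
    }
}
```

The resulting string can be passed to `Encoding.UTF8.GetBytes` wherever `postData` is used. If `HttpClient` is an option, `FormUrlEncodedContent` performs the same encoding for you.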

I also noticed there is a __VIEWSTATEKEY in the HTML, which has a different value every time. As you can see, I first tried to read that value from the HTML (the vsk string), but that did not change anything.
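Since `__VIEWSTATEKEY` changes on every request, it is likely tied to the view state of that particular response (an assumption, as the site's configuration is not visible), so it, `__VIEWSTATE`, and `__EVENTVALIDATION` (if the page has one) should all be copied from the same GET response that precedes the POST rather than sent empty. The following rough sketch collects every `<input type="hidden">` name/value pair with regular expressions; `HiddenFields.Extract` is a hypothetical helper, it only handles double-quoted attributes, and a real HTML parser such as HtmlAgilityPack would be more robust.

```csharp
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper: reads hidden form fields out of a WebForms page.
public static class HiddenFields
{
    // Matches each <input ...> tag whose attributes include type="hidden".
    private static readonly Regex HiddenInput = new Regex(
        @"<input\b[^>]*\btype\s*=\s*""hidden""[^>]*>",
        RegexOptions.IgnoreCase);

    // Returns the HTML-decoded value of one double-quoted attribute, or null.
    private static string Attr(string tag, string attribute)
    {
        Match m = Regex.Match(tag, @"\b" + attribute + @"\s*=\s*""([^""]*)""", RegexOptions.IgnoreCase);
        return m.Success ? WebUtility.HtmlDecode(m.Groups[1].Value) : null;
    }

    // Maps name -> value for every hidden input, e.g. __VIEWSTATE, __VIEWSTATEKEY, __EVENTVALIDATION.
    public static Dictionary<string, string> Extract(string html)
    {
        var fields = new Dictionary<string, string>();
        foreach (Match input in HiddenInput.Matches(html))
        {
            string name = Attr(input.Value, "name");
            if (name != null)
            {
                fields[name] = Attr(input.Value, "value") ?? string.Empty;
            }
        }
        return fields;
    }
}
```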

I am sorry, but I am not familiar with this POST/request stuff. I need it for a university project, so I would be grateful if someone could help me solve this.

Edit: I also posted a screenshot of the headers Fiddler shows for this request.


Solution

  • Does the website expect any cookies on the POST? Check in Fiddler whether any cookies are attached to the POST when you use the site manually.

    If so, you need to capture the cookies returned by the GET request and attach them to the subsequent POST request. See "Using CookieContainer with WebClient class" for how to do this with WebClient.
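A minimal sketch of one common form of that approach: a `WebClient` subclass that hands the same `CookieContainer` to every request it creates, so a session cookie set by the GET (for example `ASP.NET_SessionId`, an assumption about this site) is sent back on the POST. The class name `CookieAwareWebClient` is illustrative; `GetWebRequest` is the real protected virtual method on `WebClient` that the subclass overrides.

```csharp
using System;
using System.Net;

// Illustrative WebClient subclass that keeps cookies across requests.
public class CookieAwareWebClient : WebClient
{
    // One container shared by every request this client makes.
    public CookieContainer Cookies { get; } = new CookieContainer();

    protected override WebRequest GetWebRequest(Uri address)
    {
        WebRequest request = base.GetWebRequest(address);

        // Only HTTP(S) requests carry cookies; attach the shared container so
        // cookies set by an earlier response are sent on the next request.
        if (request is HttpWebRequest http)
        {
            http.CookieContainer = Cookies;
        }
        return request;
    }
}
```

With this class, the GET and the POST should go through the same instance: `DownloadString` for the first page, then `UploadValues(url, "POST", form)` with a `NameValueCollection` holding the hidden fields and the page-selector value, instead of mixing a `WebClient` with a separately created `WebRequest`. On .NET 6 and later, where `WebClient` is marked obsolete, an `HttpClient` built on an `HttpClientHandler` with its `CookieContainer` set does the same job.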