
getForm with get method - how to bypass redirection?


I'm struggling with getForm and with my query being redirected. I've tried experimenting with cookiefile and followlocation, as suggested in other Stack Overflow topics, but with no result.

My code:

  getForm("http://korpus.pl/poliqarp/poliqarp.php",
          query = "pies", corpus = "2", showMatch = "1",showContext = "3",
          leftContext = "5", rightContext = "5", wideContext = "50", hitsPerPage = "10",              
          .opts = curlOptions(
            verbose = TRUE,
            followlocation=TRUE
            )
      )

Am I right that I'm getting the content of the redirection page? If so, how can I bypass it?


Solution

    library(RCurl)
    curl = getCurlHandle(cookiefile = "", verbose = TRUE, followlocation=TRUE)
    
    getForm("http://korpus.pl/poliqarp/poliqarp.php",
            query = "pies", corpus = "2", showMatch = "1",showContext = "3",
            leftContext = "5", rightContext = "5", wideContext = "50", hitsPerPage = "10",              
            .opts = curlOptions(
              verbose = TRUE,
              followlocation=TRUE
            )
            , curl = curl)
    
    
    test1 <- getURL("http://korpus.pl/poliqarp/poliqarp.php", curl = curl)
    test2 <- getURL("http://korpus.pl/poliqarp/poliqarp.php", curl = curl)
    

    With a bit of persuasion, test2 should hopefully contain the results.

    curl is a handle that persists across calls. Setting cookiefile tells RCurl to store the cookies it receives, so they are sent again on later requests made with the same handle. You can inspect the state held in the handle with getCurlInfo(curl). For example:

    > cat(getCurlInfo(curl)$cookielist)
    korpus.pl   FALSE   /   FALSE   0   PHPSESSID   ark8hbi13e2c4qrp51aq51nj62
    
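To check programmatically that the session cookie arrived before issuing further requests, the cookie list can be split into fields. The sketch below assumes each entry is a Netscape-format cookie line with seven tab-separated fields (the name is the sixth), which is what the output above appears to show; cookie_names and has_cookie are hypothetical helper names, not part of RCurl.

```r
# Hypothetical helper: extract cookie names from Netscape-format cookie lines.
# Assumes 7 tab-separated fields per line:
# domain, subdomain flag, path, secure flag, expiry, name, value.
cookie_names <- function(lines) {
  fields <- strsplit(lines, "\t", fixed = TRUE)
  fields <- fields[lengths(fields) == 7]   # skip anything malformed
  vapply(fields, function(f) f[6], character(1))
}

# Hypothetical helper: TRUE if a cookie with the given name is present.
has_cookie <- function(lines, name) {
  name %in% cookie_names(lines)
}

# Intended use with the handle from the answer (needs RCurl and network access):
# has_cookie(getCurlInfo(curl)$cookielist, "PHPSESSID")
```

If this returns FALSE after the getForm call, the later getURL calls have no session to continue and will likely just repeat the interim page.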

    The getForm call sets the important cookie PHPSESSID. The first getURL results in:

    > library(XML)
    > htmlParse(test1)['//h3'][[1]]
    <h3>This page will <a href="poliqarp.php">refresh</a> automatically in a second</h3> 
    

    The page says it will refresh automatically, probably via JavaScript, so you need to perform the refresh yourself by issuing another call.
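Rather than hard-coding exactly two getURL calls, the manual refresh can be wrapped in a small retry loop. This is only a sketch: is_refresh_page and fetch_until_ready are hypothetical helpers, and the text pattern used to recognise the interim page is an assumption taken from the <h3> shown above, so it may need adjusting if the site's wording changes.

```r
# Hypothetical helper: TRUE while the body still looks like the interim page.
# The pattern is an assumption based on the <h3> text quoted in the answer.
is_refresh_page <- function(html) {
  grepl("refresh</a> automatically", html, fixed = TRUE)
}

# Hypothetical helper: call `fetch` repeatedly until it returns something
# other than the interim page, or give up after `max_tries` attempts.
fetch_until_ready <- function(fetch, max_tries = 5, wait = 1) {
  for (i in seq_len(max_tries)) {
    html <- fetch()
    if (!is_refresh_page(html)) {
      return(html)
    }
    Sys.sleep(wait)  # give the server a moment before refreshing again
  }
  stop("still on the refresh page after ", max_tries, " attempts")
}

# Intended use with the handle from the answer (needs RCurl and network access):
# results <- fetch_until_ready(function() {
#   getURL("http://korpus.pl/poliqarp/poliqarp.php", curl = curl)
# })
```

Passing the request as a function keeps the loop independent of RCurl, so the same handle (and therefore the same PHPSESSID cookie) is reused on every attempt.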