I'm struggling with getForm and the problem of redirecting my query. I've tried experimenting with cookiefile and followlocation, as suggested in other topics on Stack Overflow, but with no result.
My code:
library(RCurl)

getForm("http://korpus.pl/poliqarp/poliqarp.php",
        query = "pies", corpus = "2", showMatch = "1", showContext = "3",
        leftContext = "5", rightContext = "5", wideContext = "50", hitsPerPage = "10",
        .opts = curlOptions(
          verbose = TRUE,
          followlocation = TRUE
        )
)
Am I right that I'm getting the content of the redirection page? If so, how can I bypass it?
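library(RCurl)

# Reusable handle; cookiefile = "" switches on cookie handling for this handle
curl <- getCurlHandle(cookiefile = "", verbose = TRUE, followlocation = TRUE)

# Submit the query through the shared handle so the session cookie is kept
getForm("http://korpus.pl/poliqarp/poliqarp.php",
        query = "pies", corpus = "2", showMatch = "1", showContext = "3",
        leftContext = "5", rightContext = "5", wideContext = "50", hitsPerPage = "10",
        .opts = curlOptions(
          verbose = TRUE,
          followlocation = TRUE
        ),
        curl = curl)

# The first request returns the "will refresh" notice; the second repeats the refresh by hand
test1 <- getURL("http://korpus.pl/poliqarp/poliqarp.php", curl = curl)
test2 <- getURL("http://korpus.pl/poliqarp/poliqarp.php", curl = curl)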
With a bit of persuasion, test2 should hopefully contain the results.
curl is a handle that persists across calls. Setting cookiefile (here to an empty string) tells RCurl to store the cookies.
You can access the info in the curl handle using getCurlInfo(curl). For example:
> cat(getCurlInfo(curl)$cookielist)
korpus.pl FALSE / FALSE 0 PHPSESSID ark8hbi13e2c4qrp51aq51nj62
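If you want to pull a single field out of that line, here is a minimal sketch. It assumes each cookielist entry is a Netscape-format cookie line with seven tab-separated fields; parse_cookie_line and the field names are hypothetical, not part of RCurl, and the example line is the one shown above with tabs restored.

```r
# Hypothetical helper (not part of RCurl): split one cookie line into named fields.
# Assumes the seven tab-separated fields of the Netscape cookie file format.
parse_cookie_line <- function(line) {
  fields <- strsplit(line, "\t", fixed = TRUE)[[1]]
  stopifnot(length(fields) == 7)
  names(fields) <- c("domain", "include_subdomains", "path",
                     "secure", "expires", "name", "value")
  as.list(fields)
}

# Example line copied from the output above, with tabs between the fields
example_line <- "korpus.pl\tFALSE\t/\tFALSE\t0\tPHPSESSID\tark8hbi13e2c4qrp51aq51nj62"
cookie <- parse_cookie_line(example_line)
cookie$name
cookie$value
```

With a live handle you would pass getCurlInfo(curl)$cookielist[1] instead of example_line.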
The getForm call sets the important cookie PHPSESSID. The first getURL results in:
> library(XML)
> htmlParse(test1)['//h3'][[1]]
<h3>This page will <a href="poliqarp.php">refresh</a> automatically in a second</h3>
It tells you it will refresh automatically, probably with JavaScript, so you need to do this refresh manually by issuing another call.
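Rather than hard-coding two getURL calls, the manual refresh can be wrapped in a small retry loop. This is only a sketch: fetch_until_ready is a hypothetical helper, and the marker text is an assumption based on the notice shown above, so adjust it to whatever the raw page actually contains.

```r
# Hypothetical helper: call fetch() until the page no longer contains the refresh notice.
# `marker` is an assumed substring of the interim page, taken from the <h3> above.
fetch_until_ready <- function(fetch, marker = "automatically in a second",
                              max_tries = 5, wait = 1) {
  for (i in seq_len(max_tries)) {
    page <- fetch()
    if (!grepl(marker, page, fixed = TRUE)) {
      return(page)   # no refresh notice, so treat this as the results page
    }
    Sys.sleep(wait)  # give the server a moment before asking again
  }
  stop("page still shows the refresh notice after ", max_tries, " tries")
}

# Usage with the handle from above (needs network access):
# results <- fetch_until_ready(function()
#   getURL("http://korpus.pl/poliqarp/poliqarp.php", curl = curl))
```

Passing the request as a function keeps the loop independent of RCurl, so the same handle and cookies are reused on every attempt.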