Post request using cookies with cURL, RCurl and httr


With cURL on Windows I can post a web request similar to this:

curl  --dump-header cook.txt ^
  --data "RURL=http://www.example.com/r&user=bob&password=hello" ^
  --user-agent  "Mozilla/5.0"  ^
  http://www.example.com/login

Running type cook.txt then shows response headers similar to this:

HTTP/1.1 302 Found                                                 
Date: Thu, ******
Server: Microsoft-IIS/6.0                                          
SERVER: ******                                                  
X-Powered-By: ASP.NET                                              
X-AspNet-Version: 1.1.4322                                         
Location: ******
Set-Cookie: Cookie1=; domain=******; expires=****** ******
******
******
Cache-Control: private                                             
Content-Type: text/html; charset=iso-8859-1                        
Content-Length: 189

From this I can read the cookie lines manually, for example Set-Cookie: AuthCode=ABC... (this could of course be scripted), and then use AuthCode for subsequent requests.
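
The scripting step mentioned above could look roughly like this in base R. This is only a sketch: the function name parse_set_cookies and the sample header lines are made up for illustration (the real input would come from readLines("cook.txt")), and it keeps only the first name=value pair of each Set-Cookie line, ignoring attributes such as domain or expires.

```r
# Extract cookies from a header dump such as the one written by
# curl --dump-header cook.txt. Returns a named character vector.
# parse_set_cookies is a hypothetical helper, not part of RCurl or httr.
parse_set_cookies <- function(header_lines) {
  # Keep only the Set-Cookie lines (header names are case-insensitive)
  hits <- grep("^set-cookie:", header_lines, ignore.case = TRUE, value = TRUE)
  # Drop the header name, then keep the first "name=value" pair before any ";"
  pairs <- sub("^[^:]*:[[:space:]]*", "", hits)
  pairs <- trimws(sub(";.*$", "", pairs))
  # Split each pair on the first "=" only, since values may contain "="
  eq <- regexpr("=", pairs, fixed = TRUE)
  values <- substring(pairs, eq + 1)
  names(values) <- substring(pairs, 1, eq - 1)
  values
}

# Made-up sample standing in for readLines("cook.txt")
sample_headers <- c(
  "HTTP/1.1 302 Found",
  "Set-Cookie: Cookie1=; domain=example.com; path=/",
  "Set-Cookie: AuthCode=ABC123; path=/",
  "Content-Length: 189"
)
cookies <- parse_set_cookies(sample_headers)
cookies[["AuthCode"]]
```

The resulting value can then be passed back on later requests, for instance through a Cookie header.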

I am trying to do the same in R with RCurl and/or httr (I still don't know which one suits my task better).

When I try:

library(httr)

POST("http://www.example.com/login",
     body= list(RURL="http://www.example.com/r",
                user="bob", password="hello"),
     user_agent("Mozilla/5.0"))  

I get a response similar to this:

Response [http://www.example.com/error]
  Status: 411
  Content-type: text/html
<h1>Length Required</h1> 

I understand the 411 error in general and could try to fix the request, but the cURL command does not trigger it, so I must be doing something wrong in the POST call.

Can you help me translate my cURL command to RCurl and/or httr?


Solution

  • Based on Juba's suggestion, here is a working RCurl template.

    The code emulates browser behaviour, as it:

    1. retrieves cookies on a login screen and
    2. reuses them on the following page requests containing the actual data.


    ### RCurl login and browse private pages ###
    
    library("RCurl")
    
    loginurl = "http://www.*****"
    mainurl  = "http://www.*****"
    agent    = "Mozilla/5.0"
    
    #User account data and other login parameters
    pars = list(
         RURL = "http://www.*****",
         Username = "*****",
         Password = "*****"
    )
    
    #RCurl pars     
    curl = getCurlHandle()
    curlSetOpt(cookiejar="cookiesk.txt",  useragent = agent, followlocation = TRUE, curl=curl)
    #or simply
    #curlSetOpt(cookiejar="", useragent = agent, followlocation = TRUE, curl=curl)
    
    #post login form
    web=postForm(loginurl, .params = pars, curl=curl)
    
    #go to main url with real data
    web=getURL(mainurl, curl=curl)
    
    #parse/print content of web
    #..... etc. etc.
    
    
    #Removing the handle and forcing garbage collection has the
    #side effect of writing the cookie data to the cookiejar file
    rm(curl)
    gc()
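
For the httr half of the question, a rough counterpart of the template above might look like the sketch below. It rests on two documented httr behaviours: a list passed as body is sent as multipart by default, while encode = "form" sends it URL-encoded the way curl --data does, and cookies are kept automatically across requests to the same host. Whether this removes the 411 response from the server in the question is an assumption and has not been checked. The function name login_and_fetch and its arguments are invented for illustration.

```r
# Sketch of the login-then-browse flow with httr.
# Nothing is sent until the function is called; login_and_fetch is a
# hypothetical wrapper, and all URLs and field names are supplied by the caller.
login_and_fetch <- function(login_url, main_url, form_fields,
                            agent = "Mozilla/5.0") {
  # encode = "form" sends application/x-www-form-urlencoded, like curl --data
  login <- httr::POST(login_url,
                      body = form_fields,
                      encode = "form",
                      httr::user_agent(agent))
  httr::stop_for_status(login)

  # httr reuses one handle per host, so cookies set by the login
  # response are sent along with this request automatically
  page <- httr::GET(main_url, httr::user_agent(agent))
  httr::stop_for_status(page)

  list(cookies = httr::cookies(login),            # cookies seen at login
       content = httr::content(page, as = "text")) # body of the data page
}
```

It would be called with the same values as the RCurl template, e.g. login_and_fetch(loginurl, mainurl, pars), provided the login page and the data page live on the same host.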