How to set the right RCurl options to download from NSE website

I am trying to download files from the NSE India website (nseindia.com). The problem is that the webmaster apparently does not like scraping programs downloading files or reading pages from the site; there seems to be a restriction based on the user agent.

The file I am trying to download is http://www.nseindia.com/archives/equities/bhavcopy/pr/PR280815.zip

I am able to download this from the Linux shell using:

curl -v -A "Mozilla" http://www.nseindia.com/archives/equities/bhavcopy/pr/PR280815.zip

The output is this

  * About to connect() to www.nseindia.com port 80 (#0)
  *   Trying 115.112.4.12...
    % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                   Dload  Upload   Total   Spent    Left  Speed
    0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0connected
  > GET /archives/equities/bhavcopy/pr/PR280815.zip HTTP/1.1
  > User-Agent: Mozilla
  > Host: www.nseindia.com
  > Accept: */*
  < HTTP/1.1 200 OK
  < Server: Oracle-iPlanet-Web-Server/7.0
  < Content-Length: 374691
  < X-frame-options: SAMEORIGIN
  < Last-Modified: Fri, 28 Aug 2015 12:20:02 GMT
  < ETag: "5b7a3-55e051f2"
  < Accept-Ranges: bytes
  < Content-Type: application/zip
  < Date: Sat, 29 Aug 2015 17:56:05 GMT
  < Connection: keep-alive
  <
  { [data not shown]
  PK
    5  365k    5 19977    0     0  34013      0  0:00:11 --:--:--  0:00:11 56592

This allows me to download the file.

The code I am using with RCurl is this:

  library("RCurl")

  jurl <- "http://www.nseindia.com/archives/equities/bhavcopy/pr/PR280815.zip"
  juseragent <- "Mozilla"
  myOpts = curlOptions(verbose = TRUE, header = TRUE, useragent = juseragent)
  jfile <- getURL(jurl,.opts=myOpts)

This does not work.

I have also unsuccessfully tried using download.file from the base library with the user agent changed.

Any help will be appreciated.

Solution

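First, your problem is not the user agent, which you are already setting correctly, but the fact that you are downloading binary data. getURL() is intended for text content; getURLContent() with binary=TRUE returns the body as a raw vector. This works: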

jfile <- getURLContent(jurl, .opts=myOpts, binary=TRUE)
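
To keep the result, the raw vector can be written to disk with writeBin(). The following is a minimal sketch under stated assumptions, not something verified against the live site: looks_like_zip() and save_zip() are hypothetical helper names introduced here, and the sanity check relies on ZIP archives starting with the bytes "PK" (the same marker that shows up in the curl output in the question).

```r
# Hypothetical helper: TRUE if `x` is a raw vector starting with the ZIP signature "PK".
looks_like_zip <- function(x) {
  is.raw(x) && length(x) >= 2 && identical(x[1:2], charToRaw("PK"))
}

# Hypothetical helper: download `url` as binary data and write it to `dest`.
# Assumes the RCurl package is installed; the default user agent mirrors the question.
save_zip <- function(url, dest, useragent = "Mozilla") {
  bin <- RCurl::getURLContent(url, useragent = useragent, binary = TRUE)
  if (!looks_like_zip(bin)) stop("response does not look like a ZIP archive")
  writeBin(bin, dest)   # write the bytes unchanged
  invisible(dest)
}
```

With the question's variables this would be called as save_zip(jurl, "PR280815.zip"), after which unzip("PR280815.zip", list = TRUE) lists the archive contents.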

Here is a (more) complete example using httr instead of RCurl.

library(httr)
url <- "http://www.nseindia.com/archives/equities/bhavcopy/pr/PR280815.zip"
response <- GET(url, user_agent("Mozilla"))
status_code(response)                                    # 200 OK
# [1] 200
tf <- tempfile()
writeBin(content(response, "raw"), tf)                   # write response content (the zip file) to a temporary file
files <- unzip(tf, exdir=tempdir())                      # unzips to system temp directory and returns a vector of file names
df.lst <- lapply(files[grepl("\\.csv$",files)],read.csv) # convert .csv files to list of data.frames
head(df.lst[[2]])
#      SYMBOL SERIES                  SECURITY HIGH.LOW INDEX.FLAG
# 1 AGRODUTCH     EQ AGRO DUTCH INDUSTRIES LTD        H         NA
# 2    ALLSEC     EQ   ALLSEC TECHNOLOGIES LTD        H         NA
# 3      ALPA     BE     ALPA LABORATORIES LTD        H         NA
# 4      AMTL     EQ     ADV METERING TECH LTD        H         NA
# 5  ANIKINDS     BE       ANIK INDUSTRIES LTD        H         NA
# 6   ARSHIYA     EQ           ARSHIYA LIMITED        H         NA
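
If you need this for more than one day, the same steps can be wrapped into functions. This is a sketch under assumptions rather than a definitive implementation: bhavcopy_url() and fetch_bhavcopy() are hypothetical names, and the PR<ddmmyy>.zip naming pattern is inferred only from the single URL in the question (PR280815.zip, last modified 28 Aug 2015), so it may not hold for other dates.

```r
# Hypothetical helper: build the archive URL for a given date.
# ASSUMPTION: files are named PR<ddmmyy>.zip, inferred from PR280815.zip.
bhavcopy_url <- function(date) {
  sprintf("http://www.nseindia.com/archives/equities/bhavcopy/pr/PR%s.zip",
          format(as.Date(date), "%d%m%y"))
}

# Hypothetical helper: download the archive and read its .csv files.
# Assumes the httr package is installed.
fetch_bhavcopy <- function(date, useragent = "Mozilla") {
  tf <- tempfile(fileext = ".zip")
  response <- httr::GET(bhavcopy_url(date),
                        httr::user_agent(useragent),
                        httr::write_disk(tf, overwrite = TRUE))  # save the body directly to disk
  httr::stop_for_status(response)                                 # raise an R error on 4xx/5xx
  files <- unzip(tf, exdir = tempfile("bhavcopy"))                # extract into a fresh directory
  csvs <- files[grepl("\\.csv$", files)]
  setNames(lapply(csvs, read.csv), basename(csvs))                # named list of data.frames
}
```

A call such as df.lst <- fetch_bhavcopy("2015-08-28") would then return the data frames keyed by file name. Writing with write_disk() avoids holding the whole archive in memory before saving it.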