I want to get the content of the page http://www2.bmf.com.br/pages/portal/bmfbovespa/lumis/lum-tipo-de-participante-ptBR.asp
When I paste this URL into my browser's address bar, I get the full content of the page.
However, I cannot retrieve it from R with the httr package, using either POST (sending the "dData1" parameter) or GET.
POST method passing the parameter "dData1":
library(httr)
url <- "http://www2.bmf.com.br/pages/portal/bmfbovespa/lumis/lum-tipo-de-participante-ptBR.asp"
body <- list(dData1 = "16/05/2018")
POST(url, body = body, encode = "form", verbose())
The result is:
-> POST /pages/portal/bmfbovespa/lumis/lum-tipo-de-participante-ptBR.asp HTTP/1.1
-> Host: www2.bmf.com.br
(...omitted...)
->
>> dData1=16%2F05%2F2018
<- HTTP/1.1 200 OK
(...omitted...)
<-
Response [http://www2.bmf.com.br/pages/portal/bmfbovespa/lumis/lum-tipo-de-participante-ptBR.asp]
Date: 2018-06-02 16:28
Status: 200
Content-Type: text/html
Size: 111 kB
NA
Even with a simple GET, I cannot get the content of the page:
library(httr)
url <- "http://www2.bmf.com.br/pages/portal/bmfbovespa/lumis/lum-tipo-de-participante-ptBR.asp"
GET(url, verbose())
And the result is:
-> GET /pages/portal/bmfbovespa/lumis/lum-tipo-de-participante-ptBR.asp HTTP/1.1
(...omitted...)
->
<- HTTP/1.1 200 OK
(...omitted...)
<-
Response [http://www2.bmf.com.br/pages/portal/bmfbovespa/lumis/lum-tipo-de-participante-ptBR.asp]
Date: 2018-06-02 16:33
Status: 200
Content-Type: text/html
Size: 140 kB
NA
I have already inspected the request headers using the browser's developer tools, but I could not figure out what I am doing wrong, and I still can't get the content of this page. Any hint would be appreciated.
That website is not UTF-8 encoded, which is most likely why httr prints NA instead of the page text. You need to find the correct encoding (ISO-8859-1 here) and pass it when parsing the content:
library(httr)
my_url <- "http://www2.bmf.com.br/pages/portal/bmfbovespa/lumis/lum-tipo-de-participante-ptBR.asp"
response <- GET(my_url)
response
# Tell content() which encoding to use when decoding the body
content(response, as = "parsed", encoding = "iso-8859-1")
Result:
> content(response,as = "parsed",encoding = "iso-8859-1")
{xml_document}
<html class="no-js" lang="pt-br">
[1] <head>\n<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">\n<meta name="viewport" content="width=device-width, initial-scale=1.0">\n<link rel=" ...
[2] <body>\n<!-- Google Tag Manager -->\r\n<noscript><iframe src="//www.googletagmanager.com/ns.html?id=GTM-KPF8G3" height="0" width="0" style="display:none;visibil ...
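To see why the encoding matters, here is a minimal offline sketch in base R. It assumes a sample word ("participação") stored as ISO-8859-1 bytes, which is my own illustration and not taken from the page. The bytes are not valid UTF-8, but they convert cleanly once the source encoding is named. The helper `fetch_participants()` is a hypothetical name; it only shows how the same `encoding` argument could be applied to the POST request from the question, and it is defined here but not called.

```r
# Sample bytes for "participação" in ISO-8859-1 (assumed example data):
# 0xE7 is "ç" and 0xE3 is "ã" in that encoding.
latin1_bytes <- as.raw(c(0x70, 0x61, 0x72, 0x74, 0x69, 0x63,
                         0x69, 0x70, 0x61, 0xE7, 0xE3, 0x6F))
raw_text <- rawToChar(latin1_bytes)

# The byte sequence is not valid UTF-8, so decoding it as UTF-8 cannot work.
validUTF8(raw_text)  # FALSE

# Naming the real source encoding makes the conversion succeed.
decoded <- iconv(raw_text, from = "ISO-8859-1", to = "UTF-8")
nchar(decoded)       # 12

# Hypothetical helper: the same fix applied to the POST request from the
# question. Calls are namespaced so defining it does not require library(httr).
fetch_participants <- function(date,
                               url = "http://www2.bmf.com.br/pages/portal/bmfbovespa/lumis/lum-tipo-de-participante-ptBR.asp") {
  resp <- httr::POST(url, body = list(dData1 = date), encode = "form")
  httr::stop_for_status(resp)  # fail early on HTTP errors
  httr::content(resp, as = "parsed", encoding = "ISO-8859-1")
}
```

Calling `fetch_participants("16/05/2018")` should then return a parsed document like the one above, assuming the server still accepts the "dData1" form field.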