
Error using R and httr to get the content of the page: http://www2.bmf.com.br/pages/portal/bmfbovespa/lumis/lum-tipo-de-participante-ptBR.asp


I want to get the content of the page http://www2.bmf.com.br/pages/portal/bmfbovespa/lumis/lum-tipo-de-participante-ptBR.asp

When I paste this URL into my browser's address bar, the full page loads.

However, I cannot retrieve it with R and the httr package, using either POST (sending the "dData1" parameter) or GET.

POST method passing the parameter "dData1"

library(httr)

url <- "http://www2.bmf.com.br/pages/portal/bmfbovespa/lumis/lum-tipo-de-participante-ptBR.asp"
body <- list(dData1 = "16/05/2018")
POST(url, body = body, encode = "form", verbose())

The result is:

-> POST /pages/portal/bmfbovespa/lumis/lum-tipo-de-participante-ptBR.asp HTTP/1.1
-> Host: www2.bmf.com.br
 (...omitted...)
-> 
>> dData1=16%2F05%2F2018

<- HTTP/1.1 200 OK
(...omitted...)
<- 
Response [http://www2.bmf.com.br/pages/portal/bmfbovespa/lumis/lum-tipo-de-participante-ptBR.asp]
  Date: 2018-06-02 16:28
  Status: 200
  Content-Type: text/html
  Size: 111 kB
NA

Even with a simple GET, I cannot get the content of the page:

library(httr)

url <- "http://www2.bmf.com.br/pages/portal/bmfbovespa/lumis/lum-tipo-de-participante-ptBR.asp"

GET(url, verbose())

And the result is:

-> GET /pages/portal/bmfbovespa/lumis/lum-tipo-de-participante-ptBR.asp HTTP/1.1
(...omitted...)
-> 
<- HTTP/1.1 200 OK
(...omitted...)
<- 
Response [http://www2.bmf.com.br/pages/portal/bmfbovespa/lumis/lum-tipo-de-participante-ptBR.asp]
  Date: 2018-06-02 16:33
  Status: 200
  Content-Type: text/html
  Size: 140 kB
NA

I have already inspected the request headers using the browser's developer tools, but I could not figure out what I am doing wrong, and I still cannot get the content of this page. Any hint will be appreciated.


Solution

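  • That website is not UTF-8 encoded, and its Content-Type header does not declare a charset, so httr falls back to UTF-8 and cannot decode the body. Find the correct encoding (here ISO-8859-1) and pass it to content() when parsing: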

    library(httr)

    my_url <- "http://www2.bmf.com.br/pages/portal/bmfbovespa/lumis/lum-tipo-de-participante-ptBR.asp"
    response <- GET(my_url)
    # Print the response summary
    response
    # Decode the body as ISO-8859-1 and parse it as HTML
    content(response, as = "parsed", encoding = "iso-8859-1")

    Result:

    > content(response,as = "parsed",encoding = "iso-8859-1")
    {xml_document}
    <html class="no-js" lang="pt-br">
    [1] <head>\n<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">\n<meta name="viewport" content="width=device-width, initial-scale=1.0">\n<link rel=" ...
    [2] <body>\n<!-- Google Tag Manager -->\r\n<noscript><iframe src="//www.googletagmanager.com/ns.html?id=GTM-KPF8G3" height="0" width="0" style="display:none;visibil ...
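
The encoding problem can be reproduced without network access. The sketch below is not part of the original answer. It uses only base R to show that bytes which are valid ISO-8859-1 are not valid UTF-8, which is probably why the printed response in the question ends in NA. It then defines a hypothetical helper, fetch_participants() (the name is an assumption), that passes the same encoding argument for the POST request from the question. It assumes the httr and xml2 packages are installed and that the POST response uses the same encoding as the GET response, which the answer does not confirm.

```r
# Offline illustration: the word "ação" as ISO-8859-1 bytes (one byte per character).
latin1_bytes <- as.raw(c(0x61, 0xe7, 0xe3, 0x6f))
latin1_text  <- rawToChar(latin1_bytes)

# These bytes do not form a valid UTF-8 sequence, so decoding them as UTF-8 fails.
is_valid_before <- validUTF8(latin1_text)

# Declaring the real source encoding converts the bytes correctly.
utf8_text      <- iconv(latin1_text, from = "ISO-8859-1", to = "UTF-8")
is_valid_after <- validUTF8(utf8_text)

# Hypothetical helper (not from the original answer): the same fix applied
# to the POST request from the question.
fetch_participants <- function(url, date) {
  response <- httr::POST(url, body = list(dData1 = date), encode = "form")
  httr::stop_for_status(response)
  # Decode the body as ISO-8859-1 text, then parse the resulting HTML.
  page_text <- httr::content(response, as = "text", encoding = "ISO-8859-1")
  xml2::read_html(page_text)
}
```

Calling fetch_participants(my_url, "16/05/2018") should then return a parsed document, provided the server still accepts the request. When a page's encoding is unknown, stringi::stri_enc_detect(content(response, "raw")) can suggest candidates.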