Redirect GET to HTTPS, if necessary


When I try to download a URL over the HTTP protocol, I get a 400 error:

library(httr)
x1 <- "http://www.sonnenwende-harsewinkel.de/öko-gas/bürgerwerke/"
resp <- httr::GET(x1, httr::timeout(60))
resp[["status_code"]]
#400

The problem is solved when I switch to the HTTPS protocol:

x2 <- "https://www.sonnenwende-harsewinkel.de/öko-gas/bürgerwerke/"
resp <- httr::GET(x2, httr::timeout(60))
resp[["status_code"]]
#200

When I enter the HTTP address in my web browser, I get redirected to the HTTPS address. Is it possible to get redirected using httr, too?


Solution

  • Why not just switch the URL to https:// whenever the request returns an HTTP 400?

    rGET <- function(url, ...)
    {
      res <- httr::GET(url, ...)
      # On a 400, retry once with only the leading scheme switched to HTTPS
      if (res$status_code == 400)
        return(httr::GET(sub("^http://", "https://", url), ...))
      res
    }
    

    You can then call it like this:

    rGET("http://www.sonnenwende-harsewinkel.de/öko-gas/bürgerwerke/")
    #> Response [https://www.sonnenwende-harsewinkel.de/öko-gas/bürgerwerke/]
    #>   Date: 2020-04-30 20:59
    #>   Status: 200
    #>   Content-Type: text/html; charset=UTF-8
    #>   Size: 51.7 kB
    #> <!DOCTYPE html>
    #> <html lang="de-DE"><head>
    #>     <meta charset="utf-8"/>
    #> <link rel="dns-prefetch preconnect" href="https://u.jimcdn.com/" crossorigin="a...
    #> <link rel="dns-prefetch preconnect" href="https://assets.jimstatic.com/" crosso...
    #> <link rel="dns-prefetch preconnect" href="https://image.jimcdn.com" crossorigin...
    #> <link rel="dns-prefetch preconnect" href="https://fonts.jimstatic.com" crossori...
    #> <link rel="dns-prefetch preconnect" href="https://www.google-analytics.com" cro...
    #> <meta name="viewport" content="width=device-width, initial-scale=1"/>
    #> <meta http-equiv="X-UA-Compatible" content="IE=edge"/>
    #> ...
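
A slightly more general variant of the same idea might look like the sketch below. It is not part of the original answer: the names `upgrade_scheme` and `GET_with_https_fallback` are hypothetical, and it assumes you want to retry on any error status (400 or above) rather than on 400 only. It uses `httr::http_error()` for that check and rewrites only a leading `http://`, so an `http://` that appears later in the URL (for example in a query string) is left alone.

```r
# Requires the httr package to be installed; calls are namespaced,
# so library(httr) is not needed.

# Hypothetical helper: rewrite only a leading "http://" to "https://".
upgrade_scheme <- function(url) {
  sub("^http://", "https://", url, ignore.case = TRUE)
}

# Hypothetical wrapper: try the URL as given, then retry once over HTTPS
# if the response is an error (status >= 400) and the URL was plain HTTP.
GET_with_https_fallback <- function(url, ...) {
  res <- httr::GET(url, ...)
  https_url <- upgrade_scheme(url)
  if (httr::http_error(res) && !identical(https_url, url)) {
    res <- httr::GET(https_url, ...)
  }
  res
}
```

It would be called the same way as `rGET`, for example `GET_with_https_fallback(x1, httr::timeout(60))`. Whether retrying on every error status is appropriate depends on the site, so treat the broader condition as an assumption rather than a recommendation.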