
Handling response code: 403 for URL with clojure enlive


I am trying to scrape the content of a web page using Enlive's html-resource function, but I get a 403 response because the request does not come from a browser. I gather this can be overridden in Java (I found an answer here), but I would like to see a Clojure way to handle it. Perhaps it can be done by passing parameters to html-resource, but I have not found an example of how to do that or what to pass. Any suggestions would be greatly appreciated.

Thanks.


Solution

  • Enlive's html-resource does not provide a way to override the default request properties. As in the other answer you found, you can open the connection yourself, set the User-Agent header on it, and pass the resulting InputStream to html-resource.

    Something like the following would handle it:

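    ;; assumes html-resource is referred from net.cgrand.enlive-html
    (with-open [inputstream (-> (java.net.URL. "http://www.example.com/")
                                .openConnection
                                ;; present a browser-like User-Agent
                                (doto (.setRequestProperty "User-Agent"
                                                           "Mozilla/5.0 ..."))
                                ;; always an InputStream, unlike .getContent,
                                ;; whose return type depends on the content type
                                .getInputStream)]
      (html-resource inputstream))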
    

    It might read better split out into its own function, though.
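
    One way to split this out is sketched below. The names open-connection and fetch-resource are hypothetical helpers made up for this example, not part of Enlive, and the sketch assumes Enlive is on the classpath under its usual namespace, net.cgrand.enlive-html.

    ```clojure
    (require '[net.cgrand.enlive-html :as html])
    (import '(java.net URL URLConnection))

    ;; Hypothetical helper (not part of Enlive): builds a URLConnection
    ;; with the given User-Agent header. Nothing is sent until the
    ;; stream is requested.
    (defn open-connection
      ^URLConnection [url user-agent]
      (doto (.openConnection (URL. url))
        (.setRequestProperty "User-Agent" user-agent)))

    ;; Hypothetical helper (not part of Enlive): fetches url with the
    ;; given User-Agent and parses the body with html-resource.
    ;; with-open closes the stream once parsing is done.
    (defn fetch-resource
      [url user-agent]
      (with-open [in (.getInputStream (open-connection url user-agent))]
        (html/html-resource in)))
    ```

    Usage would then be something like (fetch-resource "http://www.example.com/" "Mozilla/5.0 ..."), with the result passed to Enlive's select as usual. Keeping the connection setup in its own function leaves a single place to add further request headers if the server turns out to check more than the User-Agent.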