I am trying to scrape the content of a web page using Enlive's html-resource function, but I get a 403 response because the request does not come from a browser. I gather this can be overridden in Java (found answer here), but I would like to see a Clojure way to handle it. Perhaps this can be achieved by passing parameters to html-resource, but I have not found an example of how to do that or what needs to be passed. Any suggestions would be greatly appreciated.
Thanks.
Enlive's html-resource does not provide a way to override the default request properties. You can, like the other answer you found, open the connection yourself and pass the resulting InputStream to html-resource.
Something like the following would handle it:
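;; Open the connection by hand so the User-Agent header can be set,
;; then pass the response stream to html-resource.
(with-open [inputstream (-> (java.net.URL. "http://www.example.com/")
                            .openConnection
                            (doto (.setRequestProperty "User-Agent"
                                                       "Mozilla/5.0 ..."))
                            ;; .getInputStream always returns a stream;
                            ;; .getContent may not, depending on content type
                            .getInputStream)]
  (html-resource inputstream))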
It might read better split out into its own function, though.
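As a rough sketch of what that split could look like: the names open-connection and fetch-html and the example.scrape namespace below are made up for illustration and are not part of Enlive. It assumes Enlive is on the classpath and required as net.cgrand.enlive-html.

```clojure
(ns example.scrape
  (:require [net.cgrand.enlive-html :as html]))

(defn open-connection
  "Hypothetical helper: returns a URLConnection for url with the
  User-Agent request header set. No network traffic happens yet."
  [url user-agent]
  (doto (.openConnection (java.net.URL. url))
    (.setRequestProperty "User-Agent" user-agent)))

(defn fetch-html
  "Hypothetical helper: fetches url with the given User-Agent and
  parses the response with Enlive's html-resource."
  [url user-agent]
  (with-open [in (.getInputStream (open-connection url user-agent))]
    (html/html-resource in)))
```

You would then call it as (fetch-html "http://www.example.com/" "Mozilla/5.0 ..."), passing whatever User-Agent string the site accepts. Keeping the connection setup in its own function also leaves a single place to set further request properties later.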