
How do I read a web page in Racket?


All of the information I can find online is about writing web servers, but there seems to be very little about functions useful for web clients. Ideally, I would like the function to look something like this:

(website "http://www.google.com")

And return a string containing the entire web page, but I would be happy with anything that works.


Solution

  • Here's a simple program that looks like it does what you want:

    #lang racket
    
    (require net/url)
    
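    ;; Fetch the page and read the whole response body.
    ;; port->bytes returns a byte string; use port->string to get a string instead.
    (port->bytes
     (get-pure-port (string->url "http://www.google.com")))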
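
    To get the exact shape asked for in the question, the same calls can be wrapped in a function. This is only a minimal sketch: the name website comes from the question, the redirect limit of 5 is an arbitrary assumption, and port->string decodes the response as UTF-8, which may not match every page's encoding.

    ```racket
    #lang racket

    (require net/url)

    ;; website : string -> string
    ;; Hypothetical helper named after the function sketched in the question.
    ;; call/input-url opens the connection, hands the input port to
    ;; port->string, and closes the port afterwards.
    (define (website url-string)
      (call/input-url (string->url url-string)
                      ;; Follow up to 5 redirects (an assumed limit).
                      (λ (u) (get-pure-port u #:redirections 5))
                      port->string))
    ```

    With that definition, (website "http://www.google.com") should return the page body as a single string.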
    

    If you're like me, you probably also want to parse it into an s-expression. Neil Van Dyke's neil/html-parsing does this:

    #lang racket
    
    (require (planet neil/html-parsing:2:0)
             net/url)
    
    (html->xexp
     (get-pure-port (string->url "http://www.google.com")))
    

    Note that because this program refers to a PLaneT package, running it for the first time will download and install the html-parsing package. Building the documentation could take quite a while. That's a one-time cost, though, and running the program again shouldn't take more than a few seconds.

    EDIT: As of 2023 this code still works fine, but PLaneT is no longer widely used. It would now be more idiomatic to install the html-parsing package with raco pkg install html-parsing, or through the File > Package Manager... menu, and then run:

    #lang racket
    
    (require html-parsing
             net/url)
    
    (html->xexp
     (get-pure-port (string->url "http://www.google.com")))
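
    Once the page is an s-expression, ordinary list processing is enough to pull data out of it. The sketch below collects the href of every link. It runs on a hand-written sample rather than a live page, and it assumes the SXML-style shape html->xexp is understood to produce: a *TOP* wrapper, with attributes in a leading (@ ...) list. The name collect-hrefs is hypothetical.

    ```racket
    #lang racket

    ;; Hand-written sample in the shape html->xexp is assumed to return.
    (define sample
      '(*TOP*
        (html (head (title "Example"))
              (body (p "See "
                       (a (@ (href "http://racket-lang.org")) "Racket")
                       " and "
                       (a (@ (href "http://docs.racket-lang.org")) "the docs"))))))

    ;; collect-hrefs : xexp -> (listof string)
    ;; Walks the tree and returns the href values of all <a> elements,
    ;; in document order.
    (define (collect-hrefs x)
      (match x
        ;; An <a> element that carries an attribute list.
        [(list 'a (list '@ attrs ...) children ...)
         (append (for/list ([attr (in-list attrs)]
                            #:when (match attr
                                     [(list 'href (? string?)) #t]
                                     [_ #f]))
                   (second attr))
                 (append-map collect-hrefs children))]
        ;; Any other element: recurse into everything after the tag name.
        [(list (? symbol?) children ...)
         (append-map collect-hrefs children)]
        ;; Text and anything else contributes no links.
        [_ '()]))

    (collect-hrefs sample)
    ```

    For real pages, passing the result of html->xexp in place of sample should work the same way, as long as the output has the shape assumed above.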