
Fetching the page at a URL using LuaSocket and a proxy


So far, I have the following piece:

local socket = require "socket.http"
client,r,c,h = socket.request{url = "http://example.com/", proxy="<my proxy and port here>"}
for i,v in pairs( c ) do
  print( i, v )
end

which gives me an output like the following:

connection  close
content-type    text/html; charset=UTF-8
location    http://www.iana.org/domains/example/
vary    Accept-Encoding
date    Tue, 24 Apr 2012 21:43:19 GMT
last-modified   Wed, 09 Feb 2011 17:13:15 GMT
transfer-encoding   chunked
server  Apache/2.2.3 (CentOS)

which means that the connection was established successfully. Now I want to fetch the titles of my URLs using socket.http. I searched previous SO questions and LuaSocket's HTTP documentation, but I still have no idea how to fetch the whole page, or part of it, store it in a variable and do something with it.

Please help.


Solution

  • You are using the 'generic' form of http.request(), which does not return the body; to capture it you have to supply an LTN12 sink. It's not as complicated as it sounds. Try this code:

    local http = require "socket.http"
    local ltn12 = require "ltn12" -- LTN12 library, provided by LuaSocket
    
    -- This table will store the body (possibly in multiple chunks):
    local result_table = {}
    -- The generic form returns 1 on success (nil plus an error message on
    -- failure), then the status code, the response headers and the status line.
    local ok, code, headers, status = http.request{
        url = "http://example.com/",
        sink = ltn12.sink.table(result_table),
        proxy = "<my proxy and port here>"
    }
    -- Join the chunks together into a string:
    local result = table.concat(result_table)
    -- Hacky solution to extract the title:
    local title = result:match("<[Tt][Ii][Tt][Ll][Ee]>([^<]*)<")
    print(title)
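
In case the word "sink" is what makes this look complicated: a sink is just a function that gets called once per chunk of data, and then once more with nil when the data ends. The snippet below is a hand-written stand-in for the table sink used above, meant only to illustrate the idea. It is not LuaSocket's actual source, make_table_sink is a made-up name, and the chunks are fed in by hand instead of coming from the network.

```lua
-- Hypothetical helper (not part of LuaSocket): builds a sink that appends
-- every chunk it receives to the table `t`.
local function make_table_sink(t)
  return function(chunk, err)
    -- A nil chunk signals the end of the data, so there is nothing to store.
    if chunk then
      t[#t + 1] = chunk
    end
    -- Returning a true value tells the producer to keep sending data.
    return 1
  end
end

-- Simulate a body arriving in three pieces, followed by the end-of-data call.
local parts = {}
local sink = make_table_sink(parts)
sink("<html><head><ti")
sink("tle>Example Domain</ti")
sink("tle></head></html>")
sink(nil)

-- Joining the pieces gives back the complete body as one string.
local body = table.concat(parts)
print(body)
```

This is also why the body has to be put together with table.concat() before matching: a tag such as <title> can be split across two chunks.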
    

    If your proxy is constant throughout your application then a more straightforward solution would be to use the simple form of http.request(), and specify the proxy via http.PROXY:

    local http = require "socket.http"
    http.PROXY="<my proxy and port here>"
    
    local result = http.request("http://www.youtube.com/watch?v=_eT40eV7OiI")
    local title = result:match("<[Tt][Ii][Tt][Ll][Ee]>([^<]*)<");
    print(title);
    

    Output:

        Flanders and Swann - A song of the weather
      - YouTube
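
As the output shows, the raw title can contain line breaks and indentation. Also, as far as I know, http.request() returns nil followed by an error message when the request fails, in which case result:match() would raise an error. A minimal sketch of a helper that handles both cases is below. extract_title is a made-up name, and the sample string is invented to resemble the output above rather than fetched from the network.

```lua
-- Hypothetical helper: returns the page title with whitespace tidied up,
-- or nil when there is no body or no <title> element.
local function extract_title(html)
  if type(html) ~= "string" then
    return nil -- e.g. the request failed and no body was returned
  end
  -- Same hacky pattern as above: case-insensitive <title>, text up to next "<".
  local raw = html:match("<[Tt][Ii][Tt][Ll][Ee]>([^<]*)<")
  if not raw then
    return nil
  end
  -- Collapse runs of whitespace (including newlines), then trim both ends.
  local title = raw:gsub("%s+", " "):gsub("^ ", ""):gsub(" $", "")
  return title
end

-- Invented sample input, shaped like the page that produced the output above.
local sample = "<html><head><title>\n    Flanders and Swann - A song of the weather\n"
  .. "  - YouTube</title></head></html>"
print(extract_title(sample)) --> Flanders and Swann - A song of the weather - YouTube
```

With the simple form you would call it as extract_title(result) and check for nil before using the value.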