Search code examples
perlhttp-headerscgilwplwp-useragent

Better way to proxy an HTTP request using Perl HTTP::Response and LWP?


I need a Perl CGI script that fetches a URL and then returns the result of the fetch - the status, headers and content - unaltered to the CGI environment so that the "proxied" URL is returned by the web server to the user's browser as if they'd accessed the URL directly.

I'm running my script from cgi-bin in an Apache web server on an Ubuntu 14.04 host, but this question should be independent of server platform - anything that can run Perl CGI scripts should be able to do it.

I've tried using LWP::UserAgent::request() and I've got very close. It returns an HTTP::Response object that contains the status code, headers and content, and even has an "as_string" method that turns it into a human-readable form. The problem from a CGI perspective is that "as string" converts the status code to "HTTP/1.1 200 OK" rather than "Status: 200 OK", so the Apache server doesn't recognise the output as a valid CGI response.

I can fix this by using other methods in HTTP::Response to split out the various parts, but there seems to be no public way of getting at the encapsulated HTTP::Headers object in order to call its as_string method; instead I have to hack into the Perl blessed object hash and yank out the private "_headers" member directly. To me this seems slightly evil, so is there a better way?

Here's some code to illustrate the above. If you put it in your cgi-bin directory then you can call it as

http://localhost/cgi-bin/lwp-test?url=http://localhost/&http-response=1&show=1

You can use a different URL for testing if you want. If you set http-response=0 (or drop the param altogether) then you get the working piece-by-piece solution. If you set show=0 (or drop it) then the proxied request is returned by the script. Apache will return the proxied page if you have http-response=0 and will choke with a 500 Internal Server Error if it's 1.

#!/usr/bin/perl

use strict;
use warnings;

use CGI::Simple;
use HTTP::Request;
use HTTP::Response;
use LWP::UserAgent;

my $q = CGI::Simple->new();
my $ua = LWP::UserAgent->new();
my $req = HTTP::Request->new(GET => $q->param('url'));
my $res = $ua->request($req);

# print a text/plain header if called with "show=1" in the query string
# so proxied URL response is shown in browser, otherwise just output
# the proxied response as if it was ours.
if ($q->param('show')) {
    print $q->header("text/plain");
    print "\n";
}

if ($q->param('http-response')) {
    # This prints the status as "HTTP/1.1 200 OK", not "Status: 200 OK".
    print $res->as_string;
} else {
    # This works correctly as a proxy, but using {_headers} to get at
    # the private encapsulated HTTP:Response object seems a bit evil.
    # There must be a better way!
    print "Status: ", $res->status_line, "\n";
    print $res->{_headers}->as_string;
    print "\n";
    print $res->content;
}

Please bear in mind that this script was written purely to demonstrate how to forward an HTTP::Response object to the CGI environment and bears no resemblance to my actual application.


Solution

  • You can go around the internals of the response object at $res->{_headers} by using the $res->headers method, that returns the actual HTTP::Headers instance that is used. HTTP::Response inherits that from HTTP::Message.

    It would then look like this:

    print "Status: ", $res->status_line, "\n";
    print $res->headers->as_string;
    

    That looks less evil, though it's still not pretty.