I'm trying to use Perl to scrape a publications list as follows:
use strict;
use warnings;
use XML::XPath;
use XML::XPath::XMLParser;
use LWP::Simple;
my $url = "https://connects.catalyst.harvard.edu/Profiles/profile/xxxxxxx/xxxxxx.rdf";
my $content = get($url);
die "Couldn't get publications!" unless defined $content;
When I run it on my local (Windows 7) machine it works fine, but when I try to run it on the Linux server where we host some websites, it dies. I installed the XML and LWP modules via cpan, so those should be there. I'm wondering whether the problem could be some sort of security or permissions setting on the server (keeping it from accessing an external website), but I don't even know where to start with that. Any ideas?
It turns out I didn't have LWP::Protocol::https installed. I found this out by switching from
LWP::Simple
to
LWP::UserAgent
and adding the following:
my $ua = LWP::UserAgent->new;
my $resp = $ua->get('https://connects.catalyst.harvard.edu/Profiles/profile/xxxxxx/xxxxxxx.rdf');
print $resp->status_line;  # status_line shows why the request failed; printing $resp itself only dumps the object reference
The status line then reported that LWP couldn't handle the https scheme without LWP::Protocol::https, so I installed it with
cpan LWP::Protocol::https
and all was good.
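For completeness, here is a minimal sketch of the whole fetch with proper error reporting, using the same placeholder URL as the question (substitute a real profile ID). Unlike LWP::Simple's get, which just returns undef on failure, LWP::UserAgent surfaces the reason, which is how the missing-protocol problem shows itself:

```perl
use strict;
use warnings;
use LWP::UserAgent;
use XML::XPath;

# Placeholder URL copied from the question; not a real profile.
my $url = 'https://connects.catalyst.harvard.edu/Profiles/profile/xxxxxxx/xxxxxx.rdf';

my $ua   = LWP::UserAgent->new;
my $resp = $ua->get($url);

# status_line reports *why* the request failed, e.g. a missing
# https protocol handler, instead of dying with no explanation.
die 'Couldn\'t get publications: ', $resp->status_line, "\n"
    unless $resp->is_success;

# Hand the fetched RDF to XML::XPath for scraping, as in the question.
my $xp = XML::XPath->new(xml => $resp->decoded_content);
```

This also removes the need for the separate defined-ness check on the content, since the is_success test covers every failure mode in one place.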