perl web-scraping

What's wrong with my IF statement and LWP::Simple?


I am trying to create a simple scraper using getstore(), but the script won't create the .txt file when getstore() is called inside an if statement. What am I doing wrong?

#!/usr/bin/perl -w
use strict;
use LWP::Simple;

my $url;
my $content;

print "Enter URL:";

chomp($url = <STDIN>);

$content = get($url);

if ($content =~ s%<(style|script)[^<>]*>.*?</\1>|</?[a-z][a-z0-9]*[^<>]*>|<!--.*?-->%%g) {

    $content = getstore($content,"../crawled_text.txt");
}   

die "Couldn't get $url" unless defined $content;

Solution

  • From the LWP::Simple documentation:

    my $code = getstore($url, $file)

    Gets a document identified by a URL and stores it in the file. The return value is the HTTP response code.

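    Your first argument, $content, holds the stripped HTML text rather than a URL, so getstore() has nothing valid to fetch. Also, because getstore() returns the HTTP response code, assigning its result to $content overwrites the text you just cleaned. Use a debugger or print statements to inspect the contents of your variables and to check whether your program enters the if block at all.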
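
    One possible way to restructure the script, as a minimal sketch rather than a definitive fix: fetch the page once with get(), check that the fetch succeeded before touching the content, strip the markup, and then write the resulting text with an ordinary open/print instead of getstore(). The helper names strip_markup and save_text are hypothetical, the URL is taken from the command line instead of STDIN for brevity, and the /i and /s flags on the regex are additions that were not in the question (so tag names match case-insensitively and "." can span newlines). The :encoding(UTF-8) layer assumes an LWP::Simple version whose get() returns decoded text.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple qw(get);

# Hypothetical helper: remove <style>/<script> blocks, other tags, and
# comments from an HTML string. The pattern is the one from the question;
# the /i and /s flags are assumptions added here.
sub strip_markup {
    my ($html) = @_;
    $html =~ s%<(style|script)[^<>]*>.*?</\1>|</?[a-z][a-z0-9]*[^<>]*>|<!--.*?-->%%gis;
    return $html;
}

# Hypothetical helper: write text that is already in memory to a file.
# getstore() is not suitable for this, because it expects a URL to download.
sub save_text {
    my ($text, $file) = @_;
    open(my $fh, '>:encoding(UTF-8)', $file) or die "Cannot open $file: $!";
    print {$fh} $text;
    close($fh) or die "Cannot close $file: $!";
    return;
}

# Only fetch when a URL is given as the first command-line argument.
if (@ARGV) {
    my $url     = $ARGV[0];
    my $content = get($url);

    # get() returns undef on failure, so check before using the content.
    die "Couldn't get $url" unless defined $content;

    save_text(strip_markup($content), '../crawled_text.txt');
}
```

    Saved under a name such as scraper.pl (a placeholder), it could be run as perl scraper.pl followed by the URL to fetch. If you only need the raw page on disk, getstore($url, $file) with the real URL as the first argument would do that in one call, and its return value can be tested with is_success() from LWP::Simple.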