HTML::TableExtract not finding table


I'm having trouble with some code I've written. It's a proof of concept for myself: I want to run words through it to get their other forms (fun Icelandic conjugation). The code needs an if statement for the case where the URL for a word leads to more than one result. In that case I find the relevant link, fetch its content and use HTML::TableExtract to get the table I need. Except I don't get anything useful.

#!perl



use warnings;
use HTML::TableExtract qw(tree);
use LWP::Simple;




sub saekja{
    $table = $te->first_table_found;
    $table_tree = $table->tree;
    $table_html = $table_tree->as_HTML;
};


sub leidretta{
# If the search returns more than one result
    if ($content =~ /orð fundust./){

    $content =~ m/<li><strong><a href="(.*)">/;

# the beginning of the string for the URL
    $upphaf = "http://bin.arnastofnun.is/";
# concatenates the strings to build the URL
    $urlid = $upphaf . $1;
    $content = get($urlid);
    $te  = new HTML::TableExtract( depth=>0, count=>0);



}
};
$content = get("http://bin.arnastofnun.is/leit.php?q=Fiskisl%C3%B3%C3%B0");

&leidretta;
&saekja;

I will admit that I am relatively new at this (I wrote my first Perl script almost exactly a week ago). But I am completely stumped, and copious amounts of googling haven't turned up anything useful.


Solution

  • This should get you a bit further:

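    #!perl
    
    use utf8;       # the source contains non-ASCII literals (the ð in the regex)
    use strict;
    use warnings;
    use HTML::TableExtract qw(tree);
    use LWP::Simple;
    
    my $content = get("http://bin.arnastofnun.is/leit.php?q=Fiskisl%C3%B3%C3%B0");
    die "Could not fetch the search page\n" unless defined $content;
    
    # The search returned more than one result
    if ($content =~ /orð fundust./) {
    
        # Take the first link in the result list
        $content =~ m/<li><strong><a href="(.*?)">/
            or die "No result link found\n";
    
        my $upphaf = "http://bin.arnastofnun.is/";
        my $urlid = $upphaf . $1;
        $content = get($urlid);
        die "Could not fetch $urlid\n" unless defined $content;
    
        my $te = HTML::TableExtract->new(depth => 0, count => 0);
    
        $te->parse($content);   # this was missing
    
        my $table = $te->first_table_found
            or die "No table found\n";
        my $table_tree = $table->tree;
        my $table_html = $table_tree->as_HTML;
    
        print $table_html, "\n";
    }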
    

    Your code never called parse, so HTML::TableExtract had nothing to work on. I also needed to add use utf8 to the script so that the non-ASCII characters in the source (such as the ð in the regex) are treated as characters rather than raw bytes.
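
    The role of parse can be tried out without touching the network. The following is only a minimal sketch: the HTML string is a made-up stand-in for whatever get() returns, the helper name first_table_rows is hypothetical, and it uses the default (non-tree) extraction mode so that only HTML::TableExtract itself is needed.

```perl
#!perl
use strict;
use warnings;
use HTML::TableExtract;

# Hypothetical stand-in for the page content that get() would return.
my $html = <<'HTML';
<html><body>
<table>
  <tr><th>Case</th><th>Singular</th></tr>
  <tr><td>Nom.</td><td>fiskur</td></tr>
  <tr><td>Acc.</td><td>fisk</td></tr>
</table>
</body></html>
HTML

# Returns the rows of the first top-level table in $html as a list of
# array refs, or an empty list if no table was found.
sub first_table_rows {
    my ($html) = @_;
    my $te = HTML::TableExtract->new(depth => 0, count => 0);
    $te->parse($html);    # without this call the extractor has seen no HTML
    my $table = $te->first_table_found or return;
    return $table->rows;
}

# Print each row with its cells separated by " | ".
for my $row (first_table_rows($html)) {
    print join(' | ', map { defined $_ ? $_ : '' } @$row), "\n";
}
```

    Returning an empty list when no table matches lets the caller decide how to react, instead of the script dying on a method call on an undefined value.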