HTML::TableExtract with a table inside of a table


I have a small script that I am using to process the HTML from a remote URL (the fetching code is separate). The manual page for HTML::TableExtract has the following code section for extracting a table nested inside another table:

$te = new HTML::TableExtract
      (
       headers => [qw(Summary Region)],
       chain   => [
                   { depth => 0, count => 2 },
                   { headers => [qw(Part Qty Cost)] }
                  ],
      );

My code follows the same pattern:

use HTML::TableExtract;
use strict;
use warnings;

my $te = new HTML::TableExtract
      (
       headers => [qw(Incident Date Time Location Description)],
       chain   => [
                   { depth => 0, count => 2 },
                   { headers => [qw(Unit DIS ENR ONS LEF ARR BUS REM COM)] }
                  ],
      );

$te->parse_file('data.html');

However, running it gives me this:

Can't locate object method "chain" via package "HTML::TableExtract" at /usr/lib/perl5/HTML/Parser.pm line 80.

Is there something I'm missing? If anyone has a better way to extract a table from within a table while printing information from both, I'm all ears.


Solution

  • I don't see any mention of chain in the documentation for HTML::TableExtract. Maybe the manual page you are reading is for a different version than the module you have installed?

    But according to the doc, you could do this using the depth and count attributes:

    $te = HTML::TableExtract->new( 
                                  headers => [qw(Unit DIS ENR ONS LEF ARR BUS REM COM)], 
                                  depth => 1, 
                                  count => 1 
                                 );
    $te->parse($html_string);
    

    depth: Specify how embedded in other tables your tables of interest should be. Top-level tables in the HTML document have a depth of 0, tables within top-level tables have a depth of 1, and so on.

    count: Specify which table within each depth you are interested in, beginning with 0.

    In your case, depth and count should both be 1.
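
If you are unsure which depth and count values apply to your page, one option is to construct the extractor with no constraints and print the coordinates of every table it finds. The following is only a sketch: the HTML string is a made-up sample (not the real data.html), and it assumes a 2.x release of HTML::TableExtract, where tables(), table(), coords() and rows() are available.

```perl
use strict;
use warnings;
use HTML::TableExtract;

# Hypothetical sample document: one outer table containing one nested table.
my $html = <<'HTML';
<table>
  <tr><th>Incident</th><th>Date</th></tr>
  <tr><td>F100</td><td>2012-01-01</td></tr>
  <tr><td colspan="2">
    <table>
      <tr><th>Unit</th><th>DIS</th></tr>
      <tr><td>E1</td><td>10:00</td></tr>
    </table>
  </td></tr>
</table>
HTML

# No headers/depth/count constraints, so every table in the document matches.
my $te = HTML::TableExtract->new();
$te->parse($html);
$te->eof;

for my $ts ($te->tables) {
    # coords() reports where this table sits: (depth, count).
    my ($depth, $count) = $ts->coords;
    print "Table at depth $depth, count $count\n";
    for my $row ($ts->rows) {
        # Cells can be undef (e.g. under a colspan); trim whitespace for display.
        my @cells = map { my $c = defined $_ ? $_ : ''; $c =~ s/^\s+|\s+$//g; $c } @$row;
        print '  ', join(' | ', @cells), "\n";
    }
}
```

Once you know the coordinates, you can pass them as depth and count to the constructor, or fetch a single table from the unconstrained extractor with $te->table($depth, $count). Because both the outer and the inner table are matched here, the same loop prints information from both.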