
Perl: force Spreadsheet::Read to use Text::CSV_XS


I have an 8MB CSV file. Using Spreadsheet::Read it takes 10 seconds to read:

use Spreadsheet::Read;

my $book = ReadData ( 'file.csv' );
my @rows = Spreadsheet::Read::rows($book->[1]); # first sheet
foreach my $i (2 .. scalar @rows) { # skip the header row
    my $first = $rows[$i-1][1]; # second column of row $i
    #...
}

Using Text::CSV_XS, it takes 1 second:

use Text::CSV_XS;

open my $fh, "<:encoding(utf8)", 'file.csv' or die $!;
my $csv = Text::CSV_XS->new ({ diag_verbose=>1, auto_diag=>1, binary=>1, sep_char=>";" });
$csv->getline($fh); # skip the header row
while (my $row = $csv->getline ($fh)) {
    my $first = $row->[1]; # second column
    #...
}
close ($fh);
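
For reference, timings like the two above can be measured with a small harness. This is only a minimal sketch, assuming the core Time::HiRes module; time_it is a hypothetical helper name (not part of Spreadsheet::Read or Text::CSV_XS), and the loop inside it is a stand-in workload to be replaced by either reader shown above.

```perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);

# Hypothetical helper: run a code reference once and return
# the elapsed wall-clock time in seconds.
sub time_it {
    my ($code) = @_;
    my $t0 = [gettimeofday];
    $code->();
    return tv_interval($t0);
}

# Stand-in workload; replace the body with one of the readers above.
my $elapsed = time_it(sub {
    my $sum = 0;
    $sum += $_ for 1 .. 100_000;
});
printf "elapsed: %.3f s\n", $elapsed;
```

Timing each reader in a separate run keeps one measurement from benefiting from the file already being in the OS cache.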

Can I force Spreadsheet::Read to use Text::CSV_XS and expect similar performance? I tried:

  1. Specifying a parser:
my $book = Spreadsheet::Read->new (
    'file.csv',
    sep => ';',
    parser => 'csv',
    );
  2. Setting the parser environment variable:
$ENV{SPREADSHEET_READ_CSV} = 'Text::CSV_XS';

Output of Spreadsheet::Read->parsers() is:

$VAR1 = {
          'ext' => 'csv',
          'def' => '',
          'mod' => 'Text::CSV',
          'min' => '1.17',
          'vsn' => '-'
        };
$VAR2 = {
          'ext' => 'csv',
          'def' => '',
          'mod' => 'Text::CSV_PP',
          'min' => '1.17',
          'vsn' => '-'
        };
$VAR3 = {
          'vsn' => '1.50',
          'min' => '0.71',
          'ext' => 'csv',
          'mod' => 'Text::CSV_XS',
          'def' => '*'
        };
$VAR4 = {
          'min' => '0.01',
          'vsn' => '0.87',
          'def' => '*',
          'mod' => 'Spreadsheet::Read',
          'ext' => 'sc'
        };
$VAR5 = {
          'vsn' => '0.65',
          'min' => '0.34',
          'ext' => 'xls',
          'mod' => 'Spreadsheet::ParseExcel',
          'def' => '*'
        };
$VAR6 = {
          'min' => '0.24',
          'vsn' => '0.27',
          'ext' => 'xlsm',
          'def' => '*',
          'mod' => 'Spreadsheet::ParseXLSX'
        };
$VAR7 = {
          'min' => '0.24',
          'vsn' => '0.27',
          'def' => '*',
          'mod' => 'Spreadsheet::ParseXLSX',
          'ext' => 'xlsx'
        };
$VAR8 = {
          'min' => '0.13',
          'vsn' => '-',
          'ext' => 'xlsx',
          'def' => '',
          'mod' => 'Spreadsheet::XLSX'
        };
$VAR9 = {
          'vsn' => undef,
          'min' => '',
          'ext' => 'zzz2',
          'mod' => 'Z20::Just::For::Testing',
          'def' => '*'
        };

Also:

$ perl -MSpreadsheet::Read -E'say Spreadsheet::Read::parses( "csv" )'
Text::CSV_XS
$ perl -MText::CSV_XS -E'say Text::CSV_XS->VERSION'
1.50

Solution

  • You asked whether you can force Spreadsheet::Read to use Text::CSV_XS.

    But you also said that the following prints Text::CSV_XS:

    perl -Mv5.14 -MSpreadsheet::Read -e'say Spreadsheet::Read::parses( "csv" )'

    This demonstrates that Text::CSV_XS is already being used, so the choice of CSV parser does not explain the difference in run time.
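
The same check can be made from inside the script, since the control hash in element 0 of the book records which backend parsed the file. The following is a minimal sketch, assuming Spreadsheet::Read is installed and file.csv exists; parser_info is a hypothetical helper name introduced here for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: describe the backend recorded in the control
# hash (element 0) of a book returned by Spreadsheet::Read.
sub parser_info {
    my ($book) = @_;
    my $ctrl = $book->[0];
    return sprintf "%s %s",
        $ctrl->{parser}  // 'unknown',
        $ctrl->{version} // '?';
}

# Only attempt the read if the module and the file are both available.
if (-f 'file.csv' and eval { require Spreadsheet::Read; 1 }) {
    my $book = Spreadsheet::Read::ReadData('file.csv', sep => ';');
    print parser_info($book), "\n" if $book;
}
```

With the setup described in the question, this would be expected to report Text::CSV_XS along with its version.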