Search code examples
perlsolarisparseexcel

Format of the date field gets changed through Spreadsheet::ParseExcel


I have an Excel Sheet (A.xls) which has following content:

Date,Value
10/1/2020,36.91
10/2/2020,36.060001

I got following output using same script with Perl v5.6.1 on solaris 5.8

>>./a4_test.pl
INFO>Excel File=A.xls,#WorkSheet=1,AuthorID=Sahoo, Ashish
DEBUG>row 2 - col 0:10-2-20
DEBUG>row 2 - col 1:36.060001

And I got different output for date field using same script with perl v5.26.3 on solaris 5.11

>>./a4_test.pl
INFO>Excel File=A.xls,#WorkSheet=1,AuthorID=Sahoo, Ashish
DEBUG>row 2 - col 0:2020-10-02
DEBUG>row 2 - col 1:36.060001

I used 0.2602 version of Spreadsheet::ParseExcel on Solaris 8 machine and 0.65 version on Solaris 11 machine.

Why am I getting different output while reading date field from
excel sheet through Spreadsheet::ParseExcel module?

#!/usr/perl/5.12/bin/perl -w
use Spreadsheet::ParseExcel;

my $srce_file = "a.xls"; 
my $oExcel = new Spreadsheet::ParseExcel;
my $oBook = $oExcel->Parse($srce_file); 
my %hah_sheet = ();
my $header_row  = 1;
my($iR, $iC, $oWkS, $oWkC);
my $book = $oBook->{File};
my $nsheet= $oBook->{SheetCount};
my $author= $oBook->{Author};
unless($nsheet){
    print "ERR>No worksheet found for source file:$srce_file\n";
    return 0;
}
else{
    print "INFO>Excel                         
   File=$srce_file,#WorkSheet=$nsheet,AuthorID=$author\n";
}


for(my $iSheet=0; $iSheet < $oBook->{SheetCount} ; $iSheet++) {
    next if($iSheet >0);    
    $oWkS = $oBook->{Worksheet}[$iSheet];

    my $rows = 0;
    for(my $iR = $oWkS->{MinRow}; defined $oWkS->{MaxRow} && $iR <= $oWkS->{MaxRow} ; $iR++) {
        $rows++;

        my $str_len = 0;
        for(my $iC = $oWkS->{MinCol}; defined $oWkS->{MaxCol} && $iC <= $oWkS->{MaxCol}; $iC++) {
            $oWkC = $oWkS->{Cells}[$iR][$iC];
            next if ($iR <$header_row);
            
            if (defined($oWkC)){
                my $cell_value = $oWkC->Value;
                $cell_value =~s/\n+//g;               #removed newline inside the value
                #
                ##if the first column at header row is null then skip. Column might be shifted
                if($iR==$header_row && $iC == 0){
                    last unless($cell_value);
                }
                if($iR == $header_row){
                   $hah_sheet{$iR}{$iC} = uc($cell_value);
                }else {
                   $hah_sheet{$iR}{$iC} = $cell_value;
                   $str_len += length($cell_value);  
                   ##View cell value by row/column
                   print "DEBUG>row ${iR} - col ${iC}:$cell_value\n";
                }
            }else{
                $hah_sheet{$iR}{$iC} = "";              #keep position for NULL value
            }
        } # END of Column loop
    } # END of Row loop
} # END of Worksheet
      

Solution

  • If you search for "date" in Changes, you see this:

    0.33 2008.09.07
        - Default format for formatted dates changed from 'm-d-yy' to 'yyyy-mm-dd'
    

    This explains why you see different date formats between versions 0.2602 and 0.65 of Spreadsheet::ParseExcel.

    If you always want your code to print the same format regardless of which version you are using, you could transform the date in your code. For example, if you always want to see yyyy-mm-dd:

    $cell_value =~ s/^(\d+)-(\d+)-(\d+)$/sprintf '%04d-%02d-%02d', 2000+$3, $1, $2/e;
    

    Or, vice versa:

    $cell_value =~ s/^(\d+)-(\d+)-(\d+)$/sprintf '%0d-%0d-%02d', $2, $3, $1-2000/e;