I have an Excel Sheet (A.xls) which has following content:
Date,Value
10/1/2020,36.91
10/2/2020,36.060001
I got following output using same script with Perl v5.6.1 on solaris 5.8
>>./a4_test.pl
INFO>Excel File=A.xls,#WorkSheet=1,AuthorID=Sahoo, Ashish
DEBUG>row 2 - col 0:10-2-20
DEBUG>row 2 - col 1:36.060001
And I got different output for date field using same script with perl v5.26.3 on solaris 5.11
>>./a4_test.pl
INFO>Excel File=A.xls,#WorkSheet=1,AuthorID=Sahoo, Ashish
DEBUG>row 2 - col 0:2020-10-02
DEBUG>row 2 - col 1:36.060001
I used 0.2602 version of Spreadsheet::ParseExcel
on Solaris 8 machine and 0.65 version on Solaris 11 machine.
Why am I getting different output while reading date field from
excel sheet through Spreadsheet::ParseExcel
module?
#!/usr/perl/5.12/bin/perl -w
use Spreadsheet::ParseExcel;
my $srce_file = "a.xls";
my $oExcel = new Spreadsheet::ParseExcel;
my $oBook = $oExcel->Parse($srce_file);
my %hah_sheet = ();
my $header_row = 1;
my($iR, $iC, $oWkS, $oWkC);
my $book = $oBook->{File};
my $nsheet= $oBook->{SheetCount};
my $author= $oBook->{Author};
unless($nsheet){
print "ERR>No worksheet found for source file:$srce_file\n";
return 0;
}
else{
print "INFO>Excel
File=$srce_file,#WorkSheet=$nsheet,AuthorID=$author\n";
}
for(my $iSheet=0; $iSheet < $oBook->{SheetCount} ; $iSheet++) {
next if($iSheet >0);
$oWkS = $oBook->{Worksheet}[$iSheet];
my $rows = 0;
for(my $iR = $oWkS->{MinRow}; defined $oWkS->{MaxRow} && $iR <= $oWkS->{MaxRow} ; $iR++) {
$rows++;
my $str_len = 0;
for(my $iC = $oWkS->{MinCol}; defined $oWkS->{MaxCol} && $iC <= $oWkS->{MaxCol}; $iC++) {
$oWkC = $oWkS->{Cells}[$iR][$iC];
next if ($iR <$header_row);
if (defined($oWkC)){
my $cell_value = $oWkC->Value;
$cell_value =~s/\n+//g; #removed newline inside the value
#
##if the first column at header row is null then skip. Column might be shifted
if($iR==$header_row && $iC == 0){
last unless($cell_value);
}
if($iR == $header_row){
$hah_sheet{$iR}{$iC} = uc($cell_value);
}else {
$hah_sheet{$iR}{$iC} = $cell_value;
$str_len += length($cell_value);
##View cell value by row/column
print "DEBUG>row ${iR} - col ${iC}:$cell_value\n";
}
}else{
$hah_sheet{$iR}{$iC} = ""; #keep position for NULL value
}
} # END of Column loop
} # END of Row loop
} # END of Worksheet
If you search for "date" in Changes, you see this:
0.33 2008.09.07
- Default format for formatted dates changed from 'm-d-yy' to 'yyyy-mm-dd'
This explains why you see different date formats between versions 0.2602 and 0.65 of Spreadsheet::ParseExcel.
If you always want your code to print the same format regardless of which version you are using, you could transform the date in your code. For example, if you always want to see yyyy-mm-dd
:
$cell_value =~ s/^(\d+)-(\d+)-(\d+)$/sprintf '%04d-%02d-%02d', 2000+$3, $1, $2/e;
Or, vice versa:
$cell_value =~ s/^(\d+)-(\d+)-(\d+)$/sprintf '%0d-%0d-%02d', $2, $3, $1-2000/e;