ruby-on-rails ruby excel date activesupport

How to convert MS excel date from float to date format in Ruby?


I'm trying to parse an XLSX file using the roo gem in a Ruby script.

In Excel, dates are stored as floats or integers in the format DDDDD.ttttt, counting from 1900-01-00 (00, not 01). So to convert a date such as 40396, you would take 1900-01-00 + 40396 and should get 2010-10-15, but I'm getting 2010-08-08.

I'm using active_support/time to do the calculation like so:

Time.new("1900-01-01") + 40396.days

Am I doing my calculation wrong, or is there a bug in ActiveSupport?

I'm running ruby 1.9.3-mri on Windows 7 + latest active_support gem (3.2.1)

EDIT

I was looking at the older file in Excel with the wrong data - my script / console were pulling the right data - hence my confusion - I was doing everything right, except for using the right file!!!! Damn the all-nighters!

Thanks to everyone replying, I will keep the question here in case somebody needs info on how to convert dates from excel using ruby.

Also for anyone else running into this - spreadsheet gem DOES NOT support reading XLSX files at this point (v 0.7.1) properly - so I'm using roo for reading, and axlsx for writing.


Solution

  • You have an off-by-one error in your day numbering - due to a bug in Lotus 1-2-3 that Excel and other spreadsheet programs have carefully maintained compatibility with for 30+ years.

    Originally, day 1 was intended to be January 1, 1900 (which would, as you stated, make day 0 equal to December 31, 1899). But Lotus incorrectly considered 1900 to be a leap year, so if you use the Lotus numbers for the present and count backwards, correctly making 1900 a common year, the day numbers for everything before March 1st, 1900, are one too high. Day 1 becomes December 31st, 1899, and day 0 shifts back to the 30th. So the epoch for date arithmetic in Lotus-based spreadsheets is really Saturday, December 30th, 1899. (Modern Excel and some other spreadsheets extend the Lotus bug-compatibility far enough to show February 1900 actually having a 29th day, so they will label day 0 "December 31st" while agreeing that it was a Saturday! But other Lotus-based spreadsheets don't do that, and Ruby certainly doesn't either.)

    Even allowing for this error, however, your stated example is incorrect: Lotus day number 40,396 is August 6th, 2010, not October 15th. I have confirmed this correspondence in Excel, LibreOffice, and Google Sheets, all of which agree. You must have crossed examples somewhere.
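
    If you don't have a spreadsheet handy, the correspondences above can be checked with Ruby's standard date library. This is only a sketch: the names LOTUS_EPOCH and lotus_day_to_date are made up for illustration, and it only handles whole-day serial numbers from March 1st, 1900 onward.

    ```ruby
    require 'date'

    # Effective day 0 for Lotus-style serial numbers (assumed name, not part of any gem).
    LOTUS_EPOCH = Date.new(1899, 12, 30)

    # Hypothetical helper: whole-day serial number -> Date.
    def lotus_day_to_date(serial)
      LOTUS_EPOCH + serial
    end

    puts LOTUS_EPOCH.saturday?                         # true
    puts lotus_day_to_date(40396)                      # 2010-08-06
    puts lotus_day_to_date(25569)                      # 1970-01-01
    # Going the other way: the serial number for October 15th, 2010.
    puts (Date.new(2010, 10, 15) - LOTUS_EPOCH).to_i   # 40466
    ```

    So the day number that actually corresponds to 2010-10-15 would be 40466, not 40396.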

    Here's one way to do the conversion:

    Time.utc(1899,12,30) + 40396.days #=> 2010-08-06 00:00:00 UTC
    

    Alternatively, you could take advantage of another known correspondence. Time zero for Ruby (and POSIX systems in general) is the moment January 1, 1970, at midnight GMT. January 1, 1970 is Lotus day 25,569. As long as you remember to do your calculations in UTC, you can also do this:

    Time.at( (40396 - 25569).days ).utc # => 2010-08-06 00:00:00 UTC
    

    In either case, you probably want to declare a symbolic constant for the epoch date (either the Time object representing 1899-12-30 or the POSIX "day 0" value 25,569).

    You can replace those calls to .days with multiplication by 86400 (seconds per day) if you don't need active_support/core_ext/integer/time for anything else, and don't want to load it just for this.
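
    Putting the last two suggestions together, a version without active_support might look roughly like this. It is a sketch rather than a drop-in implementation: the constant and method names are my own, and rounding to whole seconds is an assumption I've made to hide floating-point noise in the fractional (time-of-day) part.

    ```ruby
    # Assumed names, for illustration only.
    SECONDS_PER_DAY         = 86_400
    LOTUS_DAY_OF_UNIX_EPOCH = 25_569   # serial number of 1970-01-01

    # Hypothetical helper: spreadsheet serial (DDDDD.ttttt) -> Time in UTC.
    # Only meant for serials from March 1st, 1900 onward.
    def excel_serial_to_time(serial)
      seconds = ((serial - LOTUS_DAY_OF_UNIX_EPOCH) * SECONDS_PER_DAY).round
      Time.at(seconds).utc
    end

    puts excel_serial_to_time(40396)     # 2010-08-06 00:00:00 UTC
    puts excel_serial_to_time(40396.5)   # 2010-08-06 12:00:00 UTC
    ```

    Spreadsheet serials carry no time zone, so the UTC result here is best read as the wall-clock value shown in the cell, not as a real instant in your local zone.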