I have a method that is going through a CSV file and uploading to Postgres.
CSV.foreach(path, converters: :all)
When it encounters a number such as "2.02E+17" it is uploading "2.0150519e+17", but when it encounters "20150515E000590" it is uploading "Infinity".
If I set
CSV.foreach(path)
When it encounters "2.02E+17" it is uploading "20150519E000010" and "20150515E000590" as "20150515E000590".
I want to be uploading exactly what is shown in Excel. So in the case of "2.02E+17" I want to upload "2.02E+17", but in the case of "20150515E000590" I want to upload "20150515E000590" not "Infinity". I guess my question is how do I get CSV to not override "20150515E000590" with "Infinity"?
First of all, Postgres can probably handle loading CSVs without Ruby's help. As for your question...
CSV does not define data types, so whenever you read CSV data into something that expects data types (like Excel or Ruby) the program has to guess.
When Excel sees 20150519E000010
it is guessing that this is the scientific notation 20150519e10 i.e. 20150519×1010. Excel makes a distinction between the underlying data in the spreadsheet and the way it is displayed, so in this case it opts for a shorter way of displaying that number: 2.02E+17
. So even though Excel is showing you 2.02E+17
, the actual data in the file is 20150519E000010
.
When you read the CSV in Ruby and tell it to convert to Ruby's data types, it makes the same guess (that it is scientific notation) but you get a different display: 2.0150519e+17
. That should be expected because 2.02E+17
is Excel's way of shortening the number for display. Ruby's data types do not conform to Excel. This also explains why 20150515E00059
becomes Infinity
. 20150515×1059 is too large for Ruby's floating point data type, so Ruby converts it to the largest possible float: Infinity.
However, I strongly suspect that both Excel and Ruby are wrong. When I see 20150515E000059
, that looks to me like 2015-05-15 00:00:59
. That's not a number in scientific notation, it's a timestamp! You can define a custom converter for handling the format:
CSV::Converters[:mytime] = lambda do |s|
DateTime.parse(s.tr(?E, ?T)) rescue s
end
CSV.parse("20150515000019", converters: :mytime)
# [[#<DateTime: 2015-05-15T00:00:19+00:00 ((2457158j,19s,0n),+0s,2299161j)>]]