I have the following code:
require 'date'
f = File.open(filepath)
f.each_with_index do |line, i|
  a, b = line.split("\t")
  d = DateTime.strptime(a, '%m/%d/%Y %I:%M %p')
  puts "#{a} --- #{b}"
  break unless i < 100
end
And I'm getting the following error:
c_reader.rb:10:in `strptime': invalid date (ArgumentError)
from c_reader.rb:10:in `block in <main>'
from c_reader.rb:6:in `each'
from c_reader.rb:6:in `each_with_index'
from c_reader.rb:6:in `<main>'
The file content:
1/30/2014 1:00 AM	1251.6
1/30/2014 2:00 AM	1248
1/30/2014 3:00 AM	1246.32
1/30/2014 4:00 AM	1242.96
1/30/2014 5:00 AM	1282.08
1/30/2014 6:00 AM	1293.84
1/30/2014 7:00 AM	1307.04
1/30/2014 8:00 AM	1337.76
1/30/2014 9:00 AM	1357.92
If I type this into IRB, it works perfectly:
DateTime.strptime("1/30/2014 2:00 PM", '%m/%d/%Y %I:%M %p')
Can someone please tell me what's going on here?
Your example data didn't match what your code was trying to process, so I adjusted it. I also changed some of the times to PM to show that the AM/PM marker is honored.
With those tweaks to the data, your code works fine: strptime returns valid DateTime objects.
require 'date'
[
  "1/30/2014 1:00 AM\t1251.6",
  "1/30/2014 2:00 AM\t1248",
  "1/30/2014 3:00 PM\t1246.32",
  "1/30/2014 4:00 PM\t1242.96",
].each do |line|
  a, b = line.split("\t")
  puts DateTime.strptime(a, '%m/%d/%Y %I:%M %p')
end
# >> 2014-01-30T01:00:00+00:00
# >> 2014-01-30T02:00:00+00:00
# >> 2014-01-30T15:00:00+00:00
# >> 2014-01-30T16:00:00+00:00
Your data file has a BOM (byte-order mark). Its first two bytes indicate the endianness of the file, that is, the order in which the two bytes of each 16-bit code unit are stored. In addition, each character here occupies two bytes: the digit 1, for example, is stored as 31 00. This is a UTF-16LE file because the BOM, the code point U+FEFF, appears as the bytes ff fe, least-significant byte first. If the file began with fe ff instead, it would be big-endian (UTF-16BE). Here are the first bytes of your file as a hex dump:
0000000: fffe 3100 2f00 3300 3000 2f00 3200 3000 ..1./.3.0./.2.0.
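To see why strptime rejects the line, here is a minimal sketch that rebuilds the start of the file in memory, assuming it is UTF-16LE with a BOM as the dump indicates. It keeps the bytes undecoded (String#b), which is effectively what the original code works with, since no conversion from UTF-16LE takes place. The record is the first line from the question.

```ruby
require 'date'

FORMAT = '%m/%d/%Y %I:%M %p'

# Rebuild the start of the file as it sits on disk: a BOM (U+FEFF)
# followed by one tab-separated record, all encoded as UTF-16LE.
# .b keeps the raw bytes as-is (ASCII-8BIT), with no decoding.
raw = "\uFEFF1/30/2014 1:00 AM\t1251.6\n".encode('UTF-16LE').b

# The first six bytes match the hex dump: ff fe 31 00 2f 00
first_bytes = raw.bytes.first(6)
p first_bytes # => [255, 254, 49, 0, 47, 0]

# Splitting the undecoded bytes on a tab leaves the BOM at the front
# of the date field and a NUL byte after every character...
a, _b = raw.split("\t")
p a.bytes.first(4) # => [255, 254, 49, 0]

# ...so strptime cannot match '%m' at the very first byte.
error = begin
  DateTime.strptime(a, FORMAT)
  nil
rescue ArgumentError => e
  e
end
puts "strptime raised: #{error.class}: #{error.message}"
```

The exception class may be Date::Error (a subclass of ArgumentError) or plain ArgumentError depending on your Ruby version, which is why the sketch rescues ArgumentError.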
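Ruby doesn't know what to do with these bytes because it expects its default external encoding, usually UTF-8. Read that way, the first field starts with the BOM bytes and has a NUL (00) byte after every character, so strptime can't match the format and raises "invalid date". To fix that, you need to tell Ruby how to interpret the file; look at the documentation for IO.new to see how to specify encodings. Here the incoming data has to be converted from UTF-16LE to UTF-8. This is one way to do it: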
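require 'date'
# "rb:BOM|UTF-16LE:UTF-8": open in binary mode, strip a leading BOM if
# there is one (assuming UTF-16LE otherwise), and transcode each line
# from UTF-16LE to UTF-8 as it is read.
File.open(
  "test.csv",
  "rb:BOM|UTF-16LE:UTF-8"
) do |fi|
  fi.each_with_index do |line, i|
    a, b = line.split("\t")
    d = DateTime.strptime(a, '%m/%d/%Y %I:%M %p')
    puts "#{ 1 + i } #{a} --- #{b}"
    break unless i < 100
  end
end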
Running that outputs:
1 1/30/2014 1:00 AM --- 1251.6
2 1/30/2014 2:00 AM --- 1248
3 1/30/2014 3:00 AM --- 1246.32
4 1/30/2014 4:00 AM --- 1242.96
5 1/30/2014 5:00 AM --- 1282.08
6 1/30/2014 6:00 AM --- 1293.84
7 1/30/2014 7:00 AM --- 1307.04
8 1/30/2014 8:00 AM --- 1337.76
9 1/30/2014 9:00 AM --- 1357.92
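If you don't have the original test.csv at hand, this self-contained sketch reproduces the round trip under the assumption that the file is UTF-16LE with a BOM, as the hex dump suggests. It writes three sample records from the question to a temporary file (the "readings" name is arbitrary) and reads them back with the same mode string. Unlike the code above, it also chomps each line so the value column doesn't carry the trailing newline, and converts that value to a Float.

```ruby
require 'date'
require 'tempfile'

FORMAT = '%m/%d/%Y %I:%M %p'

# Sample records from the question, stored the way the original file
# appears to be: UTF-16LE with a leading BOM (U+FEFF).
records = [
  "1/30/2014 1:00 AM\t1251.6",
  "1/30/2014 2:00 AM\t1248",
  "1/30/2014 3:00 AM\t1246.32"
]

parsed = []
Tempfile.create(['readings', '.csv']) do |tmp|
  # binmode so the UTF-16LE bytes are written exactly as encoded
  tmp.binmode
  tmp.write(("\uFEFF" + records.join("\n") + "\n").encode('UTF-16LE'))
  tmp.flush

  # Same mode string as above: strip the BOM, decode UTF-16LE,
  # and hand each line to us as UTF-8.
  File.open(tmp.path, 'rb:BOM|UTF-16LE:UTF-8') do |fi|
    fi.each_line do |line|
      a, b = line.chomp.split("\t")
      parsed << [DateTime.strptime(a, FORMAT), b.to_f]
    end
  end
end

parsed.each { |dt, value| puts "#{dt.iso8601} #{value}" }
```

Because the temporary file is deleted when the Tempfile.create block ends, the parsed rows are collected into an array inside the block and printed afterwards.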