I'm using Blaze (0.6.3) with Anaconda 2.1.0 (on Python 2.7.8). I'm trying to filter a Table's rows by date.
The mock TSV file is the following:
name amount date
foo 100 2001-05-11 08:54:48.063856
bar 1000 0001-01-01 00:00:00.0
baz 10000 1970-01-02 00:00:00.0
The Python code is:
from blaze import *
from datetime import datetime
data = Table(CSV('mock.tsv'))
data[data.name > 'bar']
data[data.amount > 1000]
data[data.date > datetime(1970,1,1)]
The first two filters are OK, but the third one throws a SyntaxError.
It all seems to boil down to the following:
lambda (name, amount, date): date > (1970-01-01 00:00:00)
which is syntactically invalid. Somehow, somewhere, datetime(1970,1,1) was translated to datetime(1970-01-01 00:00:00), then the datetime was forgotten. Blaze itself recognizes the date column with ?datetime type, which is what I want, but then it fails in the comparison.
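To illustrate the kind of mistake that would produce a lambda like that, here is a minimal sketch. It assumes (this is a guess, not something confirmed from Blaze's source) that the filter is built as a source string and that the datetime is interpolated with str() instead of repr(). The names cutoff, bad_source, good_source and is_valid_python are made up for the illustration, and the sketch uses a single-argument lambda so it also runs on Python 3.

```python
import datetime

cutoff = datetime.datetime(1970, 1, 1)

# str() yields '1970-01-01 00:00:00', which is not a Python expression.
bad_source = "lambda date: date > (%s)" % str(cutoff)

# repr() yields 'datetime.datetime(1970, 1, 1, 0, 0)', which is one.
good_source = "lambda date: date > (%s)" % repr(cutoff)


def is_valid_python(source):
    """Return True if source compiles as a single Python expression."""
    try:
        compile(source, "<generated>", "eval")
        return True
    except SyntaxError:
        return False


print(bad_source, is_valid_python(bad_source))
print(good_source, is_valid_python(good_source))

# The repr-based source evaluates to a working predicate, provided the
# datetime module is visible in the namespace used for eval().
predicate = eval(good_source, {"datetime": datetime})
print(predicate(datetime.datetime(2001, 5, 11)))
```

If something like this is what happens internally, the comparison would fail regardless of how the datetime is constructed on the caller's side.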
Am I using it the wrong way?
This was an older bug that has since been fixed. Here it is working with the development version. I believe the latest stable release on Anaconda (0.6.5) should work fine as well.
In [1]: !cat tmp/myfile.csv
name, amount, date
foo, 100, 2001-05-11 08:54:48.063856
bar, 1000, 0001-01-01 00:00:00.0
baz, 10000, 1970-01-02 00:00:00.0
In [2]: from blaze import *
In [3]: data = Table('tmp/myfile.csv')
In [4]: from datetime import datetime
In [5]: data[data.date > datetime(1970,1,1)]
Out[5]:
name amount date
0 foo 100 2001-05-11 08:54:48.063856
1 baz 10000 1970-01-02 00:00:00
Updating Blaze should solve your problem:
conda update blaze
Also, Blaze is happy to coerce your strings to the appropriate type, in case you don't want to create the datetime yourself:
In [6]: data[data.date > '1970-01-01']
Out[6]:
name amount date
0 foo 100 2001-05-11 08:54:48.063856
1 baz 10000 1970-01-02 00:00:00
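If you want to sanity-check the expected result without Blaze, the same filter can be sketched with only the standard library. This is not how Blaze does its coercion; it is a plain-Python stand-in that assumes the mock file is tab-separated and that dates are either full timestamps or bare YYYY-MM-DD strings. MOCK_TSV, parse_timestamp and rows_after are hypothetical names used only here, and the code targets Python 3.

```python
import csv
import io
from datetime import datetime

# Inline copy of the mock data from the question, assumed tab-separated.
MOCK_TSV = (
    "name\tamount\tdate\n"
    "foo\t100\t2001-05-11 08:54:48.063856\n"
    "bar\t1000\t0001-01-01 00:00:00.0\n"
    "baz\t10000\t1970-01-02 00:00:00.0\n"
)


def parse_timestamp(text):
    """Parse a full timestamp or a bare date such as '1970-01-01'."""
    for fmt in ("%Y-%m-%d %H:%M:%S.%f", "%Y-%m-%d"):
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError("unrecognized timestamp: %r" % text)


def rows_after(tsv_text, cutoff):
    """Return the rows whose date is strictly later than cutoff."""
    # Accept a string cutoff and coerce it, mirroring the convenience above.
    if isinstance(cutoff, str):
        cutoff = parse_timestamp(cutoff)
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [row for row in reader if parse_timestamp(row["date"]) > cutoff]


names = [row["name"] for row in rows_after(MOCK_TSV, "1970-01-01")]
print(names)
```

Passing datetime(1970, 1, 1) instead of the string selects the same rows, which matches the two Blaze outputs shown above.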