python postgresql blaze

python blaze postgresql can't print "distinct" iris species


I am going through this tutorial about blaze, but using the iris dataset in a local PostgreSQL database.

I don't seem to get the same output as shown when using db.iris.Species.distinct() (see In 16 of the IPython notebook).

My connection string is postgresql://postgres:postgres@localhost:5432/blaze_test

and my simple Python code is:

import blaze as bz
db = bz.Data('postgresql://postgres:postgres@localhost:5432/blaze_test')
mySpecies = db.iris_data.species.distinct()
print mySpecies

All I get in the console (using the Spyder IDE) is distinct(_55.iris_data.species)

How can I actually print the distinct species in the table?

NB: I know I am using a lowercase "s" for the "species" part in the code; otherwise I just get an error saying: 'Field' object has no attribute 'Species'


Solution

  • The printing mechanism is tripping you up a bit here.

    The __str__ implementation (which is what Python's print function calls) returns a string version of the expression.

    The __repr__ implementation (called when you execute a line in the interpreter) triggers computation and thus allows you to see the results of a computation.

    In [2]: iris = Data(odo(os.path.abspath('./blaze/examples/data/iris.csv'), 'postgresql://localhost::iris'))
    
    In [3]: iris
    Out[3]:
        sepal_length  sepal_width  petal_length  petal_width      species
    0            5.1          3.5           1.4          0.2  Iris-setosa
    1            4.9          3.0           1.4          0.2  Iris-setosa
    2            4.7          3.2           1.3          0.2  Iris-setosa
    3            4.6          3.1           1.5          0.2  Iris-setosa
    4            5.0          3.6           1.4          0.2  Iris-setosa
    5            5.4          3.9           1.7          0.4  Iris-setosa
    6            4.6          3.4           1.4          0.3  Iris-setosa
    7            5.0          3.4           1.5          0.2  Iris-setosa
    8            4.4          2.9           1.4          0.2  Iris-setosa
    9            4.9          3.1           1.5          0.1  Iris-setosa
    ...
    
    In [4]: iris.species.distinct()
    Out[4]:
               species
    0  Iris-versicolor
    1   Iris-virginica
    2      Iris-setosa
    
    In [8]: print(str(iris.species.distinct()))
    distinct(_1.species)
    
    In [9]: print(repr(iris.species.distinct()))
               species
    0  Iris-versicolor
    1   Iris-virginica
    2      Iris-setosa
    

    If you want to shove the result into a concrete data structure like a pandas.Series, do this:

    In [5]: odo(iris.species.distinct(), pd.Series)
    Out[5]:
    0    Iris-versicolor
    1     Iris-virginica
    2        Iris-setosa
    Name: species, dtype: object
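
The split between __str__ and __repr__ described above can be sketched in plain Python, without blaze or a database. The LazyDistinct class below is a hypothetical stand-in invented for illustration, not blaze's actual implementation; it only mimics the behaviour where print shows the symbolic expression while repr triggers the computation.

```python
class LazyDistinct:
    """Hypothetical stand-in for a lazy expression; not blaze's real class."""

    def __init__(self, name, values):
        self.name = name      # symbolic name of the column
        self.values = values  # underlying data, untouched until computed

    def compute(self):
        # Do the real work: unique values in first-seen order.
        return list(dict.fromkeys(self.values))

    def __str__(self):
        # print() goes through __str__, so only the expression is shown.
        return "distinct({})".format(self.name)

    def __repr__(self):
        # The interactive prompt goes through __repr__, which computes here.
        return "\n".join(self.compute())


expr = LazyDistinct(
    "iris.species",
    ["Iris-setosa", "Iris-setosa", "Iris-virginica"],
)

print(expr)        # symbolic form, like the output seen in Spyder
print(repr(expr))  # computed values, like the interpreter's Out[...] display
```

Under that assumption, calling print on the expression in a script will always give the symbolic form, so the script needs either an explicit repr() call or a conversion to a concrete structure, as in the odo example above.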