Usage of pyDatalog Aggregate Functions

I have been playing around with the various aggregate functions to get a feel for them, and after being confused for the past few days I am in need of clarification. I either get completely unintuitive behavior or unhelpful errors. For instance, I test:

(p[X]==min_(Y, order_by=Z)) <= Y.in_((4,6,2)) & Z.in_((6,))

Looking at some sample output:

p[0]==X,Y,Z ([(6,)], [4, 6, 2], [6, 6, 6])

p[1]==X,Y,Z ([(6,)], [6, 4, 2], [6, 6, 6])

p[2]==X,Y,Z ([(6,)], [4, 2, 6], [6, 6, 6])

1. Why is the minimum 6?
2. Why has the value bound to Z been repeated 3 times?
3. What exactly is the purpose of 'order_by' in relation to the list from which a minimum value is found?
4. Why does the output change depending on whether there are multiple values in the 'order_by' list, and why does a specific value--6, in this case--in the 'order_by' list affect the output the way it does?

Another example:

(p[X]==min_(Y, order_by=Z)) <= Y.in_((4,6,2)) & Z.in_((0,))

Output:

p[0]==X,Y,Z ([(6,)], [4, 6, 2], [0, 0, 0])

p[1]==X,Y,Z ([(6,)], [2, 6, 4], [0, 0, 0])

p[2]==X,Y,Z ([(2,)], [2, 6, 4], [0, 0, 0])

Why did the output of X change--from 6 to 2--based on the index provided? Even though the output was wrong in the previous example, at least it was consistent across the indexes used; with there being only one min/max, that makes sense.

I at least get to see output from the min_, max_, and sum_ functions, but I am lost when it comes to rank_ and running_sum_. I follow a similar process when defining my function:

(p[X]==running_sum_(Z, group_by=Z, order_by=Z)) <= Z.in_((43,34,65))

I try to view the output:

p[0]==X

I get the error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.4/dist-packages/pyDatalog/UserList.py", line 16, in __repr__
    def __repr__(self): return repr(self.data)
  File "/usr/local/lib/python3.4/dist-packages/pyDatalog/pyParser.py", line 109, in data
    self.todo.ask()
  File "/usr/local/lib/python3.4/dist-packages/pyDatalog/pyParser.py", line 566, in ask
    self._data = Body(self.pre_calculations, self).ask()
  File "/usr/local/lib/python3.4/dist-packages/pyDatalog/pyParser.py", line 686, in ask
    self._data = literal.lua.ask()
  File "/usr/local/lib/python3.4/dist-packages/pyDatalog/pyEngine.py", line 909, in _
    invoke(subgoal)
  File "/usr/local/lib/python3.4/dist-packages/pyDatalog/pyEngine.py", line 664, in invoke
    todo.do() # get the thunk and execute it
  File "/usr/local/lib/python3.4/dist-packages/pyDatalog/pyEngine.py", line 640, in do
    self.thunk()
  File "/usr/local/lib/python3.4/dist-packages/pyDatalog/pyEngine.py", line 846, in <lambda>
    aggregate.complete(base_subgoal, subgoal))
  File "/usr/local/lib/python3.4/dist-packages/pyDatalog/pyParser.py", line 820, in complete
    result = [ tuple(l.terms) for l in list(base_subgoal.facts.values())]
AttributeError: 'bool' object has no attribute 'values'

What does this mean? What was done incorrectly? How are the running_sum_ (and rank_) parameters 'group_by' and 'order_by' related to each other?

As there seem to be no examples on the web, 2 or 3 short examples of rank_ and running_sum_ usage would be greatly appreciated.

Solution

Aggregate clauses are solved in two steps:

1. first, resolve the unknowns in the clause, ignoring the aggregate function;
2. then, apply the aggregate function to the result.
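To make the two steps concrete, here is a plain-Python sketch that replays the first clause from the question. It does not call pyDatalog, and the helpers solve_body and min_by are hypothetical. Step 1 enumerates every (Y, Z) binding satisfying Y.in_((4,6,2)) & Z.in_((6,)), giving three bindings that all carry Z = 6, which would account for Z showing up three times. Step 2 picks the Y whose Z is smallest; because every Z is equal, any Y is a valid answer. The sketch assumes ties go to the first binding; how pyDatalog itself breaks ties is not covered here.

```python
from itertools import product


def solve_body(ys, zs):
    """Step 1: every (Y, Z) binding that satisfies Y.in_(ys) & Z.in_(zs)."""
    return list(product(ys, zs))


def min_by(bindings, value_index, order_index):
    """Step 2: emulate min_(value, order_by=key) over the step-1 bindings.

    Returns the value field of a binding whose order field is smallest.
    Python's min() keeps the first of several equal keys, so ties go to
    the earliest binding (an assumption of this sketch, not pyDatalog's rule).
    """
    best = min(bindings, key=lambda b: b[order_index])
    return best[value_index]


bindings = solve_body((4, 6, 2), (6,))
print(bindings)                # [(4, 6), (6, 6), (2, 6)]: Z is 6 in all three
print(min_by(bindings, 0, 1))  # order_by=Z: every Z ties, so the first Y, 4, wins
print(min_by(bindings, 0, 0))  # order_by=Y: the real minimum, 2
```

Ordering by the value itself, as in p[None]==min_(Y, order_by=Y), is what makes the minimum of Y well defined.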

Here is how you could write the first clause:

(p[None]==min_(Y, order_by=Y)) <= Y.in_((4,6,2))

The variable(s) in the brackets after p act like the GROUP BY columns in SQL, and must also appear in the body of the clause. In this case there is nothing to group by, so I use None. The order_by variable is needed when you want to retrieve a value other than the one you order by.
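The grouping role of the bracketed variable can be sketched the same way. The hypothetical min_aggregate helper below takes rows that step 1 has already produced, buckets them by a group key, and returns, per group, the value of the row with the smallest order_by key. A group key that always returns None puts every row into a single group, mirroring p[None]. This is an emulation under those assumptions, not pyDatalog's implementation.

```python
from collections import defaultdict


def min_aggregate(rows, group_key, value, order_by):
    """Emulate p[group] == min_(value, order_by=...) over solved body rows.

    group_key, value and order_by are functions that pick a field from a row.
    Returns {group: value of the row with the smallest order_by in that group}.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[group_key(row)].append(row)
    return {g: value(min(members, key=order_by)) for g, members in groups.items()}


# Step 1 result for Y.in_((4,6,2)): one row per binding of Y.
rows = [(4,), (6,), (2,)]

# p[None] == min_(Y, order_by=Y): a constant key means one group for all rows.
result = min_aggregate(rows,
                       group_key=lambda r: None,
                       value=lambda r: r[0],
                       order_by=lambda r: r[0])
print(result)  # {None: 2}
```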

Let's say you want to retrieve the names of the youngest pupil in each class of a school. The base predicate would be pupil(ClassName, Name, Age).

+ pupil('1A', 'John', 8)
+ pupil('1B', 'Joe', 9)

The aggregate clause would be:

(younger[ClassName] == min_(Name, order_by=Age)) <= pupil(ClassName, Name, Age)

The query would then be:

(younger[ClassName]==X)
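For comparison, here is a plain-Python emulation of the younger clause; youngest_by_class is a hypothetical helper, not part of pyDatalog. It adds one hypothetical pupil, Ann in class 1A, so that a class actually has more than one candidate. It orders by Age but returns Name, which is exactly the case where order_by is needed.

```python
from collections import defaultdict

# Facts from the example above, plus one hypothetical extra pupil.
pupils = [
    ("1A", "John", 8),
    ("1B", "Joe", 9),
    ("1A", "Ann", 7),  # hypothetical, not in the original example
]


def youngest_by_class(rows):
    """Emulate (younger[ClassName] == min_(Name, order_by=Age)) <= pupil(...)."""
    groups = defaultdict(list)
    for class_name, name, age in rows:
        # Step 1: each solved binding is filed under its group (ClassName).
        groups[class_name].append((age, name))
    # Step 2: min over (age, name) picks the lowest age in each group;
    # in this sketch, equal ages would fall back to alphabetical name order.
    return {c: min(members)[1] for c, members in groups.items()}


younger = youngest_by_class(pupils)
print(younger)  # {'1A': 'Ann', '1B': 'Joe'}
```

Looking up younger['1A'] in the resulting dict plays roughly the role of querying younger['1A']==X.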