I implemented a very simple Azure Stream Analytics query:
SELECT
Collect()
FROM
Input TIMESTAMP BY ts
GROUP BY
TumblingWindow(second, 3)
I send events to the Event Hub input with a Python script:
...
iso_ts = datetime.fromtimestamp(ts).isoformat()
data = dict(ts=iso_ts, value=value)
msg = json.dumps(data, encoding='utf-8')
# bus_service is a ServiceBusService instance
bus_service.send_event(HUB_NAME, msg)
...
I consume the output from a Service Bus queue:
...
while True:
    msg = bus_service.receive_queue_message(Q_NAME, peek_lock=False)
    print msg.body
...
The problem is that I cannot see any error anywhere in the Azure portal (the input and the output have both been tested and are OK), yet my running job produces no output at all.
I share a picture of the diagnostic while the query is running:
Can somebody give me an idea of where to start troubleshooting?
Thank you so much!
OK, I think I have isolated the problem.
First of all, the query format should be like this:
SELECT
Collect()
INTO
[output-alias]
FROM
[input-alias] TIMESTAMP BY ts
GROUP BY
TumblingWindow(second, 3)
I tried removing the TIMESTAMP BY clause and everything works, so I guess the problem is with that clause.
I paste an example of JSON-serialized input data:
{
"ts": "1970-01-01 01:01:17",
"value": "foo"
}
One could argue that the timestamp is too old (the seventies), but I also tried with current timestamps and still got neither output nor any error on the input.
Does anybody have an idea of what is going wrong? Thank you!
I discovered that my question was a duplicate of Basic query with TIMESTAMP by not producing output.
So, the solution is that you cannot use data from the seventies, because Stream Analytics considers all of those tuples late and drops them.
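To make the "late tuple" idea concrete, here is a rough Python 3 sketch of the check as I understand it. It is only a simplified model, not how Stream Analytics is actually implemented: the `is_late` helper, the fixed arrival time, and the 5-second tolerance are all hypothetical values chosen for illustration.

```python
from datetime import datetime, timedelta, timezone


def is_late(event_time, arrival_time, tolerance):
    """Return True when the event's own timestamp lags its arrival
    time by more than the allowed tolerance (simplified model)."""
    return arrival_time - event_time > tolerance


# Hypothetical arrival time and tolerance, for illustration only.
arrival = datetime(2016, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
tolerance = timedelta(seconds=5)

# The timestamp from my sample input versus a fresh one.
old_event = datetime(1970, 1, 1, 1, 1, 17, tzinfo=timezone.utc)
fresh_event = arrival - timedelta(seconds=2)

print(is_late(old_event, arrival, tolerance))    # True
print(is_late(fresh_event, arrival, tolerance))  # False
```

With TIMESTAMP BY, the ts field of each tuple plays the role of `event_time`, which is why a value from 1970 can never be in time.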
I retried with in-time tuples and, after a long latency, I could see the output.
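For reference, this is roughly how an in-time payload can be built. It is a minimal Python 3 sketch: `build_event` is a hypothetical helper (not part of my original script), and I am assuming that an ISO 8601 UTC string is what the job should receive in ts.

```python
import json
from datetime import datetime, timezone


def build_event(value, now=None):
    """Serialize an event whose ts field is a UTC timestamp in ISO 8601.

    `now` can be passed explicitly (useful for testing); by default the
    current UTC time is used so the tuple is in time when it is sent.
    """
    now = now or datetime.now(timezone.utc)
    return json.dumps({"ts": now.isoformat(), "value": value})


msg = build_event("foo")
print(msg)
```

The resulting string can then be passed to bus_service.send_event(HUB_NAME, msg) exactly as in the producer script above. Note that datetime.fromtimestamp(ts) without a timezone returns local time, so using an explicit UTC time avoids an accidental offset.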
Thanks to everybody!