I have a VARIANT column call 'REQUEST' in the table 'QWERTY' that contains an Array object inside a JSON like
{
"ID": "123123",
"workflowHistory": [
{
"id": "666",
"workflowType": "CCC",
"entityId": "123123",
"creator": {
"id": "503081",
"displayName": "AGENT2",
"email": "AGENT2@SOMETHING.com",
"userAvatarUrl": "XXXXXXX"
},
"createdDate": "2020-04-30T21:58:09Z",
"deletor": null,
"deletedDate": null,
"clientId": "000000000",
"value": "00000000"
},
{
"id": "555",
"workflowType": "AAA",
"entityId": "123123",
"creator": {
"id": "503080",
"displayName": "AGENT1",
"email": "AGENT1@SOMETHING.com",
"userAvatarUrl": "XXXXXXX"
},
"createdDate": "2020-04-30T21:55:09Z",
"deletor": null,
"deletedDate": null,
"clientId": "000000000",
"value": "00000000"
},
{
"id": "444",
"workflowType": "xyz",
"entityId": "123123",
"creator": {
"id": "503080",
"displayName": "AGENT1",
"email": "AGENT1@SOMETHING.com",
"userAvatarUrl": "XXXXXXX"
},
"createdDate": "2020-04-30T21:19:09Z",
"deletor": null,
"deletedDate": null,
"clientId": "000000000",
"value": "00000000"
},
{
"id": "333",
"workflowType": "BBB",
"entityId": "123123",
"creator": {
"id": "503079",
"displayName": "AGENT0",
"email": "AGENT0@SOMETHING.com",
"userAvatarUrl": "XXXXXXX"
},
"createdDate": "2020-04-30T21:10:09Z",
"deletor": null,
"deletedDate": null,
"clientId": "000000000",
"value": "00000000"
},
{
"id": "222",
"workflowType": "ZZZ",
"entityId": "123123",
"creator": {
"id": "503079",
"displayName": "AGENT0",
"email": "AGENT0@SOMETHING.com",
"userAvatarUrl": "XXXXXXX"
},
"createdDate": "2020-04-30T21:08:09Z",
"deletor": null,
"deletedDate": null,
"clientId": "000000000",
"value": "00000000"
}
]
}
Also, 'QWERTY' table has HAVERST_DATE and the PK ARTICLE_ID (the same as REQUEST:workflowHistory.ID), I am trying to get an output with the following columns:
- ID
- Last createdDate for an AGENTn
- First createdDate for an AGENTn
- the previous createdDate that is made BY AGENTn-1
- the next createdDate that is made BY AGENTn+1
I would like an output like:
For this I'm building A query as follows:
WITH WorkFlow_Parsed AS(
SELECT ARTICLE_ID,
HARVEST_DATE,
value:createdDate::timestamp_tz AS create_date,
value:creator:email AS email,
value:workflowType AS workflowType,
value:value AS value
FROM 'QWERTY', lateral flatten( input => REQUEST:workflowHistory )
),
lag_Agent_timing AS
(SELECT
WorkFlow_Parsed.ARTICLE_ID AS ARTICLE_ID,WorkFlow_Parsed.email,LAG(WorkFlow_Parsed.create_date) IGNORE NULLS over (partition by WorkFlow_Parsed.email,WorkFlow_Parsed.ARTICLE_ID order by WorkFlow_Parsed.create_date) AS lag_date_value
FROM WorkFlow_Parsed),
lead_agent_timing AS
(SELECT
WorkFlow_Parsed.ARTICLE_ID AS ARTICLE_ID,WorkFlow_Parsed.email,LEAD(WorkFlow_Parsed.create_date) IGNORE NULLS over (partition by WorkFlow_Parsed.email,WorkFlow_Parsed.ARTICLE_ID order by WorkFlow_Parsed.create_date) AS lead_date_value
FROM WorkFlow_Parsed)
SELECT
DISTINCT
WorkFlow_Parsed.ARTICLE_ID AS _ARTICLE_ID,
WorkFlow_Parsed.email AS _email,
last_value(WorkFlow_Parsed.create_date) over (partition by WorkFlow_Parsed.email,WorkFlow_Parsed.ARTICLE_ID order by WorkFlow_Parsed.create_date) AS last_date_value,
first_value(WorkFlow_Parsed.create_date) over (partition by WorkFlow_Parsed.email,WorkFlow_Parsed.ARTICLE_ID order by WorkFlow_Parsed.create_date) AS first_date_value,
MAX(lag_Agent_timing.lag_date_value),
MIN(lead_agent_timing.lead_date_value)
FROM WorkFlow_Parsed
JOIN lag_Agent_timing ON WorkFlow_Parsed.ARTICLE_ID=lag_Agent_timing.ARTICLE_ID AND lag_Agent_timing.email=WorkFlow_Parsed.email
JOIN lead_agent_timing ON WorkFlow_Parsed.ARTICLE_ID=lead_agent_timing.ARTICLE_ID AND lead_agent_timing.email=WorkFlow_Parsed.email
GROUP BY _ARTICLE_ID,_email
But I Got the error: "[SYS_VW.CREATE_DATE_1] is not a valid group by expression"`
How could I Fix it?
[SYS_VW.CREATE_DATE_1] is not a valid group by expression
The error is coming from your use of GROUP BY
in the final SELECT
query. It is pointing out that you are referencing/using Workflow_Parsed.create_date
in the query as a non-group column but it isn't part of the GROUP BY _ARTICLE_ID, _email
expression, i.e. it is the same as [Workflow_Parsed.create_date] is not a valid group by expression
that you will receive if you simplify the query a bit.
Snowflake does not permit aggregating over a window function expression and if you'd like to mix a GROUP BY
with a window function, try nesting the query in an structure such as SELECT cols, aggregate(cols) FROM (SELECT cols, window(cols)) GROUP BY cols
to separate the two (i.e. apply window functions over all rows first, then group the entire result it produces).
I'm unsure what the window functions are attempting in your sample query because I do not see the agent's n ± 1
relations anywhere in them, but going by your described requirement and the sample output included, the following should work (it just uses scalar subqueries, no window functions):
WITH workflows AS (
SELECT PARSE_JSON('{"ID":"123123","workflowHistory":[{"id":"666","workflowType":"CCC","entityId":"123123","creator":{"id":"503081","displayName":"AGENT2","email":"AGENT2@SOMETHING.com","userAvatarUrl":"XXXXXXX"},"createdDate":"2020-04-30T21:58:09Z","deletor":null,"deletedDate":null,"clientId":"000000000","value":"00000000"},{"id":"555","workflowType":"AAA","entityId":"123123","creator":{"id":"503080","displayName":"AGENT1","email":"AGENT1@SOMETHING.com","userAvatarUrl":"XXXXXXX"},"createdDate":"2020-04-30T21:55:09Z","deletor":null,"deletedDate":null,"clientId":"000000000","value":"00000000"},{"id":"444","workflowType":"xyz","entityId":"123123","creator":{"id":"503080","displayName":"AGENT1","email":"AGENT1@SOMETHING.com","userAvatarUrl":"XXXXXXX"},"createdDate":"2020-04-30T21:19:09Z","deletor":null,"deletedDate":null,"clientId":"000000000","value":"00000000"},{"id":"333","workflowType":"BBB","entityId":"123123","creator":{"id":"503079","displayName":"AGENT0","email":"AGENT0@SOMETHING.com","userAvatarUrl":"XXXXXXX"},"createdDate":"2020-04-30T21:10:09Z","deletor":null,"deletedDate":null,"clientId":"000000000","value":"00000000"},{"id":"222","workflowType":"ZZZ","entityId":"123123","creator":{"id":"503079","displayName":"AGENT0","email":"AGENT0@SOMETHING.com","userAvatarUrl":"XXXXXXX"},"createdDate":"2020-04-30T21:08:09Z","deletor":null,"deletedDate":null,"clientId":"000000000","value":"00000000"}]}') AS request
), workflow_rows AS (
SELECT
w.request:ID::varchar AS article_id,
lf.value:createdDate::timestamp_tz AS created_date,
lf.value:creator.id::integer AS creator_id,
lf.value:creator.email::varchar AS creator_email,
lf.value:workflowType::varchar AS workflow_type,
lf.value:value::varchar AS workflow_value
FROM workflows w, LATERAL FLATTEN(REQUEST:workflowHistory) lf
), article_workflow_creators AS (
SELECT DISTINCT
article_id,
creator_id,
creator_email
FROM workflow_rows
)
SELECT
awc.article_id,
awc.creator_id,
awc.creator_email,
(SELECT MAX(wr.created_date) FROM workflow_rows wr WHERE wr.article_id = awc.article_id AND wr.creator_id = awc.creator_id) AS last_date_value,
(SELECT MIN(wr.created_date) FROM workflow_rows wr WHERE wr.article_id = awc.article_id AND wr.creator_id = awc.creator_id) AS first_date_value,
(SELECT MAX(wr.created_date) FROM workflow_rows wr WHERE wr.article_id = awc.article_id AND wr.creator_id = awc.creator_id - 1) AS previous_date,
(SELECT MAX(wr.created_date) FROM workflow_rows wr WHERE wr.article_id = awc.article_id AND wr.creator_id = awc.creator_id + 1) AS next_date
FROM article_workflow_creators awc;
For the single JSON row input included in the question, this produces:
+------------+------------+----------------------+-------------------------------+-------------------------------+-------------------------------+-------------------------------+
| ARTICLE_ID | CREATOR_ID | CREATOR_EMAIL | LAST_DATE_VALUE | FIRST_DATE_VALUE | PREVIOUS_DATE | NEXT_DATE |
|------------+------------+----------------------+-------------------------------+-------------------------------+-------------------------------+-------------------------------|
| 123123 | 503081 | AGENT2@SOMETHING.com | 2020-04-30 21:58:09.000 +0000 | 2020-04-30 21:58:09.000 +0000 | 2020-04-30 21:55:09.000 +0000 | NULL |
| 123123 | 503080 | AGENT1@SOMETHING.com | 2020-04-30 21:55:09.000 +0000 | 2020-04-30 21:19:09.000 +0000 | 2020-04-30 21:10:09.000 +0000 | 2020-04-30 21:58:09.000 +0000 |
| 123123 | 503079 | AGENT0@SOMETHING.com | 2020-04-30 21:10:09.000 +0000 | 2020-04-30 21:08:09.000 +0000 | NULL | 2020-04-30 21:55:09.000 +0000 |
+------------+------------+----------------------+-------------------------------+-------------------------------+-------------------------------+-------------------------------+