I am coding Databricks SQL with the following code:
SELECT
activityid,
concat_ws(ts_inititals, ', ') AS attendees
FROM
(SELECT
activityparty.activityid,
systemuser.systemuserid,
systemuser.ts_inititals
FROM baseorigination.activityparty
LEFT JOIN baseorigination.systemuser
ON activityparty.partyid = systemuser.systemuserid
GROUP BY activityparty.activityid, systemuser.systemuserid, systemuser.ts_inititals ) attendees
GROUP BY activityid
As you can see the code uses a subquery, which is fine when executing the code on MS SQL, however, I'm getting the following error on Databricks
Error in SQL statement: AnalysisException: expression 'attendees.ts_inititals' is neither present in the group by, nor is it an aggregate function. Add to group by or wrap in first() (or first_value) if you don't care which value you get.;
Can someone let me know how to re-write the code to overcome this issue?
You may use ARRAY_SORT()
and COLLECT_SET()
:
SELECT
activityid,
array_join(collect_set(ts_inititals), ', ') AS attendees
FROM
(
SELECT DISTINCT
a.activityid,
s.systemuserid,
s.ts_inititals
FROM baseorigination.activityparty a
LEFT JOIN baseorigination.systemuser s
ON a.partyid = s.systemuserid
) attendees
GROUP BY activityid;
Note that I have replaced the GROUP BY
in the inner query with a distinct select.