Search code examples
azure-databricksdatabricks-sql

Databricks SQL with Sub Queries and Group By: Error Add to group by or wrap in first() (or first_value)


I am coding Databricks SQL with the following code:

SELECT
activityid,
concat_ws(ts_inititals, ', ') AS attendees
FROM 
    (SELECT 
     activityparty.activityid,
     systemuser.systemuserid,
     systemuser.ts_inititals 
     FROM baseorigination.activityparty
     LEFT JOIN baseorigination.systemuser
     ON activityparty.partyid = systemuser.systemuserid
     GROUP BY activityparty.activityid, systemuser.systemuserid, systemuser.ts_inititals ) attendees
GROUP BY activityid

As you can see the code uses a subquery, which is fine when executing the code on MS SQL, however, I'm getting the following error on Databricks

Error in SQL statement: AnalysisException: expression 'attendees.ts_inititals' is neither present in the group by, nor is it an aggregate function. Add to group by or wrap in first() (or first_value) if you don't care which value you get.;

Can someone let me know how to re-write the code to overcome this issue?


Solution

  • You may use ARRAY_SORT() and COLLECT_SET():

    SELECT
        activityid,
        array_join(collect_set(ts_inititals), ', ') AS attendees
    FROM
    (
        SELECT DISTINCT
            a.activityid,
            s.systemuserid,
            s.ts_inititals 
         FROM baseorigination.activityparty a
         LEFT JOIN baseorigination.systemuser s
             ON a.partyid = s.systemuserid
    ) attendees
    GROUP BY activityid;
    

    Note that I have replaced the GROUP BY in the inner query with a distinct select.