Search code examples
mysqlperformancejoindistinctgroup-concat

GROUP_CONCAT Performance issue with distinct and sub_query


I am trying to execute this query in MYSQL DB.

SELECT FirstName,LastName
    CONCAT(COALESCE(Address1, ''),
    COALESCE(Address2, '')) AS Address2,
    (SELECT GROUP_CONCAT(EmpDept ' ') FROM EMP_DEPT
     USE INDEX (EMP_DEPT_IDX)
     WHERE EMP_DEPT.EMP_ID = EMP.EMP_ID) AS DEPT,
    COUNT(PIN) AS PIN,
    MIN(SALARY) AS SALARY,
    GROUP_CONCAT(DISTINCT PAY_CODE SEPARATOR ' ') AS PAYROLL_CODE
FROM EMP WHERE HOUSE='YELLOW' AND HOUSE_AREA='220'
    AND (JOIN_YEAR = '2011' OR JOIN_YEAR = '2012')
    AND EMP_BAND='G'
    AND (FRESHER = 'N' OR FRESHER = 'FALSE')   
    AND PIN IS NOT NULL
    AND SALARY > 0
GROUP BY FirstName,LastName Address3, Address2, DEPT

The query is working fine and I am getting the required resultset.

Only the problem is, there is huge performance hit on these two lines of code , written using GROUP_CONCAT,

 (SELECT GROUP_CONCAT(EmpDept ' ') FROM EMP_DEPT
    USE INDEX (EMP_DEPT_IDX)
    WHERE EMP_DEPT.EMP_ID = EMP.EMP_ID) AS DEPT


 GROUP_CONCAT(DISTINCT PAY_CODE SEPARATOR ' ') AS PAYROLL_CODE

I tried adding INDEX, but still no luck.(only these sub queries are taking around 30-40sec to execute)

I tried implementing Joins, but as i am new to mysql there is no luck..

Can anyone help me in making my query execute faster?


Solution

  • Try replacing the correlated subquery with an outer join with a subquery:

    SELECT FirstName,LastName
        CONCAT(COALESCE(Address1, ''),
        COALESCE(Address2, '')) AS Address2,
        COALESCE(DEPT, '') AS DEPT,
        COUNT(PIN) AS PIN,
        MIN(SALARY) AS SALARY,
        GROUP_CONCAT(DISTINCT PAY_CODE SEPARATOR ' ') AS PAYROLL_CODE
    FROM EMP 
    LEFT JOIN (SELECT EMP_ID, GROUP_CONCAT(EmpDept SEPARATOR ' ') AS DEPT
               FROM EMP_DEPT
               GROUP BY EMP_ID) AS EMP_DEPT ON EMP_DEPT.EMP_ID = EMP.EMP_ID
    WHERE HOUSE='YELLOW' AND HOUSE_AREA='220'
        AND (JOIN_YEAR = '2011' OR JOIN_YEAR = '2012')
        AND EMP_BAND='G'
        AND (FRESHER = 'N' OR FRESHER = 'FALSE')   
        AND PIN IS NOT NULL
        AND SALARY > 0
    GROUP BY EMP.EMP_ID
    

    Also, I changed the final GROUP BY to use the EMP_ID column, not the returned column values.

    It's also possible that just changing the final GROUP BY will solve your problem. Grouping by an indexed column should be more efficient, so make sure you have an index on EMP_ID.