Search code examples
performancejoincountcouchbasesql++

N1ql count distinct associated documents


I have a single bucket (Couchbase Community edition 6.5) consisting of the following documents:

employee {
    type: "Employee"
}
X {
    type: "X",
    employeeId: string,
    date: string
}
Y {
    type: "Y",
    employeeId: string,
    date: string
}
Z {
    type: "Z",
    employeeId: string,
    date: string
}

I need to get the total number of documents (X,Y,Z) that are associated with each employee between two dates.

I have written the following query that works but with slow execution time:

CREATE INDEX `index_X` ON `bucket`(`type`,`date`, `employeeId`)
WHERE type = "X"
CREATE INDEX `index_Y` ON `bucket`(`type`,`date`, `employeeId`)
WHERE type = "Y"
CREATE INDEX `index_Z` ON `bucket`(`type`,`date`, `employeeId`)
WHERE type = "Z"

SELECT META(employee).id,
       x.totalX,
       y.totalY,
       z.totalZ,
FROM `bucket` employee 
    LEFT JOIN (
    SELECT obj.employeeId,
           COUNT(obj.employeeId) AS totalX
    FROM `bucket` obj
    WHERE obj.type = "X"
        AND obj.date BETWEEN "startDate" AND "endDate"
    GROUP BY obj.employeeId) x ON x.employeeId = META(employee).id 
    LEFT JOIN (
    SELECT obj.employeeId,
           COUNT(obj.employeeId) AS totalY
    FROM `bucket` obj
    WHERE obj.type = "Y"
        AND obj.date BETWEEN "startDate" AND "endDate"
    GROUP BY obj.employeeId) y ON y.employeeId = META(employee).id 
    LEFT JOIN (
    SELECT obj.employeeId,
           COUNT(obj.employeeId) AS totalZ
    FROM `bucket` obj
    WHERE obj.type = "Z"
        AND obj.date BETWEEN "startDate" AND "endDate"
    GROUP BY obj.employeeId) z ON z.employeeId = META(employee).id 
WHERE employee.type = "Employee"

I also tried the following but this query times out completely!

CREATE INDEX `index_X` ON `bucket`(`type`,`date`, `employeeId`)
WHERE type = "X"
CREATE INDEX `index_Y` ON `bucket`(`type`,`date`, `employeeId`)
WHERE type = "Y"
CREATE INDEX `index_Z` ON `bucket`(`type`,`date`, `employeeId`)
WHERE type = "Z"

SELECT META(employee).id,
       COUNT(x.employeeId),
       COUNT(y.employeeId),
       COUNT(z.employeeId)
FROM `bucket` employee
    LEFT JOIN `bucket` x ON x.employeeId = META(employee).id
    AND x.type = "X"
    AND x.date BETWEEN "startDate" AND "endDate"
    LEFT JOIN `bucket` y ON y.employeeId = META(employee).id
    AND y.type = "Y"
    AND y.date BETWEEN "startDate" AND "endDate"
    LEFT JOIN `bucket` z ON z.employeeId = META(employee).id
    AND z.type = "Z"
    AND z.date BETWEEN "startDate" AND "endDate"
WHERE employee.type = "Employee"
GROUP BY META(employee).id

Can anyone please advise on a more optimal route to follow?

With both queries I can see that my indexes are being used but I can also see in my query plan that for every join a "NestedLoopJoin" is being chained to the previous one. Could this possibly be the problem?

I am still new to writing n1ql queries and trying to figure out the most efficient methods so any advice will be welcome.


Solution

  • CREATE INDEX `index_1` ON `agrigistics_dev`(`employeeId`, type, `date`) WHERE type IN ["X", "Y", "Z"];
    
    SELECT META(employee).id,
           SUM(CASE x.type = "X" THEN 1 ELSE 0 END) xcount,
           SUM(CASE x.type = "Y" THEN 1 ELSE 0 END) ycount,
           SUM(CASE x.type = "Z" THEN 1 ELSE 0 END) zcount
    FROM `bucket` employee
    LEFT JOIN `bucket` x
               ON x.employeeId = META(employee).id AND x.type IN ["X", "Y", "Z"] AND x.date BETWEEN "startDate" AND "endDate"
    WHERE employee.type = "Employee"
    GROUP BY META(employee).id;
    

    To avoid case-cade nested-loop join or explosion of join use CTE(6.5) in this case

    CREATE INDEX `index_1` ON `agrigistics_dev`(type, date, `employeeId`) WHERE type IN ["X", "Y", "Z"];
    
    WITH etype AS (SELECT x.employeeId,
                   SUM(CASE x.type = "X" THEN 1 ELSE 0 END) xcount,
                   SUM(CASE x.type = "Y" THEN 1 ELSE 0 END) ycount,
                   SUM(CASE x.type = "Z" THEN 1 ELSE 0 END) zcount
                   FROM `bucket` x
                   WHERE x.type IN ["X", "Y", "Z"] AND x.date BETWEEN "startDate" AND "endDate"
                   GROUP BY x.employeeId)
    SELECT META(employee).id,
           SUM(y.xcount) xcount,
           SUM(y.ycount) ycount,
           SUM(y.zcount) zcount
    FROM `bucket` AS employee
    LEFT JOIN etype AS y ON y.employeeId = META(employee).id
    WHERE employee.type = "Employee"
    GROUP BY META(employee).id;
    

    https://index-advisor.couchbase.com/indexadvisor/#1

    https://blog.couchbase.com/create-right-index-get-right-performance/