I have a single bucket (Couchbase Community edition 6.5) consisting of the following documents:
employees {
employeeGroupId: string,
type: "Employee"
}
clocks {
employeeId: string,
areaId: string
date: string,
type: "Clock"
}
Each employee has multiple corresponding clock items for each day. I need to get the following:
I have written the following query which gets the first and last clock items with execution time <100 ms:
SELECT META(employee).id AS employeeId,
employee.employeeGroupId,
MIN(clock.date) AS clockIn,
MAX(clock.date) AS clockOut
FROM `bucket` employee LEFT
JOIN `bucket` clock ON clock.employeeId = META(employee).id
AND type = "Clock"
AND clock.date BETWEEN "2020-06-01T00:00:00.000Z" AND "2020-06-02T00:00:00.000Z"
WHERE employee.type = "Employee"
GROUP BY employee;
The problem is I need to get the corresponding areaId with the matching clock.
I have written the following query that does. I create two separate sub queries that sort all of the clock items for the day first ascending and then descending and select the first item.
CREATE INDEX adv_employeeId_type_date_blockId ON `bucket`(`employeeId`,`type`,`date`,`blockId`)
CREATE INDEX adv_employeeId_type_date ON `bucket`(`employeeId`,`type`,`date`)
CREATE INDEX adv_type_employeeId_date ON `bucket`(`type`,`employeeId`,`date`)
SELECT META(employee).id AS employeeId,
employee.employeeGroupId,
clockIn,
clockOut
FROM `bucket` employee
LEFT JOIN (
SELECT obj.employeeId,
obj.date,
obj.areaId
FROM `bucket` obj
WHERE obj.employeeId = META(employee).id
AND obj.type = "Clock"
AND obj.date BETWEEN "2020-06-01T00:00:00.000Z" AND "2020-06-02T00:00:00.000Z"
ORDER BY obj.date
LIMIT 1) clockIn ON clockIn.employeeId = META(employee).id
LEFT JOIN (
SELECT obj.employeeId,
obj.date,
obj.areaId
FROM `bucket` obj
WHERE obj.employeeId = META(employee).id
AND obj.type = "Clock"
AND obj.date BETWEEN "2020-06-01T00:00:00.000Z" AND "2020-06-02T00:00:00.000Z"
ORDER BY obj.date DESC
LIMIT 1) clockOut ON clockOut.employeeId = META(employee).id
WHERE employee.type = "Employee"
GROUP BY employee,
clockIn,
clockOut;
The problem is that the above query is inefficient with execution time >10 seconds.
In other words I need to get additional object values from the aggregate MIN() and MAX() functions.
I am sure the second query is not the most efficient method to achieve this, does anyone have any other suggestions?
CREATE INDEX ix1 ON `bucket`(type, `employeeGroupId`) WHERE type = "Employee";
CREATE INDEX ix2 ON `bucket`(`employeeId`, date, areaId) WHERE type = "Clock";
SELECT META(employee).id AS employeeId,
employee.employeeGroupId,
minclock[0] AS clockIn,
minclock[1] AS clockInAreaId,
maxclock[0] AS clockOut,
maxclock[1] AS clockOutAreaId
FROM `bucket` AS employee LEFT
JOIN `bucket` AS clock ON clock.employeeId = META(employee).id
AND type = "Clock"
AND clock.date BETWEEN "2020-06-01T00:00:00.000Z" AND "2020-06-02T00:00:00.000Z"
WHERE employee.type = "Employee"
GROUP BY employee
LETTING minclock = MIN([clock.date,clock.areaId]),
maxclock = MAX([clock.date,clock.areaId]);
OR
SELECT META(employee).id AS employeeId,
employee.employeeGroupId,
MIN([clock.date, {clock.date, clock.areaId}])[1] AS clockIn,
MAX([clock.date, {clock.date, clock.areaId}])[1] AS clockOut,
FROM `bucket` AS employee LEFT
JOIN `bucket` AS clock ON clock.employeeId = META(employee).id
AND type = "Clock"
AND clock.date BETWEEN "2020-06-01T00:00:00.000Z" AND "2020-06-02T00:00:00.000Z"
WHERE employee.type = "Employee"
GROUP BY employee;
Use MIN/MAX on Array. 0th expression is MIN/MAX expression. reset of array positions used only on ties (similar like ORDER BY multiple fields). Result will complete ARRAY expression. Choose which positions you want project. This technique allows you project non-group by expressions.