I have the following Cypher query:
PROFILE
MATCH (childDStat:JobableStatistic {jobableId: childD.id})
WITH collect({`childDStat`:childDStat, `childD`:childD}) as childDStats
CALL apoc.cypher.mapParallel2(
" WITH _.childD as childD, _.childDStat as childDStat
WITH childD, childDStat
UNWIND childD.detailedCriterionIds as dCId
WITH childD, childDStat, dCId + coalesce(childDStat['replaceableCriterionIds.' + dCId],[]) as cGroup
WITH childD, childDStat, cGroup
WHERE NOT AlL(x IN cGroup WHERE x IN $zeroCriterionIds )
WITH childD, childDStat, collect(cGroup) as cGroups
WHERE size(cGroups) >= size(childD.detailedCriterionIds)
UNWIND cGroups as cGroup
WITH childD, childDStat, cGroup
WHERE ANY(x IN cGroup WHERE x IN $detailedCriterionIds)
WITH childD, childDStat, collect(cGroup) as cGroups
WHERE size(cGroups) > 1
RETURN childD, childDStat, cGroups ",
{`detailedCriterionIds`: [3, 5, 7, 8, 12, 13, 14, 15, 16, 18, 20, 21, 23, 26, 28, 29, 30, 31, 33, 35, 36, 40, 42, 44, 45, 46, 47, 51, 54],
`zeroCriterionIds`: []},
childDStats, 6, 10)
YIELD value
WITH value.childD as childD, value.childDStat as childDStat, value.cGroups as cGroups WITH collect({`childDStat`:childDStat, `childD`:childD, `cGroups`:cGroups}) as childDStats
CALL apoc.cypher.mapParallel2(
" WITH _.childD as childD, _.childDStat as childDStat, _.cGroups as cGroups WITH childD, childDStat, size(cGroups) as cGroupsSize, cGroups
UNWIND cGroups as cGroup
WITH childD, childDStat, cGroupsSize, cGroup
UNWIND cGroup as cId WITH childD, childDStat, cGroupsSize, cGroup, cId, cGroup[0] as cG0
WITH childD, childDStat, cGroupsSize, cGroup, cId, cG0, childDStat['criterionAvgVoteWeights.' + cG0] as childDStatCriterionAvgVoteWeight,
childDStat['criterionExperienceMonths.' + cG0] as childDStatCriterionExperienceMonth,
$criterionAvgVoteWeights[toString(cId)] as criterionAvgVoteWeight, $criterionExperienceMonths[toString(cId)] as criterionExperienceMonth
WHERE
(childDStatCriterionAvgVoteWeight = 0 OR childDStatCriterionAvgVoteWeight <= criterionAvgVoteWeight OR criterionAvgVoteWeight IS NULL) AND
(childDStatCriterionExperienceMonth = 0 OR childDStatCriterionExperienceMonth <= criterionExperienceMonth OR criterionExperienceMonth IS NULL)
WITH childD, childDStat, cGroupsSize, cG0, collect(cId) as cIds WITH childD, childDStat, cGroupsSize, collect(DISTINCT cG0 + cIds) as cGroups
WHERE size(cGroups) >= cGroupsSize
RETURN childD, childDStat, cGroups ",
{`detailedCriterionIds`: [3, 5, 7, 8, 12, 13, 14, 15, 16, 18, 20, 21, 23, 26, 28, 29, 30, 31, 33, 35, 36, 40, 42, 44, 45, 46, 47, 51, 54],
`zeroCriterionIds`: [],
`criterionAvgVoteWeights`: {`51`:5.0, `8`:0.0, `33`:5.0, `21`:0.0, `31`:0.0, `26`:4.0, `14`:5.0, `36`:3.0, `46`:3.0, `12`:3.0, `18`:5.0, `28`:5.0, `16`:2.0, `7`:5.0, `40`:1.0, `5`:5.0, `44`:4.0, `3`:1.0, `54`:4.0, `20`:4.0, `42`:4.0, `30`:3.0, `15`:4.0, `47`:1.0, `35`:1.0, `13`:3.0, `45`:3.0, `23`:4.0, `29`:1.0},
`criterionExperienceMonths`: {`8`:109, `33`:8, `21`:184, `31`:14, `26`:100, `14`:157, `36`:140, `46`:123, `12`:85, `18`:96, `28`:116, `16`:15, `7`:63, `40`:56, `5`:166, `44`:101, `3`:129, `42`:84, `20`:102, `30`:173, `15`:97, `47`:54, `13`:91, `35`:137, `45`:119, `23`:162, `29`:97}
},
childDStats, 6, 10)
YIELD value
RETURN value.childD.id
The following condition takes most of the query execution time:
WHERE
(childDStatCriterionAvgVoteWeight = 0 OR childDStatCriterionAvgVoteWeight <= criterionAvgVoteWeight OR criterionAvgVoteWeight IS NULL) AND
(childDStatCriterionExperienceMonth = 0 OR childDStatCriterionExperienceMonth <= criterionExperienceMonth OR criterionExperienceMonth IS NULL)
With this condition in place the query works ~1000ms
, without it 5ms
.
Is there something I may do with this condition in order to improve the query performance?
P.S
This is just a test query. The real query uses parameters for each single value.
In (childDStatCriterionAvgVoteWeight = 0 OR childDStatCriterionAvgVoteWeight <= criterionAvgVoteWeight OR criterionAvgVoteWeight IS NULL)
:
criterionAvgVoteWeight IS NULL
first. That way, if it is NULL
, then you would not waste time testing with a NULL
value in the childDStatCriterionAvgVoteWeight <= criterionAvgVoteWeight
expression (which would be considered false
), and you would also not need to test childDStatCriterionAvgVoteWeight = 0
. This could be a big win if the value is frequently NULL
.criterionAvgVoteWeight
is always >= 0 (when it is not NULL
), then the childDStatCriterionAvgVoteWeight = 0
test can be eliminated completely.In short, you could consider using this expression instead: (criterionAvgVoteWeight IS NULL OR childDStatCriterionAvgVoteWeight <= criterionAvgVoteWeight)
.
Similar considerations apply to (childDStatCriterionExperienceMonth = 0 OR childDStatCriterionExperienceMonth <= criterionExperienceMonth OR criterionExperienceMonth IS NULL)
.
These changes may save some time, depending on your actual data characteristics, but filtering is not free.