
Fixing a Cypher query deprecation warning gives inconsistent results


When I run this query:

MATCH (n:test) with n limit 100
WITH DISTINCT n, keys(n) AS allKeys
UNWIND allKeys AS key
with n, 
    CASE
        WHEN key STARTS WITH 'prop.title' THEN {column: 'title', value: collect(n[key])}
        WHEN key STARTS WITH 'prop.keywords' THEN {column: 'keywords', value: collect(n[key])}
    END AS data
with n, data, collect(data.value) as values
RETURN n.id, apoc.map.fromPairs(COLLECT([data.column, values]))

The purpose of this query is to find all the properties with a given prefix and group their values into a map using APOC. An example result looks like this: [image: correct result]

My problem started when Cypher gave me this warning: "This feature is deprecated and will be removed in future versions." with status code Neo.ClientNotification.Statement.FeatureDeprecationWarning. [image: the warning I get when running the query above]

I then tried to fix the query by adding key to the WITH clause, like so: WITH n, key, CASE ... But now the value in my map (collect(n[key])) no longer contains all of the values, only the last one. I guess the others have been overwritten. [image: wrong result]

Does anyone know how to keep the original result while still getting rid of the warning?

EDIT: I found out that changing n to * in the first query (without adding key) gives me the same wrong result. I also added pictures.


Solution

  • I believe you could rewrite your query to something like this:

    MATCH (n:test) with n limit 100 
    WITH DISTINCT n, keys(n) AS allKeys
    UNWIND allKeys AS key
    WITH n, 
        // Note: in the original query you used the aggregation expression `collect` which made `key` an implicit grouping key.
        CASE
          WHEN key STARTS WITH 'prop.title' THEN {column: 'title', value:n[key]}
          WHEN key STARTS WITH 'prop.keywords' THEN {column: 'keywords', value:n[key]}
        END AS data
    WITH n, data.column AS column, collect(data.value) as values
    RETURN n.id, apoc.map.fromPairs(COLLECT([column, values]))
    

    The problem with your original query was that you used variables that are not grouping keys outside of an aggregation expression, so-called "implicit grouping keys". Prior to Neo4j 5.0 we allowed implicit grouping keys (which is why you only get a deprecation warning), but because they can be very confusing, they were removed in 5.0. You can read more about the confusion around implicit grouping keys here: https://opencypher.org/articles/2017/07/27/ocig1-aggregations-article/
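
    To make the effect of grouping keys concrete, here is a rough plain-Python model of how an aggregating WITH splits rows into groups. It is only an illustration under assumptions, not Neo4j's implementation: the node identifier "n1" and the value field standing in for n[key] are made up, and the data mirrors the single-node example used below.

```python
from collections import defaultdict

# Rows as they would look after UNWIND for one node. "n1" is a made-up node
# identifier and "value" stands in for n[key] (both are assumptions).
rows = [
    {"n": "n1", "key": "prop.title1", "value": "title1"},
    {"n": "n1", "key": "prop.title2", "value": "title2"},
    {"n": "n1", "key": "prop.keywords", "value": "hey!"},
]


def group_collect(rows, grouping_keys, collected):
    """Model WITH <grouping_keys>, collect(<collected>): one output row
    per distinct combination of grouping-key values."""
    groups = defaultdict(list)
    for row in rows:
        group = tuple(row[k] for k in grouping_keys)
        groups[group].append(row[collected])
    return dict(groups)


# Grouping on n alone: all three values land in one list.
by_n = group_collect(rows, ["n"], "value")
# Grouping on n and key: every key is distinct, so each list has one element.
by_n_and_key = group_collect(rows, ["n", "key"], "value")

print(by_n)
print(by_n_and_key)
```

    Grouping on n alone puts all three values into one list, while grouping on n and key produces three single-element lists, which is why collect(n[key]) can only ever see one value once key becomes a grouping key.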

    So, let's take this step-by-step:

    Assume that you have a single node:

    CREATE (:test{`prop.title1`:"title1", `prop.title2`:"title2", `prop.keywords`:"hey!"})
    

    After this part of the query:

    MATCH (n:test) with n limit 100
    WITH DISTINCT n, keys(n) AS allKeys
    UNWIND allKeys AS key
    

    You have:

    n | key             | allKeys
    n | "prop.title1"   | ["prop.title1", "prop.title2", "prop.keywords"]
    n | "prop.title2"   | ["prop.title1", "prop.title2", "prop.keywords"]
    n | "prop.keywords" | ["prop.title1", "prop.title2", "prop.keywords"]
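
    The rows above can be reproduced with a small plain-Python sketch that treats the node as a dict of properties (an assumption for illustration; "n1" is a made-up identifier and this is not how Neo4j evaluates the query):

```python
# The single example node as a plain dict; "n1" is a made-up identifier.
# Python dicts keep insertion order; the order of keys(n) in Neo4j may differ,
# so the row order in a real result is not guaranteed to match.
node_id = "n1"
props = {"prop.title1": "title1", "prop.title2": "title2", "prop.keywords": "hey!"}


def unwind_keys(node_id, props):
    """Model WITH DISTINCT n, keys(n) AS allKeys UNWIND allKeys AS key:
    one row per property key, each row still carrying n and allKeys."""
    all_keys = list(props)
    return [{"n": node_id, "key": key, "allKeys": all_keys} for key in all_keys]


rows = unwind_keys(node_id, props)
for row in rows:
    print(row["n"], repr(row["key"]), row["allKeys"])
```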

    Let's have a look at the next part of the original query:

    with n, 
    CASE
        WHEN key STARTS WITH 'prop.title' THEN {column: 'title', value: collect(n[key])}
        WHEN key STARTS WITH 'prop.keywords' THEN {column: 'keywords', value: collect(n[key])}
    END AS data
    

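    The explicit grouping keys are the projected variables/properties that do not contain any aggregation, which in this case is just n. That means every other variable used outside of an aggregation expression is an "implicit" grouping key; in your case the implicit grouping key is key. Implicit grouping keys are no longer supported from 5.0. Simply adding key as an explicit grouping key does not help either: every row has a different key, so each (n, key) group contains exactly one row, collect(n[key]) only ever returns one value, and apoc.map.fromPairs later receives several pairs for the same column and keeps only the last one. Instead, you can remove the aggregation expression collect: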

    WITH n, 
         CASE
           WHEN key STARTS WITH 'prop.title' THEN {column: 'title', value:n[key]}
           WHEN key STARTS WITH 'prop.keywords' THEN {column: 'keywords', value:n[key]}
         END AS data
    

    This means that we now have:

    n | data
    n | {column: "title", value: "title1"}
    n | {column: "title", value: "title2"}
    n | {column: "keywords", value: "hey!"}
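
    The same step as a plain-Python sketch, assuming the node is a dict of properties and using None for Cypher null (the function name to_data and the identifier "n1" are hypothetical):

```python
from typing import Optional

# Same hypothetical node as before ("n1" is a made-up identifier).
node_id = "n1"
props = {"prop.title1": "title1", "prop.title2": "title2", "prop.keywords": "hey!"}


def to_data(props: dict, key: str) -> Optional[dict]:
    """Model the CASE expression without collect: each row becomes one
    {column, value} map, or None (Cypher null) if no WHEN branch matches."""
    if key.startswith("prop.title"):
        return {"column": "title", "value": props[key]}
    if key.startswith("prop.keywords"):
        return {"column": "keywords", "value": props[key]}
    return None


# One (n, data) row per unwound key; no aggregation, so the row count is unchanged.
data_rows = [(node_id, to_data(props, key)) for key in props]
for n, data in data_rows:
    print(n, data)
```

    A property key matching neither prefix (for example an id property) yields a null data value for that row rather than removing the row.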

    If you now look at the next part of the original query:

    with n, data, collect(data.value) as values
    

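    Here you again have the aggregation expression collect, this time with the grouping keys n and data. But you don't want to group on the full data object: since data includes the value itself, rows would only be grouped together when both column and value match. Instead, you want to collect all the data.value entries grouped by n and data.column: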

    WITH n, data.column AS column, collect(data.value) as values
    

    Which gives us:

    n | column     | values
    n | "title"    | ["title1", "title2"]
    n | "keywords" | ["hey!"]
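
    A last plain-Python sketch models this grouping and the final map construction. from_pairs is a hypothetical stand-in for apoc.map.fromPairs that assumes a later pair with the same key overwrites an earlier one; the second call is a simplified version of the "wrong result" path from the question:

```python
from collections import defaultdict

# (n, data) rows as produced by the CASE step; "n1" is a made-up identifier.
data_rows = [
    ("n1", {"column": "title", "value": "title1"}),
    ("n1", {"column": "title", "value": "title2"}),
    ("n1", {"column": "keywords", "value": "hey!"}),
]


def collect_by_column(rows):
    """Model WITH n, data.column AS column, collect(data.value) AS values."""
    groups = defaultdict(list)
    for n, data in rows:
        groups[(n, data["column"])].append(data["value"])
    return dict(groups)


def from_pairs(pairs):
    """Rough model of apoc.map.fromPairs, assumed here to let a later pair
    overwrite an earlier pair with the same key."""
    result = {}
    for key, value in pairs:
        result[key] = value
    return result


grouped = collect_by_column(data_rows)
fixed = from_pairs((column, values) for (n, column), values in grouped.items())
print(fixed)

# Simplified "wrong result" path: one single-value pair per property key, so
# "title" appears twice and only the last pair survives.
wrong = from_pairs([("title", ["title1"]), ("title", ["title2"]), ("keywords", ["hey!"])])
print(wrong)
```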

    I hope this makes it a bit clearer.