Search code examples
jsonneo4jcypherneo4j-apoc

Shortest path for each node in collection/json Neo4j apoc


Short task description: I have a json document, composed of a set of origin/source nodes, for each of them I need to find 1st, 2nd and 3rd shortest path to the set of target nodes. The input json is as follows:

{
    "origin":[
        {"label":"Alcohol drinks",
        "tag":[],
        "type":"string",
        "xpath":[]
        },

        {"label":"Wine",
        "tag":["red","white"],
        "type":"string",
        "xpath":["Alcohol drinks"]
        },

        {"label":"Port wine",
        "tag":["Portugal","sweet","strong"],
        "type":"string",
        "xpath":["Alcohol drinks","Wine"]
        },

        {"label":"Sandeman Cask 33",
        "tag":["red","expensive"],
        "type":"string",
        "xpath":["Alcohol drinks","Wine","Port wine"]
        }
    ],

    "target":[
        {"label":"Drinks",
        "tag":[],
        "type":"string",
        "xpath":[]
        },

        {"label":"Tea",
        "tag":["black", "green"],
        "type":"string",
        "xpath":["Drinks"]
        },

        {"label":"Carbonated water",
        "tag":[],
        "type":"string",
        "xpath":["Drinks","Tea"]
        },

        {
        "label":"Pepsi",
        "tag":["sweet","cheap"],
        "type":"string",
        "xpath":["Drinks","Tea","Carbonated water"]
        }
    ]
}

Nodes are already inserted into the DB and corresponding relationships are built. Nodes are connected, thus it is possible to build at least one path from origin to target.

To find the shortest path I am using the following Cypher query:

    CALL apoc.load.json("file:///D:/project/neo_proj/input.json") YIELD value 
UNWIND value.origin AS orig UNWIND value.target AS tar 
MATCH(origin:concept{name:orig.label}) 
MATCH(target:concept{name:tar.label}), 
path = shortestPath((origin)-[*1..3]-(target)) RETURN path ORDER BY length(path) ASC LIMIT 4

This query maps all the origin nodes to all target nodes. But I need something like:

   CALL apoc.load.json("file:///D:/project/neo_proj/input.json") YIELD value
UNWIND value.origin AS orig UNWIND value.target AS tar 
MATCH(origin:concept{name:orig.label}) MATCH(target:concept{name:tar.label}) 
FOREACH (x IN orig.label 
| MERGE(origin:concept{name:orig.label}) 
MERGE(target:concept{name:tar.label}) 
path = shortestPath((origin)-[*1..3]-(target))) RETURN path ORDER BY length(path) ASC LIMIT 3

This query doesn't work, but idea is to be able to use the foreach loop, to take first origin label and try to find the 1st,2nd,3rd shortest path to one of the targets, then 2nd origin, 3rd origin, etc. I would appreciate, if you could point me, how can I use the foreach loop in connection to shortpath finding in the proper way. Thank you in advance!


Solution

  • You can't use FOREACH in this way.

    Probably your best bet will be to use APOC path expander procs, these can use a bfs expansion for shortest paths, and can be limited in the number of results per call (and a call will be executed for each row pairing of origin and target), so with the right procedure (using spanningTree() here to ensure we visit each node exactly once and we return paths at the end) you'll get up to the 3 shortest paths per origin/target combination.

    It will also help to handle origins and targets separately, as your current approach creates a cartesian product between them, which seems to be what you want for finding paths, but it leads to wasted redundant match operations when finding the origin and target nodes.

    CALL apoc.load.json("file:///D:/project/neo_proj/input.json") YIELD value 
    UNWIND value.origin AS orig 
    MATCH(origin:concept{name:orig.label}) 
    WITH value, collect(origin) as origins
    UNWIND value.target AS tar 
    MATCH(target:concept{name:tar.label})
    UNWIND origins as origin
    WITH origin, tar
    CALL apoc.path.spanningTree(origin, {terminatorNodes:[tar], maxLevel:3, limit:3}) YIELD path
    RETURN origin, tar, length(path) as pathLength, path