Before going further, here is a representation of my data model. I am stuck for the moment with Neo4j 1.9.2 and have a fairly large database (about 1 million nodes as far as I can tell, maybe fewer, but it will grow as more data is ingested). With that in mind, let me explain what I mean by faceted search.
My items (documentaryUnit) are sometimes linked to keywords (which can be of different types). What I want to implement is a way to select a few keywords and find any node that is connected to keyword1, keyword2, etc. I don't need what faceted search usually provides, i.e. showing the number of results for each remaining option and disabling options that would give 0 results. I just want to be able to run this "simple" query. Keep in mind that I am quite new to Neo4j; I tried to find an answer beforehand, but since I am missing some of the concepts I may have overlooked the right post.
Here is the query I tried:
START
facet1 = node:entities("__ID__:keyword-104"),
facet2 = node:entities("__ID__:place-1"),
facet3 = node:entities("__ID__:keyword-2"),
facet4 = node:entities("__ID__:keyword-258")
MATCH
(elem)<-[:hasLinkTarget]-(link)-[:hasLinkTarget]->(facet1),
(elem)<-[:hasLinkTarget]-(link)-[:hasLinkTarget]->(facet2),
(elem)<-[:hasLinkTarget]-(link)-[:hasLinkTarget]->(facet3),
(elem)<-[:hasLinkTarget]-(link)-[:hasLinkTarget]->(facet4)
WITH distinct elem, facet1, facet2, facet3, facet4, link
RETURN elem
With or without DISTINCT, it takes ages and sometimes basically crashes. With only two keywords it works well (< 100 ms); with three it is slow, and with four it more or less crashes. I need a way to do this without any external service (Solr is not an option here, for upgrade reasons).
Given the picture I attached, what I want is to find a documentaryUnit like #1, which is attached to keywords 1, 4, 5 and 3 through a link. I also tried using a collection, like this:
START doc = node:entities("__ISA__:documentaryUnit")
MATCH (doc)<-[:hasLinkTarget]-(link)-[:hasLinkTarget]->(accessPoints)
WITH collect(accessPoints.__ID__) AS accessPointsId, doc
WHERE ALL (x IN ['keyword-104', 'place-1', 'keyword-2']
WHERE x IN accessPointsId)
RETURN doc.__ID__
This does not crash, but it uses every documentaryUnit node as a starting point, which is a lot of nodes. It takes between 1000 ms and 2000 ms.
Thank you for reading this; I will reply as soon as possible when you post something.
I found two solutions. The better one (around 500 ms on the first run, while the cache warms up, and 270 ms afterwards):
START
accessPoints = node:entities("__ID__:kw-1 OR __ID__:kw-2 OR __ID__:kw-3 OR __ID__:kw-4")
MATCH
(doc)<-[:hasLinkTarget]-(link)-[:hasLinkTarget]->accessPoints
WHERE doc.__ISA__ = "documentaryUnit"
WITH collect(accessPoints.__ID__) AS accessPointsId, doc
WHERE ALL (x IN ['kw-1', 'kw-2', 'kw-3', 'kw-4']
WHERE x IN accessPointsId)
RETURN doc
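To make the logic of this query easier to follow, here is a minimal sketch in Python of what the collect / ALL combination computes. It is not Neo4j code: the function name docs_matching_all and the sample (doc, keyword) pairs are made up for illustration, and the pairs stand in for the rows produced by the MATCH clause.

```python
from collections import defaultdict


def docs_matching_all(edges, required):
    """Return the ids of documents linked to every id in `required`.

    `edges` is an iterable of (doc_id, access_point_id) pairs, standing in
    for the (doc, accessPoints) rows that the MATCH clause would produce.
    """
    # Equivalent of: WITH collect(accessPoints.__ID__) AS accessPointsId, doc
    collected = defaultdict(set)
    for doc_id, access_point_id in edges:
        collected[doc_id].add(access_point_id)

    # Equivalent of: WHERE ALL (x IN [...] WHERE x IN accessPointsId)
    required = set(required)
    return sorted(doc for doc, ids in collected.items() if required <= ids)


# Hypothetical sample data: doc-1 has all four keywords, doc-2 only two.
edges = [
    ("doc-1", "kw-1"), ("doc-1", "kw-2"), ("doc-1", "kw-3"), ("doc-1", "kw-4"),
    ("doc-2", "kw-1"), ("doc-2", "kw-3"),
]
matches = docs_matching_all(edges, ["kw-1", "kw-2", "kw-3", "kw-4"])
```

Starting from the handful of keyword nodes instead of from every documentaryUnit means only documents that touch at least one requested keyword are ever grouped, which is presumably why this version is faster than my earlier collection attempt.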
The second one takes about 5000 ms on the first run and 400 ms afterwards:
START
facet1 = node:entities("__ID__:kw-1"),
facet2 = node:entities("__ID__:kw-2"),
facet3 = node:entities("__ID__:kw-3"),
facet4 = node:entities("__ID__:kw-4")
MATCH
(elem)<-[:hasLinkTarget]-()-[:hasLinkTarget]->facet1,
(elem)<-[:hasLinkTarget]-()-[:hasLinkTarget]->facet2,
(elem)<-[:hasLinkTarget]-()-[:hasLinkTarget]->facet3,
(elem)<-[:hasLinkTarget]-()-[:hasLinkTarget]->facet4
WHERE elem.__ISA__ = "documentaryUnit"
RETURN elem
Removing the parentheses gave me a much faster response.
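Since the number of selected keywords varies, the first solution's query text has to be generated. A possible sketch in Python follows; build_facet_query is a hypothetical helper, not part of any Neo4j driver, and it assumes ids contain only letters, digits, underscores and hyphens (anything else is rejected rather than escaped).

```python
import re

# Assumption: ids look like "kw-1" or "place-1"; other characters are refused
# so that nothing unexpected ends up inside the Lucene or Cypher text.
_SAFE_ID = re.compile(r"[A-Za-z0-9_-]+")


def build_facet_query(keyword_ids):
    """Build the text of the first (collect-based) query for the given ids."""
    ids = list(keyword_ids)
    if not ids:
        raise ValueError("at least one keyword id is required")
    for kid in ids:
        if not _SAFE_ID.fullmatch(kid):
            raise ValueError("unsupported character in id: %r" % kid)

    # Index lookup string: __ID__:kw-1 OR __ID__:kw-2 ...
    lucene = " OR ".join(f"__ID__:{kid}" for kid in ids)
    # Cypher collection literal contents: 'kw-1', 'kw-2', ...
    id_list = ", ".join(f"'{kid}'" for kid in ids)

    return "\n".join([
        f'START accessPoints = node:entities("{lucene}")',
        "MATCH (doc)<-[:hasLinkTarget]-(link)-[:hasLinkTarget]->accessPoints",
        'WHERE doc.__ISA__ = "documentaryUnit"',
        "WITH collect(accessPoints.__ID__) AS accessPointsId, doc",
        f"WHERE ALL (x IN [{id_list}] WHERE x IN accessPointsId)",
        "RETURN doc",
    ])


query = build_facet_query(["kw-1", "kw-2", "kw-3", "kw-4"])
```

The resulting string can then be sent to the server however queries are normally submitted in your setup.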