Tags: neo4j, lucene, full-text-search, py2neo, neo4jrestclient

Lucene full-text index: all indexed nodes with same score?


I have been trying to solve this issue for days.

I want to run a START query against a full-text index, with results ordered by relevance, so that I can paginate them.

Fortunately, I finally found this thread on full-text indexing in Neo4j (using Python as the driver):

[https://groups.google.com/forum/#!topic/neo4j/9G8fcjVuuLw]

I had imported my database with the batch super-importer and got a reply from @Michaelhunger, who kindly pointed out that there was a bug: all entries would have been imported with the same score.

So now I am recreating the index and checking the score via REST (&order=score):

http://localhost:7474/db/data/index/node/myIndex?query=name:myKeyWord&order=score

and I noticed that the entries still all have the same score.

(You have to run an AJAX query to see this, because the web console does not show all the data.)
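For anyone who prefers to run the same check from Python instead of AJAX, here is a minimal sketch using only the standard library. The helper names (index_query_url, fetch_hits, distinct_scores) are made up for illustration, and the shape of each hit (a JSON object with a "score" key) is an assumption based on the output shown in this question, not something confirmed against the server documentation.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def index_query_url(base, index_name, lucene_query, order="score"):
    """Build the index query URL used in this question (hypothetical helper)."""
    params = urlencode({"query": lucene_query, "order": order})
    return "{}/index/node/{}?{}".format(base.rstrip("/"), index_name, params)


def fetch_hits(url):
    """Fetch and decode the JSON response. Needs a running server; not called here."""
    req = Request(url, headers={"Accept": "application/json"})
    with urlopen(req) as resp:
        return json.load(resp)


def distinct_scores(hits):
    """Return the sorted set of scores.

    Assumption: every hit is a dict that carries a "score" key.
    """
    return sorted({hit["score"] for hit in hits})


url = index_query_url("http://localhost:7474/db/data", "myIndex", "name:DNA")
print(url)
```

If distinct_scores(fetch_hits(url)) comes back with a single value, every hit was given the same score, which is the symptom described here.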

My code to recreate a full-text Lucene index over each node's 'name' property (here using neo4j-rest-client, but I will also try py2neo, as in the Google Groups discussion):

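from neo4jrestclient.client import GraphDatabase

gdb = GraphDatabase("http://localhost:7474/db/data/")

# Create a full-text Lucene index named "myIndex"
myIndex = gdb.nodes.indexes.create("myIndex", type="fulltext", provider="lucene")

# Run for every node to be indexed (`node` is the node currently being added)
myIndex.add("name", node.get("name"), node)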

Results:

http://localhost:7474/db/data/index/node/myIndex?query=name:DNA&order=score

data Object {id: 17062920, name: "DNA damage theory of aging"}
VM995:10 **score 11.097855567932129**
...
data Object {id: 17022698, name: "DNA (film)"}
VM995:10 **score 11.097855567932129**

The documentation [http://neo4j.com/docs/stable/indexing-lucene-extras.html#indexing-lucene-sort] says that Lucene handles sorting very well by itself, so I understood that it would compute a ranking on its own at import time; it does not.

What am I doing wrong or missing?


Solution

  • The goal of my question is to obtain a list of results ordered by relevance of nodes' names matching the queried keywords.

    @mfkilgore pointed out this workaround:

    START n=node:topic('name:(keyword1* AND keyword2*)') MATCH (n) WITH n ORDER BY length(split(n.name, " ")) ASC LIMIT 20 RETURN n
    

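    This workaround counts the words in a node's name (by splitting it on spaces) and then orders the results by that count, shortest names first. It is a rough stand-in for relevance, not a Lucene score.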
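To make the ordering rule concrete, here is a small plain-Python sketch of what the ORDER BY clause does, applied to the two names from the question. It only mimics the sort key length(split(n.name, " ")) on a list of strings; it does not talk to Neo4j, and the function name order_by_word_count is made up for illustration.

```python
def order_by_word_count(names, limit=20):
    """Sort names by their number of space-separated pieces, fewest first.

    Mirrors the Cypher clause: ORDER BY length(split(n.name, " ")) ASC LIMIT 20.
    """
    return sorted(names, key=lambda name: len(name.split(" ")))[:limit]


names = ["DNA damage theory of aging", "DNA (film)"]
ranked = order_by_word_count(names)
print(ranked)
```

Here "DNA (film)" (two pieces) sorts ahead of "DNA damage theory of aging" (five pieces). Names with the same word count keep their original relative order, since Python's sort is stable.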