neo4j, graph-algorithm, similarity

Neo4j: Error when testing 'Node Similarity' in stream mode


I'm trying to test 'Node Similarity' on a bipartite graph with this form: keyword -[APPEARS_IN]-> article. I would like to get a 'SIMILAR' relationship, with a score, between articles. I tried the following query, using the node property title:

CALL gds.nodeSimilarity.stream('test')
YIELD node1, node2, similarity
RETURN gds.util.asNode(node1).title AS Article1, gds.util.asNode(node2).title AS Article2, similarity
ORDER BY similarity DESCENDING, Article1, Article2

Here are the results:

[Screenshot: Results]

But the results are not good: I get 'None' everywhere. Is it because of the length of the strings in 'title'? Titles in my database are sometimes very long, for example, as in here.

What should I do?

I also tried it with the 'id' property instead, but the articles that get a score of 1 (the highest possible) do not seem to be similar at all when I check them.


Solution

  • So the way you are currently projecting the graph:

    CALL gds.graph.create('lpa_test','*', 
        {APPEARS_IN:{type: 'APPEARS_IN', orientation: 'NATURAL',                  
         properties:['weights']}})
    

    With this projection you are comparing keywords instead of articles, because of your graph schema:

    (keyword)-[:APPEARS_IN]->(article)
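
    One way to confirm which nodes are actually being compared is to stream the results of the current projection and return the labels of the compared nodes instead of their titles. This is a minimal diagnostic sketch; it assumes your keyword and article nodes carry different labels, which the question does not show.

    ```cypher
    // Diagnostic sketch: which kind of node is Node Similarity comparing?
    CALL gds.nodeSimilarity.stream('lpa_test')
    YIELD node1, node2, similarity
    // labels() reveals what node1/node2 really are; with the NATURAL
    // projection above they are expected to be keyword nodes
    RETURN labels(gds.util.asNode(node1)) AS Labels1,
           labels(gds.util.asNode(node2)) AS Labels2,
           similarity
    LIMIT 10
    ```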
    

    Node Similarity compares the source nodes of the relationships and uses their target nodes as the neighbor sets the comparison is based on. With your schema the source nodes are keywords, so node1 and node2 are keyword nodes. That is why both the Article1 and Article2 columns are empty: your keywords probably store their text under a property other than title. If you want to compare articles, you have to reverse the relationship:

    CALL gds.graph.create('lpa_reverse','*', 
        {APPEARS_IN:{type: 'APPEARS_IN', orientation: 'REVERSE',                  
         properties:['weights']}})
    

    Running the Node Similarity algorithm on this reversed projection now compares articles, which gives you the results you are looking for:

    CALL gds.nodeSimilarity.stream('lpa_reverse')
    YIELD node1, node2, similarity
    RETURN gds.util.asNode(node1).title AS Article1, 
           gds.util.asNode(node2).title AS Article2, similarity
    ORDER BY similarity DESCENDING, Article1, Article2
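
    Since the goal is a 'SIMILAR' relationship with a score between articles, the same reversed projection can also be run in write mode instead of stream mode. A minimal sketch follows: writeRelationshipType and writeProperty are the write-mode configuration keys, while the property name score is an arbitrary choice, so use whatever name you prefer.

    ```cypher
    // Write mode: store one SIMILAR relationship per similar article pair
    // back in the database, with the similarity value in the 'score' property
    CALL gds.nodeSimilarity.write('lpa_reverse', {
      writeRelationshipType: 'SIMILAR',
      writeProperty: 'score'
    })
    YIELD nodesCompared, relationshipsWritten
    ```

    The written relationships can then be read back with an ordinary MATCH on :SIMILAR, returning the two article titles and the score property, ordered by score.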