Search code examples
neo4jcyphergraph-databases

Performance during scoring complex match in Neo4j graph database?


I have Neo4j 3.3.5 graph database: 27GB, 50kk nodes, 500kk relations. Indexes on. Schema. PC: 16GB ram, 4 cores.

Task is to find best matching companies for given query data. Nodes :Company, which I need to get, have multiple relations with nodes:Branch's, :Country's, etc. Query data has BranchIds, CountryIds, etc.

Currently I am using cypher like this to get score from one relations (500k rows in result):

MATCH (c:Company)-[r:HAS_BRANCH]->(b:Branch)
WHERE b.branchId in [27444, 1692, 23409, ...] //around 10 ids per query
RETURN 
c.companyId as Id, 
case r.branchType 
 when 0 then 25
 ... // //around 7 conditions per query 
 when 10 then 20 
end as Score

I have to score like this all relations types for :Company, group by Id, sum Score, order and take top 100 results.

Because of lack of post union processing, I am using collect + unwind in order to merge scores from all relations.

Unfortunately, the performance is low. I get response for query of one relations (like above) in 5-10 seconds. When I am trying to combine results with collect + unwind, the query "never" ends.

What is better/proper way to do it? Maybe I am doing something wrong with graph design? Hardware configuration to low? Or maybe there are some algorithms to match with score graph (query data) in graph database?

UPDATE

Query explanation:

User can search in our system for companies. For his query we prepare query data contains ids of branches, countries, words, etc. In result of query we want to get list of best matching companies ids with score.

E.g. user can search for new companies producing wooden tables from Spain.

Combined query example:

MATCH (c:Company)-[r:HAS_BRANCH]->(b:Branch)
WHERE b.branchId in ["27444" , "1692" , "23409" , "8744" , "9192" , "26591" , "21396" , "27151" , "20228" , "3517" , "25058" , "29549"] 
WITH case r.branchType 
when "0" then collect({id:c.companyId, score: 25}) 
 when "1" then collect({id:c.companyId, score: 19}) 
 when "2" then collect({id:c.companyId, score: 20}) 
 when "3" then collect({id:c.companyId, score: 19}) 
 when "4" then collect({id:c.companyId, score: 20}) 
 when "5" then collect({id:c.companyId, score: 15}) 
 when "6" then collect({id:c.companyId, score: 6}) 
 when "7" then collect({id:c.companyId, score: 5}) 
 when "8" then collect({id:c.companyId, score: 4}) 
 when "9" then collect({id:c.companyId, score: 4}) 
 when "10" then collect({id:c.companyId, score: 20}) 
end as rows
MATCH (c:Company)-[r:HAS_REVERTED_BRANCH]->(b:Branch)
WHERE b.branchId in ["27444" , "1692" , "23409" , "8744" , "9192" , "26591" , "21396" , "27151" , "20228" , "3517" , "25058" , "29549"] 
WITH rows + case r.branchType 
when "0" then collect({id:c.companyId, score: 25}) 
 when "1" then collect({id:c.companyId, score: 19}) 
 when "2" then collect({id:c.companyId, score: 20}) 
 when "3" then collect({id:c.companyId, score: 19}) 
 when "10" then collect({id:c.companyId, score: 20}) 
end as rows
MATCH (c:Company)-[r:HAS_COUNTRY]->(cou:Country)
WHERE cou.countryId in ["9580" , "18551" , "15895"] 
WITH rows + case r.branchType 
when "0" then collect({id:c.companyId, score: 30}) 
 when "2" then collect({id:c.companyId, score: 15}) 
 end as rows
... //here I would add in future other relations scoring
UNWIND rows AS row
RETURN row.id AS Id, sum(row.score) AS Score
ORDER BY Score DESC
LIMIT 100

Solution

  • You can try this query to see if it's better or not :

    MATCH (c:Company) WITH c
    OPTIONAL MATCH (c)-[r1:HAS_BRANCH]->(b:Branch) WHERE b.branchId in ["27444" , "1692" , "23409" , "8744" , "9192" , "26591" , "21396" , "27151" , "20228" , "3517" , "25058" , "29549"] 
    OPTIONAL MATCH (c)-[r2:HAS_REVERTED_BRANCH]->(c:Branch) WHERE c.branchId in ["27444" , "1692" , "23409" , "8744" , "9192" , "26591" , "21396" , "27151" , "20228" , "3517" , "25058" , "29549"] 
    OPTIONAL MATCH (c)-[r3:HAS_COUNTRY]->(cou:Country) WHERE cou.countryId in ["9580" , "18551" , "15895"] 
    WITH c, 
        case r1.branchType 
          when "0" then 25
          when "1" then 19 
          when "2" then 20 
          when "3" then 19 
          when "4" then 20 
          when "5" then 15 
          when "6" then 6 
          when "7" then 5 
          when "8" then 4 
          when "9" then 4 
          when "10" then 20 
        end as branchScore,
        case r2.branchType 
          when "0" then  25 
          when "1" then  19 
          when "2" then  20 
          when "3" then  19 
          when "10" then  20 
        end as revertedBranchScore,
        case r3.branchType 
          when "0" then  30
          when "2" then  15 
        end as countryScore
    
    WITH c.id AS Id, branchScore + revertedBranchScore + countryScore AS Score
    RETURN Id, sum(Score) AS Score
    ORDER BY Score DESC
    LIMIT 100
    

    Or a better one will be this one (but only if a Company node is mandatory linked to to a Country and Branch) :

    MATCH 
      (c:Company)-[r1:HAS_BRANCH]->(b:Branch),
      (c)-[r2:HAS_REVERTED_BRANCH]->(c:Branch),
      (c)-[r3:HAS_COUNTRY]->(cou:Country)
    WHERE 
      b.branchId in ["27444" , "1692" , "23409" , "8744" , "9192" , "26591" , "21396" , "27151" , "20228" , "3517" , "25058" , "29549"] AND 
      c.branchId in ["27444" , "1692" , "23409" , "8744" , "9192" , "26591" , "21396" , "27151" , "20228" , "3517" , "25058" , "29549"] AND
      cou.countryId in ["9580" , "18551" , "15895"]
    WITH c, 
        case r1.branchType 
          when "0" then 25
          when "1" then 19 
          when "2" then 20 
          when "3" then 19 
          when "4" then 20 
          when "5" then 15 
          when "6" then 6 
          when "7" then 5 
          when "8" then 4 
          when "9" then 4 
          when "10" then 20 
        end as branchScore,
        case r2.branchType 
          when "0" then  25 
          when "1" then  19 
          when "2" then  20 
          when "3" then  19 
          when "10" then  20 
        end as revertedBranchScore,
        case r3.branchType 
          when "0" then  30
          when "2" then  15 
        end as countryScore
    
    WITH c.id AS Id, branchScore + revertedBranchScore + countryScore AS Score
    RETURN Id, sum(Score) AS Score
    ORDER BY Score DESC
    LIMIT 100