Search code examples
performanceneo4jquery-optimizationcypherslowdown

Neo4j: why the performance of allShortestPaths function is so slow?


I am using Neo4j 'neo4j-community-2.3.0-RC1' version. In my database there are just 1054 nodes. when i do path query with 'allShotestPaths' function, why it is so slow. it take about more than 1 second,the following is unit test result :

  √ search optimalPath Path (192ms)
  √ search optimal Path by Lat Lng (1131ms)

should i optimize the query? the following are querys for 'optimalPath' and 'optimal Path by Lat Lng'

optimalPath query:

MATCH path=allShortestPaths((start:潍坊_STATION )-[rels*..50]->(end:潍坊_STATION {name:"火车站"}))
RETURN NODES(path) AS stations,relationships(path) AS path,length(path) AS stop_count, 
length(FILTER(index IN RANGE(1, length(rels)-1) WHERE (rels[index]).bus <> (rels[index - 1]).bus)) AS transfer_count, 
length(FILTER( rel IN rels WHERE type(rel)="WALK"  )) AS walk_count
order by transfer_count,walk_count,stop_count

optimal Path by Lat Lng query:

MATCH path=allShortestPaths((start:潍坊_STATION {name:"公交总公司"})-[rels*..50]->(end:潍坊_STATION {name:"火车站"}))
 WHERE
round(
6378.137 *1000*2*
asin(sqrt(
     sin((radians(start.lat)-radians(36.714))/2)^2+cos(radians(start.lat))*cos(radians(36.714))*
     sin((radians(start.lng)-radians(119.1268))/2)^2
 ))
)/1000 < 0.5      // this formula is used to calculate the distance between two GEO  coordinate (latitude\longitude)
RETURN NODES(path) AS stations,relationships(path) AS path,length(path) AS stop_count, 
length(FILTER(index IN RANGE(1, length(rels)-1) WHERE (rels[index]).bus <> (rels[index - 1]).bus)) AS transfer_count, 
length(FILTER( rel IN rels WHERE type(rel)="WALK"  )) AS walk_count
order by transfer_count,walk_count,stop_count

you can download the database here:https://www.dropbox.com/s/zamkyh2aaw3voe6/data.rar?dl=0

i will be very grateful ,if anybody can help me. thanks


Solution

  • In general, without knowing more, I would pull the predicates and expressions that can be computed before the paths are all expanded, before the match.

    And as your geo-filter is independent of anything else except your parameters and the start-node you can do:

    MATCH (start:潍坊_STATION {name:"公交总公司"})
    
    WHERE 
    // this formula is used to calculate the distance between two GEO  coordinate (latitude\longitude)
    round(6378.137 *1000*2* 
          asin(sqrt(sin((radians(start.lat)-radians({lat}))/2)^2
          +cos(radians(start.lat))*cos(radians({lat}))*
           sin((radians(start.lng)-radians({lng}))/2)^2)))/1000 
    < 0.5      
    
    MATCH (end:潍坊_STATION {name:"火车站"})
    MATCH path=allShortestPaths((start)-[rels*..50]->(end))
    RETURN NODES(path) AS stations,
           relationships(path) AS path,
           length(path) AS stop_count, 
           length(FILTER(index IN RANGE(1, length(rels)-1) 
                 WHERE (rels[index]).bus <> (rels[index - 1]).bus)) AS transfer_count,
           length(FILTER( rel IN rels WHERE type(rel)="WALK"  )) AS walk_count
    
    ORDER BY transfer_count,walk_count,stop_count;
    

    see this test (but the other query is equally fast):

    neo4j-sh (?)$ MATCH (start:潍坊_STATION {name:"公交总公司"})
    > 
    > WHERE 
    > // this formula is used to calculate the distance between two GEO  coordinate (latitude\longitude)
    > round(6378.137 *1000*2* 
    >       asin(sqrt(sin((radians(start.lat)-radians({lat}))/2)^2
    >       +cos(radians(start.lat))*cos(radians({lat}))*
    >        sin((radians(start.lng)-radians({lng}))/2)^2)))/1000 
    > < 0.5      
    > 
    > MATCH (end:潍坊_STATION {name:"火车站"})
    > MATCH path=allShortestPaths((start)-[rels*..50]->(end))
    > WITH NODES(path) AS stations,
    >        relationships(path) AS path,
    >        length(path) AS stop_count, 
    >        length(FILTER(index IN RANGE(1, length(rels)-1) 
    >              WHERE (rels[index]).bus <> (rels[index - 1]).bus)) AS transfer_count,
    >        length(FILTER( rel IN rels WHERE type(rel)="WALK"  )) AS walk_count
    > 
    > ORDER BY transfer_count,walk_count,stop_count
    > RETURN count(*);
    +----------+
    | count(*) |
    +----------+
    | 320      |
    +----------+
    1 row
    10 ms