I have implemented an A* algorithm in Go to find the path between two coordinates on a map. The map data is fetched with mgo from a MongoDB collection.
It is, however, very slow: around 4 seconds for a 1000-meter route. I have timed the different parts of the algorithm and concluded that the bottleneck is in the fetching from the database. Or, to be precise: in the conversion from binary data to a data structure that Go understands.
I try to do as few requests as possible and multi-thread the requests to make them faster, and that helps to a certain extent. But it's not nearly as fast as I wish it was.
It seems like I'm doing something fundamentally wrong. Any suggestions at all would be helpful.
Data structure in MongoDB (nodes fetched from OSM):
{
    "_id" : NumberLong(194637483),
    "lat" : 55.7079899,
    "lon" : 13.3756414,
    "neighbours" : [ NumberLong(1566264689), NumberLong(1566264806) ]
}
Data structure in Go:
type Node struct {
    ID         int64   `bson:"_id" json:"id"`
    Lat        float64 `bson:"lat" json:"lat"`
    Lon        float64 `bson:"lon" json:"lon"`
    Neighbours []int64 `bson:"neighbours" json:"neighbours"`
}
Code for getting a piece of data:
func addToBufferLong(buffer *WriteLockMap, session *mgo.Session, at, goal geo.Node, lowerLat, higherLat, lowerLon, higherLon float64) {
    c := session.DB("collectionName").C("nodes")
    query := c.Find(bson.M{"lat": bson.M{"$gt": lowerLat, "$lt": higherLat}, "lon": bson.M{"$gt": lowerLon, "$lt": higherLon}})
    var nodes []geo.Node
    if err := query.All(&nodes); err != nil { // error was previously ignored
        log.Println(err)
        return
    }
    for _, node := range nodes {
        tmp := PathNode{0, node.DistanceTo(goal), 0, node}
        buffer.Lock()
        buffer.m[node.ID] = tmp
        buffer.Unlock()
    }
}
The multi-threading strategy is based on splitting the area I wish to query into 4 squares, quadrants if you will, and fetching each one separately with addToBufferLong(...).
Most recent print-outs:
> time GOMAXPROCS=8 ./toystar
Starting A star
Thread A, count: 19657, db-fetch: 0.122792104s, parsing: 0.934650055s
Thread B, count: 19574, db-fetch: 0.274384302s, parsing: 1.196350664s
Thread C, count: 4197, db-fetch: 0.388197823s, parsing: 0.930109241s
Thread D, count: 9900, db-fetch: 0.540008325s, parsing: 0.93963728s
Total database fetch: 1.534268099 s
Total A star (with fetches): 1.854748244
real 0m1.907s
where db-fetch measures the time it takes to execute the line that starts with query := c.Find(...), and parsing measures the time it takes to execute query.All(&nodes).
I've managed to drop the execution time significantly, with the help of fellow stack overflow users. Current print-outs:
> time GOMAXPROCS=8 ./toystar
Starting A star
Thread A: 0.210783141s
Thread B: 0.380938949s
Thread C: 0.441447793s
Thread D: 0.507361847s
Total database fetch: 0.507543875 s
number of fetches: 1
Total A star: 0.826343287s
real 0m0.860s
The main differences are the multi-threading strategy and using *mgo.Iter instead of query.All(&nodes).
From the information available, this is what I can deduce:
The stage you called db-fetch doesn't actually access the database. The only thing c.Find(...) does is build a *mgo.Query value. The method is 6 lines long. This shouldn't take more than a millisecond, unless there's contention on the database object's internal session state, which doesn't seem to be the case because you're using only 4 goroutines.
query.All(&nodes) is where the query is actually executed on your database. In addition, this phase allocates the slice of nodes it needs and then iteratively decodes the BSON into your struct definition via reflection.
*mgo.Iter
Instead of query.All(...) you could use query.Iter() to obtain a *mgo.Iter and iterate batch-wise over your data set. This might improve performance by better distributing the network load over time.
See the documentation. Maybe you're already doing this. If not, it might improve lookup time.
I think this one is obvious. Divide and conquer, right? :)