Mongodb queries on routines produces huge stack trace

I am doing lots of web calls to an api in a go program, and the results are stored in a database (using mgo). The api calls are done on separate go routines. On other routines, I am pulling the information out of the database and processing it, before updating the database. When the data is put back, a flag is set so that it is known that that data has been post processed so when the program asks the database for another entry to post process the database hands back one where the flag complete is set to false. When the flag is set to true, the go routine is shutdown: wg.done().

All is good, I have lots of print outs telling me how the program is getting on, however towards the end of its running I get a huge stack trace containing lots of the same:

goroutine 56731 [sleep]: time.Sleep(0x12a05f200) /usr/local/Cellar/go/1.5/libexec/src/runtime/time.go:59 +0xf9 gopkg.in/mgo%2ev2.(*mongoServer).pinger(0xc82601b420, 0x1) /Users/alex/go/src/gopkg.in/mgo.v2/server.go:295 +0x1b4 created by gopkg.in/mgo%2ev2.newServer /Users/alex/go/src/gopkg.in/mgo.v2/server.go:88 +0x162

goroutine 56698 [sleep]: time.Sleep(0x12a05f200) /usr/local/Cellar/go/1.5/libexec/src/runtime/time.go:59 +0xf9 gopkg.in/mgo%2ev2.(*mongoServer).pinger(0xc82601bce0, 0x1) /Users/alex/go/src/gopkg.in/mgo.v2/server.go:295 +0x1b4 created by gopkg.in/mgo%2ev2.newServer /Users/alex/go/src/gopkg.in/mgo.v2/server.go:88 +0x162

goroutine 56699 [sleep]: time.Sleep(0x1dcd6500) /usr/local/Cellar/go/1.5/libexec/src/runtime/time.go:59 +0xf9 gopkg.in/mgo%2ev2.(*mongoCluster).syncServersLoop(0xc8256425a0) /Users/alex/go/src/gopkg.in/mgo.v2/cluster.go:353 +0x2b1 created by gopkg.in/mgo%2ev2.newCluster /Users/alex/go/src/gopkg.in/mgo.v2/cluster.go:73 +0x1a0

goroutine 56738 [sleep]: time.Sleep(0x12a05f200) /usr/local/Cellar/go/1.5/libexec/src/runtime/time.go:59 +0xf9 gopkg.in/mgo%2ev2.(*mongoServer).pinger(0xc82606fa40, 0x1) /Users/alex/go/src/gopkg.in/mgo.v2/server.go:295 +0x1b4 created by gopkg.in/mgo%2ev2.newServer /Users/alex/go/src/gopkg.in/mgo.v2/server.go:88 +0x162

There is one thing amongst all of those that is below, which is the only different output on the stack trace to the above (the above is just a sample, my terminal can't scroll back to the beginning there is so many)

goroutine 57201 [IO wait]: net.runtime_pollWait(0xedb6f0, 0x72, 0xc82000a2c0) /usr/local/Cellar/go/1.5/libexec/src/runtime/netpoll.go:157 +0x60 net.(*pollDesc).Wait(0xc827b0e5a0, 0x72, 0x0, 0x0) /usr/local/Cellar/go/1.5/libexec/src/net/fd_poll_runtime.go:73 +0x3a net.(*pollDesc).WaitRead(0xc827b0e5a0, 0x0, 0x0) /usr/local/Cellar/go/1.5/libexec/src/net/fd_poll_runtime.go:78 +0x36 net.(*netFD).Read(0xc827b0e540, 0xc828d61000, 0x1000, 0x1000, 0x0, 0x754050, 0xc82000a2c0) /usr/local/Cellar/go/1.5/libexec/src/net/fd_unix.go:232 +0x23a net.(*conn).Read(0xc8260eac38, 0xc828d61000, 0x1000, 0x1000, 0x0, 0x0, 0x0) /usr/local/Cellar/go/1.5/libexec/src/net/net.go:172 +0xe4 net/http.noteEOFReader.Read(0x7960c0, 0xc8260eac38, 0xc82751fd38, 0xc828d61000, 0x1000, 0x1000, 0xc82644dc20, 0x0, 0x0) /usr/local/Cellar/go/1.5/libexec/src/net/http/transport.go:1370 +0x67 net/http.(*noteEOFReader).Read(0xc826116e60, 0xc828d61000, 0x1000, 0x1000, 0xc827d1a770, 0x0, 0x0) :126 +0xd0 bufio.(*Reader).fill(0xc82644d4a0) /usr/local/Cellar/go/1.5/libexec/src/bufio/bufio.go:97 +0x1e9 bufio.(*Reader).Peek(0xc82644d4a0, 0x1, 0x0, 0x0, 0x0, 0x0, 0x0) /usr/local/Cellar/go/1.5/libexec/src/bufio/bufio.go:132 +0xcc net/http.(*persistConn).readLoop(0xc82751fce0) /usr/local/Cellar/go/1.5/libexec/src/net/http/transport.go:876 +0xf7 created by net/http.(*Transport).dialConn /usr/local/Cellar/go/1.5/libexec/src/net/http/transport.go:685 +0xc78

I am struggling to work out what it is telling me, whether its locking on writing to the database, whether the routines aren't closing and something is timing out, I don't know. I am using go 1.5 btw.

The code that talks with the database is below:

func (l *Ledger) addRaceToDatabase(race Race) { //true if added, 

false if existed
  session, err := mgo.Dial("127.0.0.1")
  if err != nil {
    panic(err)
  }

  defer session.Close()

  session.SetMode(mgo.Monotonic, true)

  c := session.DB("collection").C("races")

    // Index
  index := mgo.Index{
    Key:        []string{"id"},
    Unique:     true,
    DropDups:   true,
    Background: true,
    Sparse:     true,
  }

  err = c.EnsureIndex(index)
  if err != nil {
    panic(err)
  }

  result := Race{}
  //if the race exists, don't add it to the database
  err = c.Find(bson.M{"id": race.ID}).One(&result)
  if err != nil {
    //if there is an error there wasn't an entry so add this to the database
    err = c.Insert(race)
    if err != nil {
        panic(err)
    }   
  } else {
    //if it does find an entry, it will print it
    fmt.Println("FOUND: ", result.ID)
  }
}

Solution

It looks like the quoted logic is dialing MongoDB every time it needs to do something with it, which isn't the appropriate way to hold sustained database communication within an HTTP server.

Every time you Dial a MongoDB server (or replica set, or mongos), some background activity is started to keep the cluster topology up-to-date in memory, and to maintain the pool of connections sane (note that all sessions that are created after a single Dial call will share a single pool of connections).

Calling Dial every time one wants to talk to the database means all of that logic becomes independent for each of the dials, and that background activity may take a moment to shut down after the last session is closed.

So even though it will work, it's very inefficient as it has to learn again about the topology in use, it fails to use connections from an existing pool, and it also creates unnecessary background churn which is what you are observing in your backtrace.

For some ideas on how to better maintain a session in this context, please see this topic:

Best practice to maintain a mgo session