I am trying to query approximately 10,000 rows from a table. After trying various other options involving LIMIT/OFFSET without the desired success, I am now querying a single row in each goroutine. The thought process is that a single row takes only ~5 ms to query and fetch, while a batch of 10k takes over 20 s.

A simplified version of the code is shown below:
func queryEmp(ctx context.Context, IDs []string, spannerClient *spanner.Client) error {
	query := "SELECT name FROM Employee WHERE id = @id"
	g, gCtx := errgroup.WithContext(ctx)
	for _, ID := range IDs {
		id := ID // capture the loop variable for the goroutine
		g.Go(func() error {
			tx := spannerClient.Single()
			defer tx.Close()
			stmt2 := spanner.NewStatement(query)
			stmt2.Params = map[string]interface{}{
				"id": id, // parameter names are case-sensitive and must match @id
			}
			qstart := time.Now()
			it := tx.Query(gCtx, stmt2)
			defer it.Stop()
			logrus.Debugf("%s took %v \n", "query execution.", time.Since(qstart))
			for {
				row, err := it.Next()
				if err == iterator.Done {
					break
				}
				if err != nil {
					return err
				}
				var name string
				if err := row.Column(0, &name); err != nil {
					return err
				}
			}
			return nil
		})
	}
	return g.Wait()
}
The resulting trace is as below:
{"message":"query execution. took 39.677µs \n","severity":"debug","time":"2023-11-03T20:51:29-04:00"}
{"message":"query execution. took 34.125µs \n","severity":"debug","time":"2023-11-03T20:51:29-04:00"}
{"message":"query execution. took 26.634µs \n","severity":"debug","time":"2023-11-03T20:51:29-04:00"}
{"message":"query execution. took 29.303µs \n","severity":"debug","time":"2023-11-03T20:51:29-04:00"}
...
...
...
{"message":"query execution. took 188.749562ms \n","severity":"debug","time":"2023-11-03T20:51:29-04:00"}
{"message":"query execution. took 276.424692ms \n","severity":"debug","time":"2023-11-03T20:51:29-04:00"}
{"message":"query execution. took 188.62849ms \n","severity":"debug","time":"2023-11-03T20:51:29-04:00"}
{"message":"query execution. took 217.067524ms \n","severity":"debug","time":"2023-11-03T20:51:29-04:00"}
{"message":"query execution. took 276.949166ms \n","severity":"debug","time":"2023-11-03T20:51:29-04:00"}
...
...
...
{"message":"query execution. took 454.64281ms \n","severity":"debug","time":"2023-11-03T20:51:29-04:00"}
{"message":"query execution. took 452.0848ms \n","severity":"debug","time":"2023-11-03T20:51:29-04:00"}
{"message":"query execution. took 525.748738ms \n","severity":"debug","time":"2023-11-03T20:51:29-04:00"}
{"message":"query execution. took 454.704656ms \n","severity":"debug","time":"2023-11-03T20:51:29-04:00"}
{"message":"query execution. took 455.4276ms \n","severity":"debug","time":"2023-11-03T20:51:29-04:00"}
...
...
...
{"message":"query execution. took 6.767574136s \n","severity":"debug","time":"2023-11-03T20:52:00-04:00"}
{"message":"query execution. took 6.780578444s \n","severity":"debug","time":"2023-11-03T20:52:00-04:00"}
{"message":"query execution. took 6.785085491s \n","severity":"debug","time":"2023-11-03T20:52:00-04:00"}
{"message":"query execution. took 6.779527006s \n","severity":"debug","time":"2023-11-03T20:52:00-04:00"}
It starts out great and as expected, but the query time keeps increasing. MaxSessions and MinSessions for spannerClient are both 100, so one would imagine a slight slowdown after 100 concurrent queries, but that isn't the case.

I read here that:

Sessions can execute only one transaction at a time. Standalone reads, writes, and queries use a transaction internally, and count toward the one transaction limit.

A non-iterative query (ReadRow etc.) gives me the same result. Having tx := spannerClient.Single() outside of the for loop also gives a similar result.
Question(s): Why does the query time keep increasing, and is it correct to call spannerClient.Single() in each goroutine?

TLDR: The default max session pool size is 400, meaning that you will never have more than 400 queries running in parallel. You would need to increase the session pool size to achieve this kind of concurrency.
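The pool size can be set when the client is created. A minimal sketch, assuming a placeholder database path and pool sizes chosen for illustration:

```go
package main

import (
	"context"
	"log"

	"cloud.google.com/go/spanner"
)

func main() {
	ctx := context.Background()
	// MaxOpened defaults to 400; raise it to allow more concurrent
	// single-use read-only transactions (each one holds a session).
	cfg := spanner.ClientConfig{
		SessionPoolConfig: spanner.SessionPoolConfig{
			MinOpened: 100,
			MaxOpened: 1000,
		},
	}
	client, err := spanner.NewClientWithConfig(ctx,
		"projects/my-project/instances/my-instance/databases/my-db", cfg)
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()
}
```

Note that each extra session also consumes resources on the backend, so the pool size should match the concurrency you actually need.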
First of all: I do not think that sending 10,000 queries in parallel to let each read a single row is going to be the most efficient solution to your problem. If there is no other condition that you can use to filter than the employee ID, and those IDs are scattered all over the place, then it would still be more efficient to create a single query in the form
select * from employees where id in unnest(@ids)
See this comment for a complete example: https://github.com/googleapis/google-cloud-go/issues/858#issuecomment-550982307
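A sketch of that approach, using the table and column names from your snippet (error handling kept minimal; this needs a live Spanner database to run):

```go
// queryEmpBatch fetches all matching names in one query instead of
// issuing one query per ID.
func queryEmpBatch(ctx context.Context, ids []string, client *spanner.Client) ([]string, error) {
	stmt := spanner.NewStatement("SELECT name FROM Employee WHERE id IN UNNEST(@ids)")
	stmt.Params = map[string]interface{}{"ids": ids} // array parameter, expanded by UNNEST

	tx := client.Single()
	defer tx.Close()

	var names []string
	it := tx.Query(ctx, stmt)
	defer it.Stop()
	for {
		row, err := it.Next()
		if err == iterator.Done {
			return names, nil
		}
		if err != nil {
			return nil, err
		}
		var name string
		if err := row.Column(0, &name); err != nil {
			return nil, err
		}
		names = append(names, name)
	}
}
```

This sends the whole ID list as a single array parameter, so Spanner can resolve all 10,000 lookups in one round trip instead of 10,000.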
Coming back to your concrete question: the line it := tx.Query(gCtx, stmt2) does not execute the query, it only prepares the query for execution. The execution happens the first time that you call row, err := it.Next(). You can also see this in the execution times that are being logged: the first statements seem to execute in ~30 microseconds, which is impossible for an actual round trip to Spanner.