
Single node instance reaching 100% cpu with just under 1K QPS


I am evaluating Spanner for a read-heavy use case, using the PostgreSQL dialect. I have a single-node setup (1000 processing units) with a single table containing less than a million rows. Table lookups go through the primary key, which is a sha256-encoded string to randomize the key distribution.

In my testing, I am running a standalone Go client application that executes 100 goroutine workers to query the database by primary key (select * from foo where ID IN ('...', '...')). Every query has up to 10 IDs to look up in the IN clause.

With this setup, I am getting around 1K QPS with the CPU reaching 100%. Spanner's advertised QPS for a single node is ~22K, but I am not getting anywhere near that. The max scanned rows in this setup is close to 20K, while the average is closer to 2.5K. I don't even know why the max scanned rows is 20K, since I am always querying for at most 10 primary keys. The Spanner backend metrics are useful only up to this level, but they give no indication as to why I am not able to go beyond the 1K QPS limit.

I have tried multiple iterations with many variations of worker pool size, session pool size (1000 max sessions, 10 gRPC channels), and batch size, but the results are more or less the same.
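For reference, the client is created roughly like this (a sketch; the database path is a placeholder, and the pool and channel sizes are the ones mentioned above):

```go
import (
    "context"
    "log"

    "cloud.google.com/go/spanner"
    "google.golang.org/api/option"
)

func newClient(ctx context.Context) *spanner.Client {
    cfg := spanner.ClientConfig{
        SessionPoolConfig: spanner.SessionPoolConfig{
            MaxOpened: 1000, // session pool capped at 1000 sessions
        },
    }
    client, err := spanner.NewClientWithConfig(ctx,
        "projects/my-project/instances/my-instance/databases/my-db", // placeholder
        cfg,
        option.WithGRPCConnectionPool(10)) // 10 gRPC channels
    if err != nil {
        log.Fatal(err)
    }
    return client
}
```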

I would appreciate any inputs and guidance; please share your experience.


Solution

  • There could be multiple reasons why you are not seeing the throughput that you are looking for.

    Client

    Let's start with some basic math to see the maximum number of reads your client could handle in this setup:

    1. A single read (select a single row using the primary key) on Spanner, assuming that you are using a single-use read-only transaction, will take around 4-5ms.
    2. You have 100 goroutines reading data.
    3. Each goroutine can execute up to 200 reads per second (5ms per read, so 200 of these per 1,000ms).
    4. This means that your client in theory should be able to execute up to 20,000 reads per second. Have you verified that the baseline does indeed fit these numbers?
    5. Put another way: have you verified that a single read takes around 5ms when you run only one goroutine? If so, at what number of goroutines does the performance start to drop? A minimal way to measure this baseline is sketched after this list.
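
    Below is a minimal sketch of such a baseline measurement with a single goroutine; the database path, table name, column name, and lookup key are placeholders for your own setup:

    ```go
    package main

    import (
        "context"
        "fmt"
        "log"
        "time"

        "cloud.google.com/go/spanner"
    )

    func main() {
        ctx := context.Background()
        // Placeholder database path: replace with your own.
        client, err := spanner.NewClient(ctx,
            "projects/my-project/instances/my-instance/databases/my-db")
        if err != nil {
            log.Fatal(err)
        }
        defer client.Close()

        const samples = 1000
        start := time.Now()
        for i := 0; i < samples; i++ {
            // Single() executes the read in a single-use read-only
            // transaction, the cheapest read path in Spanner.
            _, err := client.Single().ReadRow(ctx, "foo",
                spanner.Key{"some-existing-key"}, []string{"ID"})
            if err != nil {
                log.Fatal(err)
            }
        }
        fmt.Printf("avg single-read latency: %v\n",
            time.Since(start)/time.Duration(samples))
    }
    ```

    If this reports roughly 4-5ms per read, scale up the number of goroutines and watch where the average latency starts to climb.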

    Query

    Based on your question, it seems like you are generating a query that contains a random list of literals, something like select * from foo where id in ('key1', 'key2', 'key3'). This requires Spanner to parse and plan the query every time it is executed, as the SQL string is different every time. Instead, you should use a parameterized query and send the keys as parameter values. That lets Spanner cache the query plan, so it only needs to parse and plan the query 10 times (once per IN-list length).

    So generate 10 query variants, one for each IN-list length, looking like this: select * from foo where id in ($1, $2, $3).
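
    A sketch of how that could look with the Go client against a PostgreSQL-dialect database; note that the client binds the positional parameters $1, $2, ... under the names "p1", "p2", ... in the params map, and the helper name buildInQuery is just for illustration:

    ```go
    import (
        "context"
        "fmt"
        "strings"

        "cloud.google.com/go/spanner"
        "google.golang.org/api/iterator"
    )

    // buildInQuery builds a parameterized IN query for the given keys.
    // With IN lists of 1 to 10 keys this yields at most 10 distinct SQL
    // strings, so Spanner can cache and reuse the query plans.
    func buildInQuery(keys []string) spanner.Statement {
        placeholders := make([]string, len(keys))
        params := make(map[string]interface{}, len(keys))
        for i, k := range keys {
            placeholders[i] = fmt.Sprintf("$%d", i+1)
            // The PostgreSQL-dialect parameter $1 is bound as "p1", etc.
            params[fmt.Sprintf("p%d", i+1)] = k
        }
        return spanner.Statement{
            SQL:    "select * from foo where id in (" + strings.Join(placeholders, ", ") + ")",
            Params: params,
        }
    }

    func queryBatch(ctx context.Context, client *spanner.Client, keys []string) error {
        iter := client.Single().Query(ctx, buildInQuery(keys))
        defer iter.Stop()
        for {
            row, err := iter.Next()
            if err == iterator.Done {
                return nil
            }
            if err != nil {
                return err
            }
            _ = row // process the row
        }
    }
    ```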

    Data

    You write that your table contains 'less than 1 million rows'. Spanner is built for scale, and more data is actually better than less, as it lets Spanner distribute the data better. Around 1 million rows should normally be enough, but your reads need to be spread across the entire key space in order to achieve maximum throughput. Have you verified that your random key selection is really random? Could you otherwise try with a larger amount of data in your table?
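
    As a sanity check, here is a sketch of generating uniformly distributed lookup keys, assuming your keys are hex-encoded sha256 digests of some source identifier as described in the question (the "row-%d" naming is purely hypothetical):

    ```go
    import (
        "crypto/sha256"
        "encoding/hex"
        "fmt"
        "math/rand"
    )

    // randomizedKey hashes a source identifier into a hex-encoded sha256
    // digest, which spreads the keys uniformly across the key space.
    func randomizedKey(sourceID string) string {
        sum := sha256.Sum256([]byte(sourceID))
        return hex.EncodeToString(sum[:])
    }

    // randomLookupKeys draws n source IDs uniformly from totalRows rows
    // and hashes them; the reads should then hit the entire key space.
    func randomLookupKeys(n, totalRows int) []string {
        keys := make([]string, n)
        for i := range keys {
            keys[i] = randomizedKey(fmt.Sprintf("row-%d", rand.Intn(totalRows)))
        }
        return keys
    }
    ```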

    Warmup / Duration

    How long does your benchmark run?

    Spanner distributes your data across multiple servers based on actual usage by creating splits. This is a dynamic process that takes some time. It is recommended to warm up your database before a production launch. The same is also true for a benchmark. See https://cloud.google.com/spanner/docs/pre-warm-database for more details.
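
    A sketch of what such a warmup phase could look like in your benchmark client; executeRandomBatchRead is a hypothetical stand-in for your existing query worker:

    ```go
    import (
        "context"
        "time"

        "cloud.google.com/go/spanner"
    )

    // warmUp runs the workload without recording metrics, giving Spanner
    // time to create splits based on the actual access pattern.
    func warmUp(ctx context.Context, client *spanner.Client, d time.Duration) {
        deadline := time.Now().Add(d)
        for time.Now().Before(deadline) {
            // Hypothetical stand-in for your existing IN-query worker loop.
            executeRandomBatchRead(ctx, client)
        }
    }
    ```

    Only start recording QPS and latency numbers once the warmup has completed.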