database postgresql performance amazon-ec2 benchmarking

How could database have worse benchmark results on faster disk?

I'm benchmarking comparable (2vCPU, 2G RAM) server (Ubuntu 18.04) from DigitalOcean (DO) and AWS EC2 (t3a.small).

The disk benchmark (fio) goes inline with the results of https://dzone.com/articles/iops-benchmarking-disk-io-aws-vs-digitalocean

In summary:

DO --

READ: bw=218MiB/s (229MB/s), 218MiB/s-218MiB/s (229MB/s-229MB/s), io=3070MiB (3219MB), run=14060-14060msec

WRITE: bw=72.0MiB/s (76.5MB/s), 72.0MiB/s-72.0MiB/s (76.5MB/s-76.5MB/s), io=1026MiB (1076MB), run=14060-14060msec

EC2 --

READ: bw=9015KiB/s (9232kB/s), 9015KiB/s-9015KiB/s (9232kB/s-9232kB/s), io=3070MiB (3219MB), run=348703-348703msec

WRITE: bw=3013KiB/s (3085kB/s), 3013KiB/s-3013KiB/s (3085kB/s-3085kB/s), io=1026MiB (1076MB), run=348703-348703msec

which shows DO disk more than 10 times faster than the EBS of EC2

However, sysbench following https://severalnines.com/database-blog/how-benchmark-postgresql-performance-using-sysbench is showing DO slower than EC2 (using Postgres 11 default configuration, read-write test on oltp_legacy/oltp.lua )

DO --

transactions: 14704 (243.87 per sec.)

Latency (ms):

min: 9.06

avg: 261.77

max: 2114.04

95th percentile: 383.33

EC2 --

transactions: 20298 (336.91 per sec.)

Latency (ms):

min: 5.85

avg: 189.47

max: 961.27

95th percentile: 215.44

What could be the explanation?

Solution

Sequential read/write throughput matters for large sequential scans, stuff like data warehousing, loading a large backup, etc.

Your benchmark is OLTP which does lots of small quick queries. For this sequential throughput is irrelevant.

For reads (SELECTs) the most important factor is having enough RAM to keep your working set in cache and not do any actual IO. Failing that, it is read random access time.

For writes (UPDATE,INSERT) then the fsync latency, which is the time required to commit data to stable storage, is the most important factor since the database will only finish a COMMIT when data has been written.

Most likely the EC2 has better random access and fsync performance. Maybe it uses SSDs or battery-backed cache.