Tags: amazon-web-services, amazon-ec2

EC2 instance storage performance degradation


Is it normal for EC2 instance storage disks to perform progressively worse over a few days? Do they have IOPS quotas?

I launched a c5d.large instance a few days ago. Its job is to download a ZIP containing 500 x 1 MB text files (unzipped size), unzip them to disk, and zip them back into 500 individual zips.

TL;DR: it reads and writes files on the instance store volume.
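For reference, the job is roughly equivalent to the following (just a sketch; the archive URL and paths are placeholders, not the real ones):

# Rough sketch of the workload; URL and paths are made up for illustration
cd /mnt
wget -q https://example.com/batch.zip              # download the source archive
mkdir -p extracted zips
unzip -q batch.zip -d extracted                    # ~500 x 1 MB text files written to the instance store
for f in extracted/*.txt; do
    zip -q -j "zips/$(basename "$f" .txt).zip" "$f"    # re-zip each file individually
done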

In synthetic tests the disk looked fishy:

dd if=/dev/zero of=/mnt/testfile bs=1G count=1

Day 1: 500 MB/s
Day 2: 120 MB/s
Days 3-4: 40 MB/s

This wasn't really an issue, because while working the instance had ~0 iowait and was maxing out the CPU anyway:

[screenshot: CPU maxed out, ~0 iowait]

The problem is that after 2 days of running it went to this:

[screenshot: the same metrics after two days of running]

The curious thing is that dd and hdparm look okay?

# sudo hdparm -Tt /dev/nvme1n1

/dev/nvme1n1:
 Timing cached reads:   15170 MB in  1.99 seconds = 7612.25 MB/sec
 Timing buffered disk reads: 392 MB in  3.01 seconds = 130.24 MB/sec

# dd if=/dev/zero of=/mnt/testfile bs=1G count=1
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB) copied, 29.9148 s, 35.9 MB/s
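Side note: dd writing /dev/zero without oflag=direct or conv=fdatasync goes through the page cache, so it is not a great proxy for a workload of many small files. A more representative check (only a sketch, assuming fio is installed and /mnt is the instance store mount) would be direct random writes:

# 60 seconds of 4k random writes with direct I/O against the instance store mount
sudo fio --name=randwrite --directory=/mnt --rw=randwrite --bs=4k \
    --size=1G --iodepth=32 --ioengine=libaio --direct=1 \
    --runtime=60 --time_based --group_reporting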

Later edit: this is also happening on an EBS volume, with the difference that it doesn't recover after rebooting:

[screenshots: the same performance drop on the EBS volume]

With an EBS volume, after launching or modifying it, everything works great for a few hours, then performance drops dramatically for the same workload. After that it struggles to sustain even 50% of the initial throughput, to the point where I can barely log in and su to root.
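If the EBS volume is gp2, this "fine for a few hours, then a cliff" pattern is also what burst-credit exhaustion looks like. One way to rule that out (assuming the AWS CLI is configured; the volume ID and time range below are placeholders) is to check the volume's BurstBalance metric in CloudWatch:

# BurstBalance is a percentage; a value near 0 means the gp2 burst bucket is empty
aws cloudwatch get-metric-statistics \
    --namespace AWS/EBS \
    --metric-name BurstBalance \
    --dimensions Name=VolumeId,Value=vol-0123456789abcdef0 \
    --start-time 2020-01-01T00:00:00Z --end-time 2020-01-02T00:00:00Z \
    --period 3600 --statistics Average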


Solution

  • It turns out every EC2 instance type is capped at a certain EBS bandwidth and IOPS: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-optimized.html. The per-type limits can be looked up as shown below.
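The baseline and maximum EBS figures for a given type can be pulled with the AWS CLI (a sketch, assuming a recent CLI version that supports describe-instance-types; the field names come from the EC2 API's EbsOptimizedInfo structure):

# Show baseline vs. maximum EBS bandwidth/throughput/IOPS for c5d.large
aws ec2 describe-instance-types --instance-types c5d.large \
    --query 'InstanceTypes[0].EbsInfo.EbsOptimizedInfo'

Per the linked page, smaller instance sizes can only sustain the maximum figures for a limited burst period per day before dropping back to the baseline, which matches the "fast at first, slow later" behaviour above.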