Tags: amazon-web-services, amazon-ec2, webserver, amazon-ebs, provisioned-iops

AWS: Splitting software and data across different volumes


AWS recommends keeping data & OS on separate EBS volumes. I have a webserver running on EC2 with an EBS volume. On a bare VM, I install the following:

- webserver, wsgi, pip & related software/config (some in /etc some in /home/<user>)
- server code & static assets in /var/www/
- log files are written to /var/log/<respective-folder>
- maintenance scripts in /home/<user>/

The database server is separate. For a webserver, which of the above items would benefit from higher IOPS, and for which does it not matter? My understanding is that the server code and log files should be moved to a separate EBS volume with higher IOPS. Or should I just move everything (except the software I installed in /etc, i.e. the webserver) to a separate volume with better IOPS?


Solution

  • I would recommend a separate EBS volume for code, logs, and maintenance scripts in case you need to move them to another server. That gives you a faster TTR (time to resolution) than having to rebuild an entire server.

    The code shouldn't change much after deployment, so I would use a general purpose SSD here and look to a caching layer (Varnish for full-page caching, a CDN for static assets) rather than worry about disk I/O. A CDN is a quick win and removes most of the I/O for reading static assets. At 50 GB you get 150 IOPS, and with the static assets offloaded, the I/O should be fine.

    As for logs, if you run a high-traffic site, you should definitely focus on I/O here, because you don't want log writes to block. This mainly concerns access logs rather than error logs, since error logging shouldn't be more verbose than ERROR level on a production system. If you aren't high traffic, a general purpose SSD should be fine: at 10 GB you get 30 IOPS, which is generally enough.

    What are your maintenance scripts doing? If they generate and output files, a general purpose SSD will do. If they seem to need high I/O, revisit and optimize the code first: high-IOPS disks get expensive, and that cost is usually wasted on maintenance that runs only intermittently.

    As for your web server and related software, that should be managed as infrastructure as code, via OpsWorks or Puppet. It doesn't need much in terms of I/O, since those are usually memory-based processes once built and deployed.
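
The IOPS figures quoted above (150 at 50 GB, 30 at 10 GB) both work out to 3 IOPS per GB of general purpose SSD. Below is a minimal sketch of that arithmetic, assuming that rate holds for your volume type. The function name and the two-volume layout are hypothetical, and the sketch ignores any minimum-IOPS floor or burst allowance, so check the current EBS documentation before sizing real volumes.

```python
# Size-based baseline IOPS for a general purpose SSD volume.
# Assumption: 3 IOPS per GiB, the rate implied by the figures in the answer.
# Minimum-IOPS floors and burst credits are deliberately not modeled.
GP_SSD_IOPS_PER_GIB = 3


def baseline_iops(size_gib: int) -> int:
    """Return the size-based baseline IOPS for a volume of size_gib GiB."""
    if size_gib <= 0:
        raise ValueError("volume size must be positive")
    return GP_SSD_IOPS_PER_GIB * size_gib


# Hypothetical layout: one volume for code and assets, one for logs.
volumes = {"/var/www": 50, "/var/log": 10}
baseline = {mount: baseline_iops(size) for mount, size in volumes.items()}

for mount, iops in baseline.items():
    print(f"{mount}: {volumes[mount]} GiB -> {iops} baseline IOPS")
```

Comparing these baselines against the write rate of your access logs is one rough way to judge whether a larger or higher-IOPS volume is worth the cost.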