
Estimation of commodity hardware for an application


Suppose I wanted to build a website like Stack Overflow. How do I estimate the amount of commodity hardware required to support it, assuming 1 million requests per day? Are there any case studies that explain the performance improvements possible in this situation?

I know that I/O is the major bottleneck in most systems. What are the options for improving I/O performance? The few I know of are:

  1. caching
  2. replication

Solution

  • You can improve I/O performance in several ways, depending on your storage setup:

    1. Increase filesystem block size if your app displays good spatial locality in its I/Os or uses large files.
    2. Use RAID 10 (striping + mirroring) for performance + redundancy (disk failure protection).
    3. Use fast disks (performance-wise: SSD > FC > SATA).
    4. Segregate workloads by time of day, e.g., run backups at night and normal application I/O during the day.
    5. Turn off atime updates in your filesystem (e.g., with the noatime mount option on Linux).
    6. Cache NFS file handles (see Facebook's Haystack work) if you store data on an NFS server.
    7. Combine small files into larger chunks, in the style of Bigtable or HBase.
    8. Avoid very large directories, i.e., many files in the same directory; instead, spread files across a hierarchy of directories.
    9. Use a clustered storage system (yeah not exactly commodity hardware).
    10. Optimize/design your application for sequential disk accesses whenever possible.
    11. Use memcached. :)
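As an illustration of item 8, here is a minimal sketch of one common way to keep directories small: derive nested subdirectory names from a hash of the file name. The function names (`sharded_path`, `store`), the choice of SHA-256, and the two-level, two-character layout are assumptions made for this example, not something prescribed by the answer above.

```python
import hashlib
from pathlib import Path


def sharded_path(root: Path, filename: str, levels: int = 2, width: int = 2) -> Path:
    """Map a file name to root/ab/cd/filename using a hash of the name.

    `levels` and `width` are illustrative defaults: 2 levels of 2 hex
    characters give 256 * 256 = 65,536 leaf directories.
    """
    digest = hashlib.sha256(filename.encode("utf-8")).hexdigest()
    # Slice the hex digest into `levels` pieces of `width` characters each.
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return root.joinpath(*parts, filename)


def store(root: Path, filename: str, data: bytes) -> Path:
    """Write `data` under its sharded directory, creating directories as needed."""
    path = sharded_path(root, filename)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(data)
    return path
```

Because the path is computed from the name alone, a reader can locate a file without listing any directory. The trade-off is that files are no longer grouped in a human-browsable way.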

    You may also want to look at the "Lessons Learned" section of the StackOverflow Architecture article.
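For the estimation half of the question, a back-of-envelope calculation is a reasonable starting point. The only figure taken from the question is 1 million requests per day, which averages out to roughly 11.6 requests per second. The peak factor, per-server throughput, and headroom in the sketch below are placeholder assumptions, not measurements; they would need to be replaced with numbers from load testing your own application.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_rps(requests_per_day: float) -> float:
    """Average requests per second for a given daily volume."""
    return requests_per_day / SECONDS_PER_DAY


def servers_needed(requests_per_day: float, peak_factor: float,
                   rps_per_server: float, headroom: float = 0.5) -> int:
    """Servers required so that peak load uses at most `headroom` of capacity.

    peak_factor, rps_per_server, and headroom are assumptions the caller
    must supply from measurement; they are not derived here.
    """
    peak_rps = average_rps(requests_per_day) * peak_factor
    usable_rps = rps_per_server * headroom
    return max(1, math.ceil(peak_rps / usable_rps))


print(f"average: {average_rps(1_000_000):.1f} req/s")  # average: 11.6 req/s

# Placeholder inputs: peak traffic at 5x the average, 50 req/s per server.
print(servers_needed(1_000_000, peak_factor=5, rps_per_server=50))  # 3
```

This estimate covers request throughput only. Storage I/O, memory for caching, and redundancy need their own sizing.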