amazon-web-services amazon-rds amazon-ebs

What type of "data" does Elastic Block Storage store?


I am studying for my AWS SAA exam and while reviewing the various services I got stuck on the concept of Elastic Block Storage, what exactly it stores, and how this is different than AWS RDS. Let me elaborate.

So from what I understand, EBS is a storage service that saves the data on an EC2 instance. That is simple enough but when I got to RDS I got confused as to what type of data is being stored on the block storage and what exactly is being snapshotted and backed up. Here is an example.

I am a Ruby on Rails developer. I purchase an EC2 instance and install Ruby on Rails, the various gems, a MySQL database, etc. Basically everything I need to get up and running to create a social media blog app. So what I thought EBS was doing was saving the user information, the blogs, the posts, the comments, etc. in EBS as a database (either ephemeral with the instance or block with EBS). This makes sense to me: the users and their blogs, posts, comments, etc. need to be stored somewhere, and if I somehow lose my EC2 instance, I shouldn't be worried because I have EBS with all that information.

Now at some point I switch to AWS RDS, and I'm wondering why I even need EBS anymore. I have, for example, my MySQL RDS database with its own backups and snapshots, separate from the EC2 instance, that stores the user data, blogs, etc. Why do I need EBS now? Do I even need it? If I do need it, what is it storing that is valuable now that I have my user info saved in RDS? If anyone can help me understand the difference I'd appreciate it.

Also, as a bonus question, I understand that EC2 instances are in a public subnet in the VPC. Additionally, I understand that databases should be in the private subnet according to AWS. So if I have a Linux EC2 instance and then run sudo apt-get install mysql-server to install a database alongside Ruby on Rails, isn't that MySQL database in the public subnet on my EC2 machine?


Solution

  • Have you ever gone down the road to Best Buy and purchased a USB external hard disk? You know... the type you'd use for backups, or to load with movies to give to your friends.

    Well, think of an Amazon EBS volume as that disk. It's an external disk, so it's not 'inside' a computer. You can unplug it from one computer and plug it into another computer. You can load any type of file onto it (Word documents, PDFs, images, videos). The disk itself is dumb: it just stores information in disk "blocks". It is the Operating System (Windows, Mac, Linux) that knows how to interpret the information so that you can list directories, create files and read content.

    This type of storage is typically known as block storage because the disk is divided into a number of blocks, and it just gets told "store this data in block #433".
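
    To make the "dumb disk, smart operating system" split concrete, here is a minimal sketch in Python. Everything in it (the names BlockDevice and TinyFileSystem, the 16-byte block size) is made up for illustration and is not how EBS is implemented; it only shows a device that knows nothing but numbered blocks, and a layer on top that knows what a "file" is.

```python
BLOCK_SIZE = 16  # bytes per block; tiny on purpose, real disks use much larger blocks


class BlockDevice:
    """The 'dumb' disk: it only understands numbered, fixed-size blocks."""

    def __init__(self, num_blocks):
        self.blocks = [bytes(BLOCK_SIZE) for _ in range(num_blocks)]

    def write_block(self, index, data):
        if len(data) > BLOCK_SIZE:
            raise ValueError("data does not fit in one block")
        # Pad with zero bytes so every block stays exactly BLOCK_SIZE long.
        self.blocks[index] = data.ljust(BLOCK_SIZE, b"\x00")

    def read_block(self, index):
        return self.blocks[index]


class TinyFileSystem:
    """Plays the operating system's role: remembers which blocks make up which file."""

    def __init__(self, device):
        self.device = device
        self.table = {}      # file name -> (list of block numbers, length in bytes)
        self.next_free = 0   # naive allocator: hand out blocks in order

    def save(self, name, content):
        used = []
        for offset in range(0, len(content), BLOCK_SIZE):
            chunk = content[offset:offset + BLOCK_SIZE]
            self.device.write_block(self.next_free, chunk)
            used.append(self.next_free)
            self.next_free += 1
        self.table[name] = (used, len(content))

    def load(self, name):
        used, length = self.table[name]
        raw = b"".join(self.device.read_block(i) for i in used)
        return raw[:length]  # drop the zero padding of the last block


disk = BlockDevice(num_blocks=8)
fs = TinyFileSystem(disk)
fs.save("post.txt", b"Hello from my blog post!")

first_block = disk.read_block(0)   # what the disk sees: just 16 raw bytes
restored = fs.load("post.txt")     # what the OS layer reconstructs: the file
print(first_block)
print(restored)
```

    The disk never learns that "post.txt" exists; it only stores and returns blocks. That is the sense in which an EBS volume does not care whether its blocks hold an operating system, your Rails code, or database files.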

    Amazon EC2 instances need a disk because Windows, Linux and Mac expect to have a disk. The operating system itself is stored on the disk. When you open and save files, they go to a disk. You can't boot an EC2 instance without a disk, because the disk contains the operating system.

    However... while you can store data on an Amazon EBS volume that is attached to an EC2 instance, it isn't necessarily a good idea. It is better if applications running on EC2 store their important data external to the instance because:

    • If the instance fails, the data is safe
    • Multiple EC2 instances can access the data (an Amazon EBS volume is normally attached to only one instance at a time)
    • The application can be upgraded without impacting the data

    Therefore, applications should store their data in services like a database (Amazon RDS, Amazon DynamoDB, etc) or an object store (Amazon S3).

    Amazon RDS, however, is a program. You communicate with the program and it does stuff. In this case, the program is a relational database. A relational database uses SQL to store and query data. Data is stored in tables that look a bit like a spreadsheet: they have rows and columns, and you can run queries against the data (e.g. give me the total of the Sales column where Country = New Zealand).
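
    As a rough illustration of "talking to a program" rather than to a disk, the sketch below runs that Sales query. It uses Python's built-in sqlite3 module with an in-memory database as a stand-in for a MySQL database on RDS; the table name, columns and figures are invented for the example.

```python
import sqlite3

# In-memory SQLite database standing in for a real MySQL/RDS endpoint.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (country TEXT, sales INTEGER)")
conn.executemany(
    "INSERT INTO sales (country, sales) VALUES (?, ?)",
    [("New Zealand", 120), ("Australia", 300), ("New Zealand", 80)],
)

# "Give me the total of the Sales column where Country = New Zealand"
row = conn.execute(
    "SELECT SUM(sales) FROM sales WHERE country = ?", ("New Zealand",)
).fetchone()
total = row[0]
print(total)

conn.close()
```

    Note that the code never mentions blocks or files. You ask the database a question in SQL and it works out where the rows live.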

    But where does the database program itself store data? On a disk! In fact, Amazon RDS stores its data on an Amazon EBS disk volume. It is the job of the database program to tell the disk where to store data and where to read data. The database program does all the smart stuff.

    So, even if your EC2 instance stores data in Amazon RDS instead of the local disk, the data is eventually being stored on Amazon EBS disk volumes anyway! The difference is that the content of that disk is managed by the database. You do not have any direct access to that disk.
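
    You cannot look at the disk behind an RDS instance, but you can see the same idea with a local database. The sketch below again uses SQLite as a stand-in (the file name blog.db and the posts table are made up): the database program writes its tables into an ordinary file, organised into fixed-size pages that it manages itself.

```python
import os
import sqlite3
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "blog.db")  # hypothetical database file

    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, body TEXT)")
    conn.executemany(
        "INSERT INTO posts (body) VALUES (?)",
        [(f"post number {i}",) for i in range(1000)],
    )
    conn.commit()

    # The database decides how rows are laid out across its pages.
    page_size = conn.execute("PRAGMA page_size").fetchone()[0]
    page_count = conn.execute("PRAGMA page_count").fetchone()[0]
    conn.close()

    # What the disk sees: one file made up of whole pages.
    file_size = os.path.getsize(path)

print(page_size, page_count, file_size)
```

    The application only ever sees rows and tables, while the storage underneath only ever sees bytes. With RDS the arrangement is the same, except that AWS runs the database program and owns the volume for you.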


    As to your subnet question, yes... If you install a database on an Amazon EC2 instance that is in a public subnet, then the database is also in a public subnet.

    Public and Private subnets are a carry-over from the old days of physical networking. Routers were used to protect parts of a network: they would be installed between "subnets", controlling what traffic could go into and out of a subnet. This concept is carried over to the cloud too.

    However, the cloud is even better than that type of networking. AWS has the concept of a Security Group, which is like a firewall on every EC2 instance rather than just between subnets. The security group allows you to control the type of traffic going into and out of an EC2 instance. Therefore, you could actually throw away the concept of a public subnet and private subnet and just use security groups to control network traffic. However, networking people like the old concepts and the additional level of safety they provide. Therefore, they use both security groups and public/private subnets to add additional layers of security.

    So, yes, you can put a database in a public subnet. You can protect it with Security Groups and it will be perfectly safe. But putting it in a private subnet adds another level of safety, which is generally a good idea when you're talking about protecting important data. It's just like people putting a security door on the front of their house — it's an additional layer of security that makes them feel even safer.
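
    To show what "a firewall on every instance" amounts to, here is a simplified model of a security group's inbound check in Python. It is only a sketch: InboundRule and is_allowed are hypothetical names, not part of any AWS SDK, and real security groups also support port ranges and rules that reference other security groups. The one property it does mirror is that a security group holds only allow rules, so anything that matches no rule is denied.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class InboundRule:
    """One allow rule (simplified: a single port and a single CIDR source)."""
    protocol: str
    port: int
    source_cidr: str


def is_allowed(rules, protocol, port, source_ip):
    """Traffic is allowed only if some rule matches; otherwise it is denied."""
    return any(
        rule.protocol == protocol
        and rule.port == port
        and ip_address(source_ip) in ip_network(rule.source_cidr)
        for rule in rules
    )


# Hypothetical rules for a database: MySQL (3306) only from the app's address range.
db_rules = [InboundRule("tcp", 3306, "10.0.1.0/24")]

from_app_server = is_allowed(db_rules, "tcp", 3306, "10.0.1.25")
from_internet = is_allowed(db_rules, "tcp", 3306, "203.0.113.9")
print(from_app_server, from_internet)
```

    With rules like these, a database in a public subnet would still refuse connections from arbitrary internet addresses. A private subnet simply adds a second barrier in case a rule is ever misconfigured.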