
AWS EBS snapshots questions


I'm writing because I'm very confused about the mechanism responsible for taking EBS snapshots.

  1. First of all, as far as I understand the difference between a "backup" and a "snapshot": a backup is a full, one-to-one copy of the volume's blocks, whereas a snapshot is a "delta" approach where only the changed blocks are copied. Is that right?

  2. If that definition is right, then I would assume an EBS snapshot should really be called a backup, since we typically make a full copy of all the blocks that the EBS volume is built on.

  3. Almost every piece of documentation on the AWS website says that EBS snapshots are taken incrementally (the first one is full, then only the difference from the previous "state" is stored). But after a small exercise in the AWS console, I was not able to see that in action.

I took a snapshot of my EBS volume (50 GB) and the snapshot had a size of exactly 50 GB. Then I took another snapshot, and again the size was 50 GB. This left me incredibly confused :///

  4. All my experiments/tests were done using only the root volume (the first volume attached to the EC2 instance). Now I'm wondering: if I have a database (PostgreSQL) installed on an EC2 instance that has only the root volume attached, is it safe to take a snapshot of that EBS volume (as a backup for my DB) while the machine is running? Or should I unfortunately take the whole instance offline periodically and only then back up my DB volume?

Solution

  • EBS snapshots work like this:

    On your initial snapshot, EBS creates a block-level copy of your volume in S3 in the background. On subsequent snapshots, it saves to S3 only the blocks that have changed since the last snapshot; for all other blocks it keeps a pointer to the copies that are already stored. The third snapshot works like the second: it again stores the blocks that have changed since the second snapshot and adds pointers to all the other blocks.

    If you restore the second snapshot, EBS creates a new volume, looks up in its metadata store which pointers belong to that snapshot, and then retrieves from S3 the blocks those pointers refer to.

    If you delete snapshot two, EBS removes the pointers that belong to snapshot two. Any block in S3 that has no pointer left, i.e. no longer belongs to any snapshot, is deleted.

    To you as the client, this whole process is transparent: you can delete or restore any snapshot you like, and EBS takes care of the specifics in the background.

    If you are interested in more detail on what happens under the hood, I recommend this article: The Jellyfish-inspired database under AWS Block Storage
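To make the pointer bookkeeping described above more concrete, here is a minimal Python sketch. It is a toy in-memory model under stated assumptions, not how EBS is actually implemented: the `SnapshotStore` class and its methods are hypothetical, a dict stands in for the block copies in S3, and changed blocks are detected by comparing contents, whereas a real service would track writes. It also illustrates why every snapshot still "covers" the whole volume while the data actually stored grows only by the blocks that changed.

```python
class SnapshotStore:
    """Hypothetical toy model of incremental snapshots: a shared block store
    plus a per-snapshot pointer table (block index -> stored block id)."""

    def __init__(self):
        self.blocks = {}     # block id -> bytes (stands in for the copies in S3)
        self.refcount = {}   # block id -> number of snapshots pointing at it
        self.snapshots = {}  # snapshot id -> {block index: block id}
        self._next_id = 0
        self._last = None    # most recent surviving snapshot

    def _store(self, data):
        """Copy one block into the store and return its new id."""
        bid = self._next_id
        self._next_id += 1
        self.blocks[bid] = data
        self.refcount[bid] = 0
        return bid

    def create_snapshot(self, snap_id, volume):
        """Copy only blocks that differ from the previous snapshot;
        point at the existing copies for everything else."""
        previous = self.snapshots.get(self._last, {})
        pointers = {}
        for index, data in enumerate(volume):
            old = previous.get(index)
            if old is not None and self.blocks[old] == data:
                pointers[index] = old                # unchanged: reuse pointer
            else:
                pointers[index] = self._store(data)  # new or changed: copy block
            self.refcount[pointers[index]] += 1
        self.snapshots[snap_id] = pointers
        self._last = snap_id

    def restore(self, snap_id):
        """Build a new volume by following the snapshot's pointers."""
        pointers = self.snapshots[snap_id]
        return [self.blocks[pointers[i]] for i in sorted(pointers)]

    def delete_snapshot(self, snap_id):
        """Drop the snapshot's pointers; free blocks nothing points to any more."""
        for bid in self.snapshots.pop(snap_id).values():
            self.refcount[bid] -= 1
            if self.refcount[bid] == 0:
                del self.blocks[bid]
                del self.refcount[bid]
        if snap_id == self._last:
            # Fall back to the newest remaining snapshot (or none).
            self._last = next(reversed(self.snapshots), None)

    def stored_bytes(self):
        """Bytes actually held in the block store across all snapshots."""
        return sum(len(b) for b in self.blocks.values())


store = SnapshotStore()
vol = [b"aaaa", b"bbbb", b"cccc", b"dddd"]  # toy 16-byte "volume"
store.create_snapshot("snap-1", vol)
print(store.stored_bytes())   # 16: the first snapshot copies every block
vol[1] = b"BBBB"
store.create_snapshot("snap-2", vol)
print(store.stored_bytes())   # 20: only the changed block was added
store.delete_snapshot("snap-1")
print(store.stored_bytes())   # 16: the old copy of block 1 had no pointers left
print(store.restore("snap-2") == vol)  # True
```

Deleting `snap-1` in this sketch frees only the original copy of block 1, because every other block is still referenced by `snap-2`, which can still be restored in full. Reference counting is just one simple way to model "delete a block once no snapshot points to it"; it is not a claim about EBS internals.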