amazon-web-services amazon-s3 amazon-ebs

Block Level vs File Level Storage


I have come across these terms: Block Level Storage and File Level Storage. Can someone explain why one is better than the other?

Examples and some algorithmic thinking would make it much easier to understand.

For example, AWS articles say that Amazon EBS can be used for databases, but why is it better than File Level storage?


Solution

  • I like to think of it like this:

    • Amazon Elastic Block Store (Amazon EBS) is block storage. It is just like a USB disk that you plug into your computer. Information is stored in specific blocks on the disk and it is the job of the operating system to keep track of which blocks are used by each file. That's why disk formats vary between Windows and Linux.
    • Amazon Elastic File System (Amazon EFS) is a filesystem that is network-attached storage. It's just like the H: drive (or whatever) that companies provide their employees to store data on a fileserver. You mount the filesystem on your computer like a drive, but your computer sends files to the fileserver rather than managing the block allocation itself.
    • Amazon Simple Storage Service (Amazon S3) is object storage. You give it a file and it stores it as an object. You ask for the object and it gives it back. Amazon S3 is accessed via an API. It is not mounted as a disk. (There are some utilities that can mount S3 as a disk, but they actually just send API calls to the back-end and make it behave like a disk.)
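
    To make the block storage bullet concrete, here is a toy sketch of the bookkeeping a filesystem does on top of a block device. Everything in it (ToyBlockDevice, ToyFileSystem, the 4-byte block size) is made up for illustration; real filesystems and EBS are far more involved, but the idea of "the operating system tracks which blocks belong to which file" is the same.

```python
BLOCK_SIZE = 4  # unrealistically small so the example is easy to follow


class ToyBlockDevice:
    """Hypothetical disk: just a numbered array of fixed-size blocks."""

    def __init__(self, num_blocks: int) -> None:
        self.blocks = [bytes(BLOCK_SIZE) for _ in range(num_blocks)]

    def write_block(self, index: int, data: bytes) -> None:
        if len(data) > BLOCK_SIZE:
            raise ValueError("data does not fit in one block")
        # Pad short data with zero bytes so every block has the same size.
        self.blocks[index] = data.ljust(BLOCK_SIZE, b"\x00")

    def read_block(self, index: int) -> bytes:
        return self.blocks[index]


class ToyFileSystem:
    """Hypothetical filesystem: maps file names to lists of block numbers."""

    def __init__(self, device: ToyBlockDevice) -> None:
        self.device = device
        self.free = list(range(len(device.blocks)))
        self.table = {}  # name -> (block indices, file length in bytes)

    def write_file(self, name: str, data: bytes) -> None:
        if name in self.table:
            raise FileExistsError(name)
        needed = -(-len(data) // BLOCK_SIZE)  # ceiling division
        if needed > len(self.free):
            raise OSError("no space left on device")
        indices = [self.free.pop(0) for _ in range(needed)]
        for i, block_no in enumerate(indices):
            chunk = data[i * BLOCK_SIZE:(i + 1) * BLOCK_SIZE]
            self.device.write_block(block_no, chunk)
        self.table[name] = (indices, len(data))

    def read_file(self, name: str) -> bytes:
        indices, length = self.table[name]
        raw = b"".join(self.device.read_block(i) for i in indices)
        return raw[:length]  # drop the zero padding of the last block
```

    The device knows nothing about files; only the filesystem's table does. That is why the same disk formatted by Windows or Linux looks different: the table layout is what differs.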

    When it comes to modifying files, they behave differently:

    • Files on block storage (like a USB disk) can be modified by the operating system. For example, changing one byte or adding data to the end of the file.
    • Files on a filesystem (like the H: drive) can be modified by making a request to the fileserver, which allows the same kind of in-place changes as block storage.
    • Files in object storage (like S3) are immutable and cannot be modified. You can upload another file with the same name, which will replace the original file, but you cannot modify a file. (Uploaded files are called objects.)
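
    A rough sketch of that difference, using a local temporary file for the "modify in place" case and a made-up in-memory ToyObjectStore for the object case. ToyObjectStore is a hypothetical stand-in, not the S3 API; it only mimics the "put a whole object, get a whole object" model.

```python
import os
import tempfile


def patch_byte_in_place(path: str, offset: int, value: int) -> None:
    """Change one byte of a file; the rest of the file is not rewritten."""
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(bytes([value]))


class ToyObjectStore:
    """Hypothetical in-memory object store: whole-object put and get only."""

    def __init__(self) -> None:
        self._objects = {}

    def put(self, key: str, data: bytes) -> None:
        # A put with an existing key replaces the whole object.
        self._objects[key] = bytes(data)

    def get(self, key: str) -> bytes:
        return self._objects[key]


def patch_byte_in_object(store: ToyObjectStore, key: str,
                         offset: int, value: int) -> None:
    """'Changing' one byte means: download everything, edit, upload everything."""
    data = bytearray(store.get(key))
    data[offset] = value
    store.put(key, bytes(data))


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    os.close(fd)
    with open(path, "wb") as f:
        f.write(b"hello")
    patch_byte_in_place(path, 0, ord("j"))
    with open(path, "rb") as f:
        print(f.read())
    os.remove(path)

    store = ToyObjectStore()
    store.put("greeting", b"hello")
    patch_byte_in_object(store, "greeting", 0, ord("j"))
    print(store.get("greeting"))
```

    For a 5-byte object the cost is the same either way, but for a multi-gigabyte object the second approach transfers the whole thing to change one byte.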

    Amazon S3 has other unique attributes, such as making objects available via the Internet, offering multiple storage classes for low-cost backups, and triggering events when objects are created or deleted. It's a building block for applications as opposed to a simple disk for storing data. Plus, there is no limit to the amount of data you can store.

    Databases

    Databases like to store their data in their own format that makes the data fast to access. Traditional databases are built to run on normal servers and they want fast access, so they store their data on directly-attached disks, which are block storage. Amazon RDS uses Amazon EBS for block storage.
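
    As a rough illustration of "their own format", many databases organise their data files into fixed-size pages and jump straight to the page they need. The sketch below assumes a 4096-byte page and uses made-up helper names (write_page, read_page); it is not how any particular database engine is implemented.

```python
import tempfile

PAGE_SIZE = 4096  # assumed page size, chosen for illustration


def write_page(f, page_no: int, payload: bytes) -> None:
    """Write one fixed-size page at its computed offset."""
    if len(payload) > PAGE_SIZE:
        raise ValueError("payload larger than a page")
    f.seek(page_no * PAGE_SIZE)
    f.write(payload.ljust(PAGE_SIZE, b"\x00"))


def read_page(f, page_no: int) -> bytes:
    """Read exactly one page without touching the rest of the file."""
    f.seek(page_no * PAGE_SIZE)
    return f.read(PAGE_SIZE)


if __name__ == "__main__":
    with tempfile.TemporaryFile() as f:
        write_page(f, 3, b"row-42")
        write_page(f, 0, b"header")
        print(read_page(f, 3).rstrip(b"\x00"))
```

    Lots of small random reads and writes like these are cheap on a locally attached block device, which is the access pattern the paragraph above is describing.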

    A network-attached filesystem would slow the speed of disk access for a database, thereby reducing performance. However, this trade-off is sometimes worthwhile because centralized network storage (NAS, or a SAN for block-level access) is easier to manage than adding disks to each individual server.

    Some modern 'databases' (if you can use that term) like Presto can access data directly in Amazon S3 without loading the data into the database. Thus, the database processing layer is separated from the data layer. This makes it easier to access historical archived data since it doesn't need to be imported into the database.