
Worth it to access data by blocks on modern OS/hardware?


As 'worth' in the title could be read as opinion-based, let me refine the question as follows before presenting the original content.

The question is about designing a file format, or a data-access method built on file interfaces. In doing so, does it matter for performance to 'partition' data into blocks that physically match the underlying storage device?

In other words, since the OS/hardware already employs various mechanisms for efficiency, such as buffering and pre-fetching, do these underlying mechanisms nullify any efforts, or mask any defects, at the application level?


Original Post:

My question is about accessing data in a dedicated file by offsets, without any database or serialization framework. Textbooks recommend accessing data by blocks, such as retrieving/flushing 4 KiB (or larger, depending on the specific device) at once, but I am sometimes unsure of its necessity.

I have seen many database projects manage data with a block-aware design, e.g., reading/writing disk data a batch at a time rather than only the bytes actually requested.

Memory, disks, and the intervening buffers are controlled by the OS/hardware nowadays, and programmers cannot directly control when blocks move in or out, beyond calls such as FileDescriptor.sync().

Thus, even if a program accesses data by blocks, which adds complexity, the effort may be cancelled out by the OS's own mechanisms. Conversely, if a program simply accesses data at arbitrary offsets, ignoring blocks, the underlying OS may still schedule block retrieval as usual. Not to mention that some languages, such as Java, employ complicated memory models that may further affect disk access.
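
As a minimal illustration (the file name is hypothetical): the application can ask the OS to make writes durable with FileChannel.force, the NIO counterpart of FileDescriptor.sync(), but when and in what order the underlying blocks actually move remains up to the OS and hardware.

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.file.Paths;
    import java.nio.file.StandardOpenOption;

    public class SyncDemo {
        public static void main(String[] args) throws IOException {
            try (FileChannel ch = FileChannel.open(Paths.get("data.bin"),
                    StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
                ch.write(ByteBuffer.wrap("hello".getBytes()));
                // Request durability; the kernel still decides how the
                // corresponding blocks are scheduled and written.
                ch.force(false);
            }
        }
    }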

I know that block/buffer management can help with ACID, but I wonder whether this design is aimed at efficiency as well.

I've searched the site but found no clues. Please point me to relevant keywords; sorry for my ignorance.


Although the answers below have already resolved my confusion, I feel it is my responsibility to state the question more clearly and make the post more useful to the community.

Part of my confusion originates from existing open file formats, say Parquet. I see it employs configurable fixed-length pages, which, however, follow a header of irregular size. I was wondering: if the file is stored starting at the beginning of a block, wouldn't the header make the following page spill slightly past the block boundary, given that the page takes the size of one block?
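
To make the arithmetic concrete, here is a small sketch; the 4 KiB block size is an assumption, and Parquet's 4-byte "PAR1" magic stands in for the header (real page headers are variable-length Thrift structures):

    public class PageAlignment {
        public static void main(String[] args) {
            long blockSize = 4096;                     // assumed device block size
            long headerSize = 4;                       // e.g. Parquet's "PAR1" magic
            long pageStart = headerSize;               // page begins right after the header
            long pageEnd = pageStart + blockSize - 1;  // page sized to exactly one block
            System.out.println(pageStart / blockSize); // 0: first block the page touches
            System.out.println(pageEnd / blockSize);   // 1: the page straddles two blocks
        }
    }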

For another example, suppose I were building a database system in Java. The database uses a specific file format consisting of fixed-length 16 KiB blocks. My task is to implement an update to a record of about 100 bytes. With the record's position already known, I can either write it directly with FileChannel.write (assuming no other records are affected), or retrieve the entire block (for later use) and rewrite the whole 16 KiB. In this particular pattern, is it more efficient to write those 100 bytes than the whole block?
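
For reference, here is a minimal sketch of the two strategies described above (class and method names are hypothetical; error handling is omitted):

    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;

    public class RecordUpdate {
        static final int BLOCK_SIZE = 16 * 1024;   // the 16 KiB block from the question

        // Option A: write only the ~100 bytes of the record in place.
        static void writeRecordOnly(FileChannel ch, long recordOffset, byte[] record)
                throws IOException {
            ch.write(ByteBuffer.wrap(record), recordOffset);
        }

        // Option B: read the enclosing 16 KiB block, patch it, write it back whole.
        static void rewriteWholeBlock(FileChannel ch, long recordOffset, byte[] record)
                throws IOException {
            long blockStart = (recordOffset / BLOCK_SIZE) * BLOCK_SIZE;
            ByteBuffer block = ByteBuffer.allocate(BLOCK_SIZE);
            ch.read(block, blockStart);                      // fetch the whole block
            block.position((int) (recordOffset - blockStart));
            block.put(record);                               // patch the record in memory
            block.rewind();
            ch.write(block, blockStart);                     // flush the whole block
        }
    }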

I hope this supplement makes the post clearer.


Solution

  • Worth it to access data by blocks on modern OS/hardware?

    Maybe. As you observe, there is usually buffering at the hardware level, in the OS kernel, and in userspace beneath the application, all aimed at improving I/O performance. Whether a particular application can benefit from performing its I/O in a block-aware manner depends on that application's data access patterns.

    The point of application-level block awareness is usually to reduce the number of (block-oriented) I/O operations that must be performed at lower levels, especially the hardware level. For example, if I design my database carelessly, I might end up with major structures that frequently span more disk blocks than they need to for their size. All buffering notwithstanding, it is faster to retrieve fewer blocks than more. It can also be faster to write application-layer data in units of whole disk blocks, because then the system can ignore the initial contents of the destination blocks. Otherwise, it must first read the contents of some of those blocks, update the parts that are to be overwritten, and only then write the blocks.
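
    As a hedged illustration of the whole-block case (the file name is hypothetical; FileStore.getBlockSize() requires Java 10+ and ByteBuffer.alignedSlice requires Java 9+): writing one full, aligned block is the pattern that lets the lower layers replace the block outright instead of reading and patching it.

        import java.io.IOException;
        import java.nio.ByteBuffer;
        import java.nio.channels.FileChannel;
        import java.nio.file.*;

        public class WholeBlockWrite {
            public static void main(String[] args) throws IOException {
                Path path = Paths.get("table.db");   // hypothetical data file
                int blockSize = (int) Files.getFileStore(Paths.get(".")).getBlockSize();
                try (FileChannel ch = FileChannel.open(path,
                        StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
                    // Over-allocate, then slice so the buffer's start address is
                    // block-aligned (needed if direct I/O is enabled later).
                    ByteBuffer buf = ByteBuffer.allocateDirect(blockSize * 2)
                                               .alignedSlice(blockSize);
                    buf.limit(blockSize);
                    buf.put(0, (byte) 42);            // application data goes here
                    ch.write(buf, 0);                 // one whole, aligned block
                }
            }
        }

    With the page cache in between the effect is muted, but on JDKs that support it, opening the channel with com.sun.nio.file.ExtendedOpenOption.DIRECT makes the whole-block requirement, and its payoff, explicit.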

  • I wonder whether this design is aimed at efficiency as well?

    You are right that it is more work, and more complex, to design and implement application-level accommodation of hardware block sizes. Such designs are generally not undertaken except for I/O-efficiency reasons. Whether they succeed in that regard is a different question, but the fact that such designs remain comparatively common in applications built for high I/O throughput should incline you to think that yes, they can be effective.