file-io performance-testing disk-io sysbench

What is measured by changing the `file-block-size` parameter in sysbench fileio test?

I was trying to measure system performance with the sysbench fileio test. However, I'm not sure what I am actually changing when I change the `file-block-size` parameter.

Previously I thought it was the file system block size, but then I looked at the code and it is actually a layer on top of the file system block size. The pseudocode for how sysbench reads a file in the fileio test is as follows (mostly taken from sb_fileio.c):

while current_pointer < file_leng:
    read_leng = min(file_block_size, file_leng - current_pointer)
    pread(fd, read_buf, read_leng, current_pointer)
    current_pointer += read_leng
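
For anyone who wants to reproduce the behavior outside sysbench, here is a minimal runnable sketch of the same loop in Python. It is not sysbench's actual code: `read_whole_file` and `make_file` are hypothetical helpers, and it uses plain buffered I/O rather than O_DIRECT. It only counts how many `pread` calls a given file_block_size produces.

```python
import os
import tempfile


def read_whole_file(path, file_block_size):
    """Read a file front to back in file_block_size chunks.

    Returns (number of pread calls, total bytes read).
    """
    calls = 0
    total = 0
    fd = os.open(path, os.O_RDONLY)
    try:
        file_len = os.fstat(fd).st_size
        offset = 0
        while offset < file_len:
            # Never ask for more than what is left in the file.
            read_len = min(file_block_size, file_len - offset)
            data = os.pread(fd, read_len, offset)
            if not data:  # unexpected EOF, e.g. the file shrank underneath us
                break
            calls += 1
            total += len(data)
            offset += len(data)
    finally:
        os.close(fd)
    return calls, total


def make_file(size):
    """Hypothetical helper: create a temporary file of `size` zero bytes."""
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"\0" * size)
    return f.name
```

Calling `read_whole_file(make_file(16 * 1024), 256 * 1024)` should issue a single 16K request, which is the case the rest of the question is about.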

Sysbench uses pread here, a syscall that is serviced by the file system. When the file size is smaller than file_block_size, the parameter has no effect, because the read length is capped at the file size, and the actual block size used underneath pread (i.e. how many bytes have to be loaded from disk into memory even if we just want to read 1 byte) is already defined by the file system (if not the hardware).

For example, suppose the file system block size underneath pread is 4K. When the sysbench file_block_size is 1K/2K/4K, each pread syscall fetches one 4K block in every case. When file_block_size = 1024K and file_size = 1024K, each pread syscall fetches 256 * 4K blocks (rather than 1 * 1024K block). But when file_block_size = 1024K and file_size = 16K, the read length passed to pread is only 16K, so instead of retrieving 1024K (= 256 * 4K) it retrieves 4 * 4K blocks, because the length is min(file_size, file_block_size).

Is my understanding right? If so, what am I actually changing when I change that parameter? Or am I supposed to always set file_size to be bigger than file_block_size?

Also, when loading 1024K, sysbench actually loads 256 * 4K blocks inside the pread syscall rather than 1024K as a whole. Should there be any performance (throughput/latency) difference between these two behaviors?
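
The arithmetic in that example can be checked with a small sketch. It takes the question's own assumptions at face value (a 4K file system block and block-aligned offsets); `fs_blocks_per_request` is a hypothetical name, not anything from sysbench.

```python
K = 1024


def fs_blocks_per_request(file_block_size, file_size, fs_block_size=4 * K):
    """File system blocks touched by one pread, assuming an aligned offset."""
    request = min(file_block_size, file_size)  # length actually passed to pread
    return -(-request // fs_block_size)        # ceiling division


# The three cases from the paragraph above.
small = [fs_blocks_per_request(b * K, 1024 * K) for b in (1, 2, 4)]
full = fs_blocks_per_request(1024 * K, 1024 * K)
capped = fs_blocks_per_request(1024 * K, 16 * K)
```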

=====

The command I used:

./sysbench --file-block-size=<file_block_size> --file-total-size=65536K --file_num=<file_num> --file-test-mode=rndrd --file-fsync-all=on --file-extra-flags=direct fileio <prepare/run/cleanup>

The file_block_size is in {1K, 4K, 16K, 256K, 1024K, etc.}, the file_num is in {1, 4, 16, ..., 65536} ==> single file size is in {65536K, 16384K, ..., 1K}. The result I get:
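
For reference, the single-file sizes follow directly from the total size and the file count. The sketch below assumes the elided file_num values are the powers of four between 1 and 65536, which is my reading of the set above.

```python
TOTAL_K = 65536  # --file-total-size=65536K


def single_file_size_k(file_num, total_k=TOTAL_K):
    """Size of each file in K when total_k is split evenly across file_num files."""
    return total_k // file_num


# Assumed sweep: 1, 4, 16, ..., 65536 (powers of four).
file_nums = [4 ** i for i in range(9)]
file_sizes_k = [single_file_size_k(n) for n in file_nums]
```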

[Figure: latency (us) over file size (K) with different file_block_sizes]

Here 16K files with a 256K file_block_size have much lower latency than 256K files with a 256K file_block_size. That should not be the case if file_block_size were the hardware's load unit size, and it is not the file system block size either (I have an ext2/ext3 file system with a 4K block size). Then what is it?

Solution

A few different observations.

First, it appears that the file block size is actually the size of the buffer used for each I/O operation. In other words, it's the size of the buffer that each read (or write) fills.

So if you have a 4k file system block size, a 256k file block size, and a 1024k file size, you are issuing 4 separate 256k reads. Each of those reads "under the hood" reads a range of 4k file system blocks (often called an extent). Reading larger extents is usually more efficient than reading small amounts: the lower-level schedulers can optimize things, and you generate fewer requests and system calls, with less to queue up.
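
As a rough illustration of that example, the sketch below lists the (offset, length) requests a front-to-back pass over the file would issue. `request_plan` is a hypothetical helper and only models a sequential pass, not sysbench's random modes.

```python
K = 1024


def request_plan(file_size, file_block_size):
    """Return the (offset, length) pairs for one sequential pass over a file."""
    return [
        (offset, min(file_block_size, file_size - offset))
        for offset in range(0, file_size, file_block_size)
    ]


plan = request_plan(1024 * K, 256 * K)
# Each request spans this many 4k file system blocks.
blocks_per_request = [length // (4 * K) for _, length in plan]
```

Here `plan` holds 4 requests and each one covers 64 file system blocks, so the kernel sees 4 large requests instead of 256 small ones.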

Second observation: it looks like there are some guardrails that silently kick in when the block size is larger than the file size. Presumably the utility simply reads the entire file in one go. This would explain why a 16k file with a 256k block size is faster than a 256k file: you're reading 16 times less data per request.

To wrap up and hopefully answer your original question about the file size versus the block size parameter: once the file size is <= the block size, you're only going to see a single read request. This isn't inherently wrong, but you should be aware of it, depending on what you're trying to measure.
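
One way to spot the affected configurations before running a sweep is sketched below. It assumes, as presumed above, that the request length is simply capped at the file size; the helper names are hypothetical and the file sizes are just a few of the ones tested.

```python
K = 1024
BLOCK_SIZES = [1 * K, 4 * K, 16 * K, 256 * K, 1024 * K]  # from the question
FILE_SIZES = [16 * K, 256 * K, 1024 * K]                 # a few of the tested sizes


def effective_request(file_size, file_block_size):
    """Bytes actually requested per read under the capping assumption."""
    return min(file_size, file_block_size)


def capped_configs(file_sizes, block_sizes):
    """(file_size, block_size) pairs where the request silently shrinks."""
    return [(f, b) for f in file_sizes for b in block_sizes if f < b]


# 256k file vs 16k file, both with a 256k block size.
ratio = effective_request(256 * K, 256 * K) // effective_request(16 * K, 256 * K)
capped = capped_configs(FILE_SIZES, BLOCK_SIZES)
```

Any pair returned by `capped_configs` is measuring the file size rather than the block size, so those points are better excluded or labeled separately when comparing block sizes.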