c block stat

STAT Block size/blocks used


I have a question that is confusing me. My task is to work out fragmentation.

stat() for a file:
st_size = 10520
st_blksize = 4096
st_blocks = 24

I have read in some places that st_blksize is the general block size of the file system, which in this case is 4096, so the file would fit into 3 blocks. However, st_blocks appears to count 512-byte units: 10520 / 512 is about 20.55, meaning that about 3.45 of the 24 allocated blocks are unused. Does this mean that there are 24 * 512 - 10520 = 1768 unused bytes in this file (fragmentation)?

As I mentioned, I have read into this a fair bit and found a lot of contradictory texts. I would like someone to clear this up once and for all!
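
The arithmetic above can be checked with a short sketch. It assumes st_blocks counts 512-byte units, which is what Linux documents; other systems may use a different unit. The names slack_bytes and report_slack are hypothetical helpers introduced only for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>

/* Bytes allocated beyond the logical file size.
 * Assumption: st_blocks is in 512-byte units (true on Linux).
 * The result can be negative for sparse files. */
long long slack_bytes(long long st_size, long long st_blocks)
{
    assert(st_size >= 0 && st_blocks >= 0);
    return st_blocks * 512LL - st_size;
}

/* Hypothetical helper: stat() a path and print its slack.
 * Returns 0 on success, -1 if stat() fails. */
int report_slack(const char *path)
{
    struct stat sb;

    if (stat(path, &sb) != 0) {
        perror(path);
        return -1;
    }
    printf("%s: size=%lld allocated=%lld slack=%lld\n", path,
           (long long)sb.st_size,
           (long long)sb.st_blocks * 512LL,
           slack_bytes((long long)sb.st_size, (long long)sb.st_blocks));
    return 0;
}
```

For the values in the question, slack_bytes(10520, 24) gives 12288 - 10520 = 1768. That is space allocated but not used at the end of the file; it says nothing about where the blocks sit on disk.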


Solution

  • I don't think your project is really solvable at the stat(2) API layer. Consider a file 4096 bytes long. Presume it was created by appending 512-byte blocks one at a time. Presume that, at each write, the filesystem was completely full except for a single 512-byte block. Presume that the free block available for each write was located at a random spot on the disk.

    This file is 100% fragmented -- no two blocks are near each other.

    And yet, a measure based solely on the stat(2) variables might well show that there are no wasted blocks anywhere in the file.

    When trying to track down an answer to your actual question, I got as far as ext3_write_begin() before being called away -- hope this is a useful starting point for your search.

    Update

    If you're interested in finding fragmentation, I think the place to start is the bmap command from the debugfs(8) program:

    debugfs:  bmap sars_first_radio_show.zip 0
    94441752
    debugfs:  bmap sars_first_radio_show.zip 1
    94441781
    debugfs:  bmap sars_first_radio_show.zip 2
    94441782
    debugfs:  bmap sars_first_radio_show.zip 3
    94441783
    debugfs:  bmap sars_first_radio_show.zip 4
    94441784
    debugfs:  bmap sars_first_radio_show.zip 5
    94459905
    debugfs:  bmap sars_first_radio_show.zip 6
    95126019
    debugfs:  bmap sars_first_radio_show.zip 7
    95126020
    debugfs:  bmap sars_first_radio_show.zip 8
    95126021
    debugfs:  bmap sars_first_radio_show.zip 9
    95126022
    debugfs:  
    

    This shows the first ten blocks of the file sars_first_radio_show.zip; you can see that the blocks aren't all contiguous: 94441752, 944417{81,82,83,84}, 94459905, 951260{19,20,21,22}.

    You could either script an answer around debugfs(8) or you could use the libext2fs library routines yourself. It would be a significant step up in complexity compared to the stat(2) exercises you were going through -- but the answers would mean something, rather than just be a vague guess.
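
As a rough illustration of what a script around the bmap output might compute, here is a minimal sketch that counts contiguous runs in a list of physical block numbers. The function count_extents and the array sample_blocks are hypothetical names, not part of debugfs(8) or libext2fs, and the sketch assumes every logical block is mapped (no holes).

```c
#include <assert.h>
#include <stddef.h>

/* Count contiguous runs in a list of physical block numbers given
 * in logical order. Hypothetical helper for illustration only. */
size_t count_extents(const unsigned long *blocks, size_t n)
{
    size_t runs;
    size_t i;

    assert(blocks != NULL || n == 0);
    if (n == 0)
        return 0;

    runs = 1;
    for (i = 1; i < n; i++) {
        /* A new run starts whenever the next physical block is not
         * exactly one past the previous one. */
        if (blocks[i] != blocks[i - 1] + 1)
            runs++;
    }
    return runs;
}

/* The ten block numbers from the bmap session above. */
const unsigned long sample_blocks[] = {
    94441752, 94441781, 94441782, 94441783, 94441784,
    94459905, 95126019, 95126020, 95126021, 95126022,
};
```

With these ten blocks, count_extents(sample_blocks, 10) returns 4: block 0 on its own, blocks 1 to 4, block 5 on its own, and blocks 6 to 9. A fully contiguous file would give 1, so the run count is one possible fragmentation measure that stat(2) alone cannot provide.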