
Is it possible to get average Berkeley DB record size


I'm using db_stat to get the approximate number of records in a BDB database (to avoid iterating over the whole database):

[me@home magic]$ db_stat -d random.db
Thu Mar  3 13:38:25 2016        Local time
61561   Hash magic number
8       Hash version number
Little-endian   Byte order
        Flags
643     Number of pages in the database
4096    Underlying database page size
0       Specified fill factor
2340    Number of keys in the database
2340    Number of data items in the database
299     Number of hash buckets
303540  Number of bytes free on bucket pages (75% ff)
15      Number of overflow pages
39282   Number of bytes free in overflow pages (36% ff)
114     Number of bucket overflow pages
322730  Number of bytes free in bucket overflow pages (30% ff)
0       Number of duplicate pages
0       Number of bytes free in duplicate pages (0% ff)
1       Number of pages on the free list

Is it possible to get average record size as well?

I guess I can use the following info to get the overall size:

643     Number of pages in the database
4096    Underlying database page size

643 * 4096 = 2633728 bytes (which corresponds to the file size), giving an approximate record size of 2633728 / 2340 ≈ 1126 bytes.

So my question: would using additional info from db_stat give me a more accurate result?


Solution

  • You've computed the upper bound on average record size:

    643 pages * 4096 bytes / page = 2633728 bytes total
    2633728 bytes / 2340 keys (records) = 1126 bytes / record
    

    You can get closer to the truth by subtracting all the "bytes free on XXX pages" from the total. This is space that's not in use by the database because of inefficiencies in how it was populated. (As an aside, this doesn't look too bad, but whenever there are a significant number of overflow pages, you could consider a larger page size. Of course, there are downsides to larger page sizes too. Yay, databases!)

     2633728 bytes 
    - 303540 bytes free on bucket pages
    -  39282 bytes free in overflow pages
    - 322730 bytes free in bucket overflow pages
    -      0 bytes free in duplicate pages
    --------
     1968176 bytes total / 2340 keys = 841 bytes / record
    

    This figure still isn't really the average record size, but I think it's as close as you can get from db_stat. It includes the supporting database structure for each record, and other database overhead.
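    The two calculations above can be automated by parsing the `db_stat -d` output directly. The sketch below is a minimal illustration, not part of Berkeley DB itself: the `avg_record_size` helper is hypothetical, and it assumes the hash-database field labels shown in the question (other access methods print different statistics).

    ```python
    import re

    def avg_record_size(db_stat_output: str) -> tuple[int, int]:
        """Estimate (upper-bound, free-space-adjusted) average record size
        in bytes from the text output of `db_stat -d` on a hash database."""
        stats = {}
        for line in db_stat_output.splitlines():
            # Stat lines look like "643     Number of pages in the database"
            m = re.match(r"^(\d+)\s+(.*)$", line.strip())
            if m:
                stats[m.group(2)] = int(m.group(1))

        total = (stats["Number of pages in the database"]
                 * stats["Underlying database page size"])
        # Sum every "Number of bytes free ..." line (bucket, overflow,
        # bucket overflow, duplicate pages).
        free = sum(v for k, v in stats.items()
                   if k.startswith("Number of bytes free"))
        keys = stats["Number of keys in the database"]
        return total // keys, (total - free) // keys
    ```

    Feeding it the statistics from the question reproduces both estimates: roughly 1125 bytes per record as the upper bound, and roughly 841 bytes per record after subtracting the free space.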