Search code examples
mysqlrecorddiskspacedisk-access

MySQL record disk-usage calculation | Am I right?


I am trying to estimate the real disk-usage required space for each record of my table in MySQL RDBMS.

The table has a structure like this:

  • ID INT 4 byte;
  • VARCHAR(34) 34 byte;
  • INT 4 byte;
  • INT(5) 4 byte;
  • INT 4 byte;
  • INT 4 byte which is also a FOREIGN KEY;

So there are 5 INT fields and a VARCHAR of a maximum of 34 chars (i.e. 34 bytes).

I have 2 questions:

1) The total should be 54 bytes per record (with variable VARCHAR, of course) am I right when I am saying that, or there are also some over-head bytes which I should consider when estimating the disk-usage space?

2) I have also used INT(5) instead of CHAR(5) cause I need to store only exactly 5 digits in that field (I am going to do that by application, with regExp and string length, cause I know that INT(5) could be more than an int with 5 digits). But could this be considered such as an optimization by the disk-usage space cause I am using an INT (4 bytes) instead of a CHAR(5) which is 5 bytes, i.e. 1 more byte per record?

Thanks for the attention!


Solution

  • One record itself will use

    1 byte in offsets

    0 bytes in NULLable bits

    5 bytes in "extra bytes" header

    4 bytes ID

    6 bytes transaction id

    7 bytes rollback pointer

    0-3*34 bytes in VARCHAR(34) (one character may take up to 3 bytes because of UTF8)

    4*4 bytes in other integers

    Each distinct value of FK will lead to one record in a secondary index. it will use

    5 bytes in "extra bytes" header

    4 bytes INT for FK value

    4 bytes INT for Primary key

    Other overhead is page level: 120 bytes per page (16k) in headers page fill factor 15/16 - i.e. one page may contain 15k in records.

    And the last - add space used by non-leaf pages, which should be small anyway

    So, answer to question - 1) yes there will be some overhead that you can calculate using information above.

    2) CHAR(5) in UTF8 will add a byte for its length, so INT looks reasoaanle to use