Search code examples
mysqlinnodbmariadb

Difference in datafile size on a same data?


I use MariaDB 10.1.16

I do very simple job this time.

Select data from oracle and Make csv file Load that in MariaDB using load infile data command. DB Engine is InnoDB.

Data row count is 6497641. Both table is created same query. PK is auto_increment and int type;

Row created by...

  1. TABLE1 - load infile data...
  2. TABLE2 - insert into TABLE2 select * from TABLE1...

size of tables is below.

  1. TABLE1 - 3.3GBytes
  2. TABLE2 - 1.9GBytes

Contents in mysql.innodb_table_stats is below.

  1. TABLE1: n_rows(5438171) , clustered_index_size(196096), sum_of_other_index_sizes(12853)
  2. TABLE2: n_rows(6407131) , clustered_index_size(106048), sum_of_other_index_sizes(12273)

I wanna know why the size of files is different.

thank you.


Solution

  • The order of the rows can make a big difference. If the data is sorted by the PRIMARY KEY as it is inserted, the blocks will be packed nearly full. If the rows are randomly sorted, the end result will be blocks that are about 69% full. This is the nature of inserting into a BTree.

    n_rows is just an approximation, hence the inconsistent count. The other pair of values, I think, an exact number of 16KB blocks.

    Since the PK is "clustered" with the data, the clustered_index_size is the size of the data, plus some overhead for the BTree on the PK. Plus a lot of overhead and/or wasted space (as mentioned above).