Search code examples
mysqljsonmongodbsizediskspace

mysql json vs mongo - storage space


I am experiencing an interesting situation and although is not an actual problem, I can't understand why this is happening.

We had a mongo database, consisting mainly of some bulk data stored into an array. Due to the fact that over 90% of the team was familiar with mysql while only a few of us were familiar with mongo, combined with the fact that is not a critical db and all queries are done over 2 of the fields (client or product) we decided to move the data in mysql, in a table like this

[idProduct (bigint unsigned), idClient (bigint unsigned), data (json)]

Where data is a huge json containing hundreds of attributes and their values.

We also partitioned in 100 partitions by a hash over idClient.

PARTITION BY HASH(idClient)
PARTITIONS 100;

All is working fine but I noticed an interesting fact:

The original mongo db had about 70 GB, give or take. The mysql version (containing actually less data because re removed some duplicates that we were using as indexes in mongo) has over 400 GB.

Why does it take so much more space? In theory bson should actually be slightly larger than json (at least in most cases). Even if indexes are larger in mysql... the difference is huge (over 5x).


Solution

  • I did a presentation How to Use JSON in MySQL Wrong (video), in which I imported Stack Overflow data dump into JSON columns in MySQL. I found the data I tested with took 2x to 3x times more space than importing the same data into normal tables and columns using conventional data types for each column.

    JSON uses more space for the same data, for example because it stores integers and dates as strings, and also because it stores key names on every row, instead of just once in the table header.

    That's comparing JSON in MySQL vs. normal columns in MySQL. I'm not sure how MongoDB stores data and why it's so much smaller. I have read that MongoDB's WiredTiger engine supports options for compression, and snappy compression is enabled by default since MongoDB 3.0. Maybe you should enable compressed format in MySQL and see if that gives you better storage efficiency.

    JSON in MySQL is stored like TEXT/BLOB data, in that it gets mapped into a set of 16KB pages. Pages are allocated one at a time for the first 32 pages (that is, up to 512KB). If the content is longer than that, further allocation is done in increments of 64 pages (1MB). So it's possible if a single TEXT/BLOB/JSON content is say, 513KB, it would allocate 1.5MB.