Search code examples
mysqlinnodbdatabase-performance

How are JSON and TEXT types in MySQL(InnoDB) stored and how it affects read performance


I would like to know in depth around how exactly JSON and TEXT are stored in MySQL InnoDB. I have heard and read in some places that for these types only a pointer is stored in the row with the actual data residing elsewhere on disk.

  • Would the length of these types(especially in JSON) affect the decision of whether they are being stored at a different location with a pointer in the row(in the case where JSON size is large) or at the same location as the row without any pointer (in the case where JSON size is small)?
  • If a pointer is always stored, then wouldn't keeping the datatype as JSON for any JSON doc someone would have to store(as it would seem the right type choice to store a JSON due to the datatype name, even though the size of JSON is small) would affect badly read performance as now every row has to fetch its JSON document from a separate location for each row.
  • Provided
    • I always insert a valid JSON in MySQL
    • I am only reading the full document and not querying the JSON
    • My size of JSON is predictable and small < 1000 chars let's say

Would VARCHAR be a better choice as compared to JSON datatype from a read performance perspective as now data would reside along with row and not in separate locations hence saving the additional disk pages fetch?

  • Is there any metric around how much would read performance get affected by having a JSON vs VARCHAR?
  • In case a JSON column is to be used, would it be a better choice to keep all JSON in a single column rather than creating multiple JSON columns in order to avoid multiple pointer-based fetches?
  • Is the effect on read performance due to additional disk read due to pointer-based fetch of JSON negligible that I should not think about this kind of optimization (considering I am dealing with designs of tables that could have millions to billions of rows)?

Solution

  • In MySQL 5.7 and 8.0, JSON is a subtype of BLOB. Cf. https://github.com/mysql/mysql-server/blob/8.0/sql/field.h#L143-L150

    (I had to ask the version because some people tag their question mysql when they are really using MariaDB. MariaDB has a different implementation for JSON than MySQL, and I have no idea how MariaDB stores it.)

    You should use the JSON type if you store valid JSON (if you store fragments of JSON, you can't use the JSON type).

    BLOB, TEXT, VARCHAR, VARBINARY, and thus also JSON are stored the same in InnoDB. There will be no advantage to choosing VARCHAR over JSON.

    The default InnoDB row format in both MySQL 5.7 and 8.0 is DYNAMIC, so I'll comment assuming you use the default row format. See https://dev.mysql.com/doc/refman/5.7/en/innodb-row-format.html#innodb-row-format-dynamic for a description.

    If a given string (BLOB/TEXT/VARCHAR/VARBINARY/JSON) is short enough to fit within 40 bytes, this row format stores it inline on the same page with other columns. Otherwise it stores the string on one or more additional pages, and on the primary page, it stores a 20-byte pointer to the first additional page. Thus it might multiple page reads to fetch one of these data types, but it depends on the length.

    Each page is of fixed size (16KB by default). If the string is too long to fit on one additional page, it is split over multiple pages, and the pages aren't necessarily consecutive. Each page points to the next page for the continuation of the string.

    I don't have an answer about the metrics you were interested in. Storage doesn't have performance; that's relative to specific queries you will run. As described above, it also depends on the length of your JSON documents. It could also depend on your server power, amount of RAM available for caching pages, and other concurrent queries competing for RAM. The only way you'll get an accurate answer for that question is to run some tests yourself, using your own data on your own server.

    I also can't answer about whether it's better for your case to separate the data into multiple JSON documents versus combine them all into one. One reason is that your query patterns may naturally need to use a small subset of the JSON data more frequently, so it would be an advantage to store that separately. But if you always need to read the whole JSON content, then there's no reason not to combine them into one. No one on Stack Overflow can answer this for you, because it depends on your query needs.