Search code examples
jsonpostgresql

Is it true that JSONB may take more disk space than JSON in PostgreSQL? and Why?


A lot of postgreSQL JSON related articles mentioned that JSONB might take slightly bigger disk space than JSON. I didn't find it in PostgreSQL docs. if its true, can someone help me understand why?

in this article: "It may take more disk space than plain JSON due to a larger table footprint, though not always.". I am not sure what "large table footprint" means.


Solution

  • From the manual:

    The json data type stores an exact copy of the input text, which processing functions must reparse on each execution; while jsonb data is stored in a decomposed binary format that makes it slightly slower to input due to added conversion overhead, but significantly faster to process, since no reparsing is needed

    Essentially postgres is parsing the json at load and storing it in such a way that makes it easier to query and indexable. This transformation likely adds some storage size since it's not just storing the json string that was passed as-is.

    I would imagine that under the hood the json data is being parsed into something more relational for storage and indexing (although the manual doesn't call it out). There is likely overhead for creating underlying (invisible to you) table schema to store this. That storage overhead (table metadata, and storage methodology of postgres tables) is likely the biggest factor in the increase in size.


    Also on that page it reads:

    Because the json type stores an exact copy of the input text, it will preserve semantically-insignificant white space between tokens, as well as the order of keys within JSON objects. Also, if a JSON object within the value contains the same key more than once, all the key/value pairs are kept. (The processing functions consider the last value as the operative one.) By contrast, jsonb does not preserve white space, does not preserve the order of object keys, and does not keep duplicate object keys. If duplicate keys are specified in the input, only the last value is kept.

    Which would suggest that if your data has lots of duplicate keys or significant superfluous whitespace, that it will likely offset the change in storage size when postgres converts from json string to its jsonb format.


    I believe the last pertinent section that may suggest differences in storage is:

    When converting textual JSON input into jsonb, the primitive types described by RFC 7159 are effectively mapped onto native PostgreSQL types, as shown in Table 8.23. Therefore, there are some minor additional constraints on what constitutes valid jsonb data that do not apply to the json type, nor to JSON in the abstract, corresponding to limits on what can be represented by the underlying data type.

    That hints that Postgres is parsing the JSON on load and converting to its own types. For strings in your json, that means a converstion to postgres' text type is a variable unlimited length string. By definition that will take on an extra byte or more to store the length of the string. As an example a string like "hi" which may generally only need two bytes to store, would end up using 3 bytes. 2 bytes for the string, and another extra byte to store that its 2 characters.

    In the end, you are trading some upfront compute and some storage in order to save compute when reading the data. If the requirement of your application is to quickly read the data being stored in this json, then use jsonb. If the requirement, on the other hand, is backup, or to store and query it every now-and-again (but not often) then store it as json.