Tags: database, postgresql, storage, numeric

Byte size of numeric(precision, scale) in Postgres


Given that the size is variable, I am trying to calculate the average cost of a numeric(p, s) attribute, in my case numeric(10, 2). From what I understand of the Postgres documentation, this means I have a total of 10 digits, 2 of which are reserved for the fractional part, hence the maximum value I can store is 99999999.99.
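
For example, checking the limit directly (illustrative literals only):

    -- numeric(10, 2): 10 digits in total, 2 of them after the decimal point
    SELECT 99999999.99::numeric(10, 2);  -- largest value that fits
    SELECT 100000000::numeric(10, 2);    -- ERROR: numeric field overflow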

What I do not understand is this part of the documentation:

The actual storage requirement is two bytes for each group of four decimal digits, plus three to eight bytes overhead.

My two questions here are:

  • If I have two decimal digits, that means the storage requirement for them would be one byte?
  • If I have three to eight bytes overhead, how can I know which value corresponds to my case?

I ran a query taking the average of pg_column_size, which came out to 6.95 bytes. However, I want to know whether there is any way to calculate an approximate value before measuring the actual one. (The same goes for other variable-length types in Postgres.)
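
The query I ran was along these lines (tbl and n are placeholders for the real table and numeric(10, 2) column):

    SELECT avg(pg_column_size(n)) AS avg_bytes  -- average on-disk size of the stored values
    FROM   tbl;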


Solution

  • If I have two decimal digits, that means the storage requirement for them would be one byte?

    No, a numeric with two digits occupies 5 bytes on disk and 8 bytes in RAM.
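
    A quick way to reproduce both numbers, using a throwaway temp table (table and column names are arbitrary):

    CREATE TEMP TABLE numeric_size_demo (n numeric);
    INSERT INTO numeric_size_demo VALUES (99);

    SELECT pg_column_size(n)     AS disk_size  -- 5: "packed" value as stored in the row
         , pg_column_size(n + 0) AS ram_size   -- 8: unpacked copy computed in memory
    FROM   numeric_size_demo;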

  • If I have three to eight bytes overhead, how can I know which value corresponds to my case?

    As the manual says:

    two bytes for each group of four decimal digits

    For each started group of four decimal digits, to be precise. So we can compute:

    ceil(digits / 4.0) * 2
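
    For instance, a quick way to see how digit counts map to digit bytes (just an illustration of the formula):

    SELECT digits, ceil(digits / 4.0) * 2 AS digit_bytes  -- 1-4 digits -> 2 bytes, 5-8 -> 4, ...
    FROM   generate_series(1, 9) AS digits;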
    

    plus three to eight bytes overhead.

    The minimum total of 3 bytes only applies to plain 0, which is stored without any digit groups, in "packed" format on disk. For everything else, the overhead is the sum of two headers:

    • 2 bytes for the numeric header
      Or 4 bytes for huge numbers (more than 256 digits)

    • Plus 1 byte for the varlena header in "packed" format, which is in use for storage on disk while the total size stays below ~128 bytes
      Or plus 4 bytes once storage exceeds that, or for storage in RAM

    Sign, decimal point, and white space of the number are insignificant for storage size.
    Type modifiers (precision and scale) of the numeric type are also insignificant.
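
    Applied to the question's numeric(10, 2): a maximal value like 99999999.99 has 10 significant digits, so the expected sizes work out as follows (just the arithmetic, not a measurement):

    SELECT ceil(10 / 4.0) * 2 + 3 AS expected_disk_size  --  9 bytes ("packed" on disk)
         , ceil(10 / 4.0) * 2 + 6 AS expected_ram_size;  -- 12 bytes (unpacked in RAM)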

    Here is a demo with formulas for the complete computation:

    SELECT id, digits
         , pg_column_size(n) AS disk_size
         , CASE
              WHEN n = 0         THEN 3                           -- efficient storage of 0
              WHEN digits <= 248 THEN 3 + ceil(digits / 4.0) * 2  -- "packed" format for size <= 128 bytes
              WHEN digits <= 256 THEN 6 + ceil(digits / 4.0) * 2  -- effective size > 128 bytes
              ELSE                    8 + ceil(digits / 4.0) * 2  -- more than 256 digits
           END AS expected_disk_size
         , pg_column_size(n + 0) AS ram_size
         , CASE
              WHEN n = 0         THEN 6                           -- efficient storage of 0
              WHEN digits <= 256 THEN 6 + ceil(digits / 4.0) * 2  -- unpacked format in RAM
              ELSE                    8 + ceil(digits / 4.0) * 2  -- more than 256 digits
           END AS expected_ram_size
    FROM   tbl
         , length(translate(n::text, '-. ', '')) AS digits;       -- only digits matter
    


    Note that the no-op n + 0 forces the stored value to be unpacked, so we get the RAM size.
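
    This only matters for values read from a table; a freshly computed value is already in the unpacked in-memory form, for example:

    SELECT pg_column_size(numeric '99')     AS fresh_value  -- 8 bytes
         , pg_column_size(numeric '99' + 0) AS after_noop;  -- 8 bytes, the no-op changes nothing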

    Also note that the storage of NULL values is different.
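
    A NULL occupies no space in the data area at all; it is only tracked by a bit in the row's null bitmap, so pg_column_size() has nothing to measure:

    SELECT pg_column_size(NULL::numeric) AS size_of_null;  -- returns NULL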