Search code examples
postgresqlnpgsql

Directly return Postgres jsonb binary bytes


I've a table with a jsonb column. when I run the below query it returns 5559 by

select pg_column_size("jsonb_column") from "Table"

while when I return the stream of bytes in my application I am getting 18,185 bytes.

I wonder if there is a functionality in postgres to return the binary/decomposed format instead of encoded text?


Solution

  • There is, but it won't help.

    Chapter 55. Frontend/Backend Protocol:

    The desired format for any value is specified by a format code. Clients can specify a format code for each transmitted parameter value and for each column of a query result. Text has format code zero, binary has format code one, and all other format codes are reserved for future definition.

    You can find an example use in libpq's PQexecParam:

    resultFormat
    Specify zero to obtain results in text format, or one to obtain results in binary format.

    So clients just switch a bit when they want binary. npgsql hides that in src/Npgsql/Internal /DataFormat.cs:

    public enum DataFormat : byte
    {
        Binary,
        Text
    }
    
    static class DataFormatUtils
    {
        public static DataFormat Create(short formatCode)
            => formatCode switch
            {
                0 => DataFormat.Text,
                1 => DataFormat.Binary,
                _ => throw new ArgumentOutOfRangeException(nameof(formatCode), formatCode, "Unknown postgres format code, please file a bug,")
            };
    

    Thing is, the binary format you get from that is not necessarily the lightweight internal representation, it's whatever the type's _send() function produces. For simple things like date, it's pretty much handing out the internal representation. For more complex things, like daterange, range_send() is constructing a specific "external" format, but mostly out of internal binaries.
    In case of jsonb_send(), it just generates its textual representation with a version number up front and sends that:

    /*
     * jsonb type send function
     *
     * Just send jsonb as a version number, then a string of text
     */
    Datum
    jsonb_send(PG_FUNCTION_ARGS)
    {
        Jsonb      *jb = PG_GETARG_JSONB_P(0);
        StringInfoData buf;
        StringInfo  jtext = makeStringInfo();
        int         version = 1;
    
        (void) JsonbToCString(jtext, &jb->root, VARSIZE(jb));
    
        pq_begintypsend(&buf);
        pq_sendint8(&buf, version);
        pq_sendtext(&buf, jtext->data, jtext->len);
        destroyStringInfo(jtext);
    
        PG_RETURN_BYTEA_P(pq_endtypsend(&buf));
    }
    

    That means the jsonb value gets inflated to a json size.

    On top of that, the comparison you were looking at isn't apples-to-apples because pg_column_size() tells you the effective size, after TOASTing and transparent compression:

    pg_column_size ( "any" ) → integer
    Shows the number of bytes used to store any individual data value. If applied directly to a table column value, this reflects any compression that was done.

    Some values might be saved raw, some compressed inline, some raw but moved out of line to a TOAST table, some both moved to TOAST and then also compressed in there. TOASTing costs some additional space, compression reduces it and there's more than one algorithm that can be used.

    If your goal is to get the same value but in size closer to 5559 bytes than 18185, Postgres has pgcrypto where you can set compress-algo to 1 for zip and compress-level to an unreasonably high 9, and run your values through that. The text representation of jsonb should compress nicely, an array of similar jsonb objects even more so. Still, that does get you the same thing, in less bytes but you'll have to undo all that to actually do anything with the them in your app.