Search code examples
postgresqlpostgresql-9.6

Finding what row or column is using TOAST in postgres 9 (and reducing TOAST uasge)?


I've inherited the maintenance of a system with a ridiculously large TOAST table. For a table and index size of about 400 MB, the toast is 631 GB.

clientportal=# SELECT oid, table_schema, table_name,                                                                                        
    total_bytes,
    pg_size_pretty(total_bytes) AS total
    , pg_size_pretty(index_bytes) AS INDEX
    , pg_size_pretty(toast_bytes) AS toast
    , pg_size_pretty(table_bytes) AS TABLE
  FROM (
  SELECT *, total_bytes-index_bytes-COALESCE(toast_bytes,0) AS table_bytes FROM (
      SELECT c.oid,nspname AS table_schema, relname AS TABLE_NAME
              , c.reltuples AS row_estimate
              , pg_total_relation_size(c.oid) AS total_bytes
              , pg_indexes_size(c.oid) AS index_bytes
              , pg_total_relation_size(reltoastrelid) AS toast_bytes
          FROM pg_class c
          LEFT JOIN pg_namespace n ON n.oid = c.relnamespace
          WHERE relkind = 'r'
  ) a
) a order by total_bytes desc;


   oid   |    table_schema    |                table_name                | total_bytes  |   total    |   index    |   toast    |   table    
---------+--------------------+------------------------------------------+--------------+------------+------------+------------+------------
   17202 | public             | phones                                   | 678297559040 | 632 GB     | 119 MB     | 631 GB     | 280 MB

Using a query that does pg_size_pretty(sum(pg_column_size($columnName))) on all the tables, I don't see which column could be using that much. So I probably hit a limitation or misunderstanding of that function:

-[ RECORD 1 ]---------+-----------
access_info           | 153 kB
activation_date       | 1291 kB
autocreate            | 835 kB
btn_id                | 3409 kB
callerid              | 5954 kB
calling_firstname     | 1 bytes
calling_lastname      | 1 bytes
connectiontype_id     | 1 bytes
created_at            | 6985 kB
did_supplier_id       | 17 kB
e911_notes            | 274 kB
e911_tested           | 70 bytes
e911_tested_by_id     | 
e911_tested_date      | 
email                 | 2431 kB
extension             | 2020 kB
firstname             | 152 kB
foreign_pid           | 16 kB
foreign_userid        | 10 kB
human_validation_date | 176 bytes
huntgroup             | 873 kB
id                    | 3492 kB
incoming_toll         | 873 kB
ipaddress             | 1 bytes
lastname              | 4406 bytes
line_of_business      | 305 kB
local_rate_plan_id    | 6264 bytes
national_rate_plan_id | 
notes                 | 114 MB
phone_mac_addr        | 5041 kB
phone_number          | 9656 kB
phone_type_id         | 
rate_center_id        | 4 bytes
service_type          | 3786 bytes
sip_password          | 22 MB
sip_username          | 8919 kB
site_id               | 3480 kB
termination_date      | 2217 kB
updated_at            | 6985 kB
virtualdid            | 873 kB
voicemail_number      | 1268 kB
voicemailportal       | 873 kB
world_rate_plan_id    | 

I suspect that the sip_password or notes column is sending the row over to toast. That doesn't prove I was hoping that setting those columns to null would reduce TOAST usage, but after researching, I'm assuming that won't happen or will require a VACCUM FULL

  1. Is there a way of determining what column or row is using that much TOAST space ?
  2. After setting some text columns to null, what would I have to run to reduce TOAST size?

Update: In case \d+ matters (but it doesn't quite answer the question):

clientportal=# \d+ phones
                                                               Table "public.phones"
        Column         |            Type             |                      Modifiers                      | Storage  | Stats target | Description 
-----------------------+-----------------------------+-----------------------------------------------------+----------+--------------+-------------
 id                    | integer                     | not null default nextval('phones_id_seq'::regclass) | plain    |              | 
 created_at            | timestamp without time zone |                                                     | plain    |              | 
 updated_at            | timestamp without time zone |                                                     | plain    |              | 
 site_id               | integer                     |                                                     | plain    |              | 
 btn_id                | integer                     |                                                     | plain    |              | 
 phone_number          | character varying(255)      |                                                     | extended |              | 
 extension             | character varying(255)      |                                                     | extended |              | 
 voicemail_number      | character varying(255)      |                                                     | extended |              | 
 access_info           | character varying(255)      |                                                     | extended |              | 
 notes                 | text                        |                                                     | extended |              | 
 activation_date       | date                        |                                                     | plain    |              | 
 termination_date      | date                        |                                                     | plain    |              | 
 phone_mac_addr        | character varying(255)      |                                                     | extended |              | 
 phone_type_id         | integer                     |                                                     | plain    |              | 
 sip_username          | character varying(255)      |                                                     | extended |              | 
 e911_notes            | character varying(255)      |                                                     | extended |              | 
 e911_tested           | boolean                     |                                                     | plain    |              | 
 e911_tested_by_id     | integer                     |                                                     | plain    |              | 
 e911_tested_date      | date                        |                                                     | plain    |              | 
 did_supplier_id       | integer                     |                                                     | plain    |              | 
 email                 | character varying(255)      |                                                     | extended |              | 
 ipaddress             | character varying(255)      |                                                     | extended |              | 
 connectiontype_id     | character varying(255)      |                                                     | extended |              | 
 sip_password          | text                        |                                                     | extended |              | 
 virtualdid            | boolean                     | default false                                       | plain    |              | 
 voicemailportal       | boolean                     | default false                                       | plain    |              | 
 callerid              | character varying(255)      | default NULL::character varying                     | extended |              | 
 autocreate            | boolean                     | default false                                       | plain    |              | 
 huntgroup             | boolean                     | default false                                       | plain    |              | 
 firstname             | text                        |                                                     | extended |              | 
 lastname              | text                        |                                                     | extended |              | 
 calling_firstname     | text                        |                                                     | extended |              | 
 calling_lastname      | text                        |                                                     | extended |              | 
 foreign_userId        | text                        |                                                     | extended |              | 
 local_rate_plan_id    | integer                     |                                                     | plain    |              | 
 national_rate_plan_id | integer                     |                                                     | plain    |              | 
 world_rate_plan_id    | integer                     |                                                     | plain    |              | 
 foreign_pid           | integer                     |                                                     | plain    |              | 
 rate_center_id        | integer                     |                                                     | plain    |              | 
 human_validation_date | timestamp without time zone |                                                     | plain    |              | 
 service_type          | character varying           |                                                     | extended |              | 
 line_of_business      | character varying           |                                                     | extended |              | 
 incoming_toll         | boolean                     | default false                                       | plain    |              | 
Indexes:
    "phones_pkey" PRIMARY KEY, btree (id)
    "index_phones_on_id" UNIQUE, btree (id)
    "index_phones_on_btnid" btree (btn_id)
    "index_phones_on_extension" btree (extension)
    "index_phones_on_foreign_pid" btree (foreign_pid)

Solution

  • pg_column_size does not measure the size of the toasted value, only the size of the toast pointer.

    But you can use that to your advantage by comparing its output with the actual column size that you can get by using length().

    It could be the varchar(255) columns, but the prime suspects are the text and character varying (without a length) columns.

    If you find nothing that way, there is also the possibility of bloat in the toast table. Use the pgstattuple extension to measure that.