I've been impressed by the performance improvements achieved with clustering, but not with how long it takes.
I know clustering needs to be rebuilt if a table or partition is changed after the clustering, but unless I've made a note of when I last clustered a table, how can I tell when I need to do it again?
I can use this query to tell me what table(s) have one or more clustered indexes
SELECT *
FROM pg_class c
JOIN pg_index i ON i.indrelid = c.oid
WHERE relkind = 'r' AND relhasindex AND i.indisclustered
My questions are.
I've noticed that it takes just as long to re-build a clustered index as it does to build it in the first place (even if the table hasn't been touched in the meantime). So I want to avoid re-clustering unless I know the table needs it.
UPDATE for clarity (I hope)
If I use this command....
CLUSTER tableA USING tableA_idx1;
To tell which index was last used to cluster the table, use the pg_index
system catalog.
Query the table for all indexes that belong to your table and see which one has indisclustered
set. A table can only be clustered by a single index at a time.
There is no way to find out when the table was last clustered, but that's not very interesting anyway. What you want to know is how good the clustering still is.
To find that, query the pg_stats
line for the column on which you clustered. If correlation
is close to 1, you are still good. The smaller the value gets, the more clustering is indicated.