I was using hstore, Postgresql 9.3.4, to store a count for each time an event happened in a given day, with an update like the following.
days_count = days_count || hstore('x', (coalesce((days_count -> 'x')::integer, 0) + 1)::text)
Where x is the day of the year. After running a simulation of expected behavior for production I ended up with a table that was 150MB + 2GB Toast + 25-30MB for the index, after Analyze and Vacuum.
I am now instead breaking up the above column into one for each month like the following
y_month_days_count = y_month_days_count || hstore('x', (coalesce((y_month_days_count -> 'x')::integer, 0) + 1)::text)
Where x is the day of the month, and y is the month of the year. I am still running the simulation right now, but so far at third of the way done I am at 60MB + A pretty steady 20-30MB of Toast + 25-30MB for the index. Which means in the end I should end up with about 180MB + 30-40MB for Toast + 25MB-30MB for the index after Analyze and Vacuum.
So first is there any known issues with Hstore and Toast bloat that would explain my issue with my first set up?
Second will my current solution of breaking up the columns cause any type of issues with hstore and performance in the future because of the number of hstore columns on one table? It seems to be steady now with row numbers in the hundred of thousands, and while I know more columns can make things slower, I am unsure if this is worse with hstore columns.
Finally I did find something out. I have one hstore column that ends up representing each hour a day, so it has 24 different keys. When I run the simulation for just this column I end up with almost no toast, in the KB, but when I run the whole simulation, with the days broken up into months columns, my largest hstore has 52 keys.
So for a simple store of either a counter or a word or two, the max number of keys before I see any amount of toast for hstore is between 24 and 52 keys.
So first is there any known issues with Hstore and Toast bloat that would explain my issue with my first set up?
Yes.
When you update any part of an out-of-line stored TOAST
ed field like text
, hstore
or json
the whole field must be re-written as a new row version. This is a consequence of MVCC - it's necessary to retain a copy of every version of the row that might still be visible to another transaction.
The old one can be vacuum
ed away when it's no longer required by any running transaction, so in practice this has minimal impact so long as autovacuum is running aggressively enough.
So if you're updating lots of rows with big text
, hstore
or json
fields, or updating them frequently, tune autovacuum up so it runs more often and does work faster. Make sure you don't have long running <IDLE> in transaction
connections.
You say the table sizes you quoted were "after analyze and vacuum" but I'm guessing you only ran a regular vacuum
, so the table bloat would've been freed for re-use by PostgreSQL but not released back to the OS. See if VACUUM FULL
compacts it.
Will my current solution of breaking up the columns cause any type of issues with hstore and performance in the future because of the number of hstore columns on one table?
Depends on your query patterns and workload, but probably not.