Search code examples
postgresqldatabase-administration

How would I configure analyze threshold for a table where the data is categorically different every couple months?


We host data for an auditing service. Every few months, a new audit comes out with similar questions to previous audits of the same category. Since questions can change verbiage and number, we store each question in each audit separately (we do link them through a "related_questions" table).

audits

id name passing_score
1 audit_1 100
2 audit_2 150

questions

id audit_id text
1 1 q1
2 1 q2
3 2 q1
4 2 q2

We then have a surveys and responses table. Surveys are the overall response to an audit, while responses store the individual responses to each question.

surveys

id audit_id overall_score pass
1 1 120 true
2 1 95 false
3 2 200 true
4 2 100 false

responses

id survey_id question_id score
1 1 1 60
2 1 2 60
3 2 1 60
4 2 2 35
5 3 3 100
6 3 4 100
7 4 3 50
8 4 4 50

The analyze threshold is base threshold + scale factor * number of tuples. The problem with this is that once an audit has finished (after a few months), we'll never receive new surveys or responses for that category of data. The new data that comes in is conceptually all that needs to be analyzed. All data is queried, but the new data has the most traffic.

If 10% is the ideal scale factor for today and analyze autoruns once every week, a couple years from now analyze may autorun once every 4 months due to the number of tuples. This is problematic when the past 3 months of data is for questions that the analyzer has never seen and so there are no helpful stats for the query planner on this data.

We could set the scale factor extremely low for this table, but that seems like a cheap solution that could cause issues in the future.


Solution

  • If you have a constant data modifications rate, setting autovacuum_analyze_scale_factor to 0 for that table and only using autovacuum_analyze_threshold is a good idea.