How would I configure analyze threshold for a table where the data is categorically different every couple months?

We host data for an auditing service. Every few months, a new audit comes out with similar questions to previous audits of the same category. Since questions can change verbiage and number, we store each question in each audit separately (we do link them through a "related_questions" table).

audits

id	name	passing_score
1	audit_1	100
2	audit_2	150

questions

id	audit_id	text
1	1	q1
2	1	q2
3	2	q1
4	2	q2

We then have a surveys and responses table. Surveys are the overall response to an audit, while responses store the individual responses to each question.

surveys

id	audit_id	overall_score	pass
1	1	120	true
2	1	95	false
3	2	200	true
4	2	100	false

responses

id	survey_id	question_id	score
1	1	1	60
2	1	2	60
3	2	1	60
4	2	2	35
5	3	3	100
6	3	4	100
7	4	3	50
8	4	4	50

The analyze threshold is base threshold + scale factor * number of tuples. The problem with this is that once an audit has finished (after a few months), we'll never receive new surveys or responses for that category of data. The new data that comes in is conceptually all that needs to be analyzed. All data is queried, but the new data has the most traffic.

If 10% is the ideal scale factor for today and analyze autoruns once every week, a couple years from now analyze may autorun once every 4 months due to the number of tuples. This is problematic when the past 3 months of data is for questions that the analyzer has never seen and so there are no helpful stats for the query planner on this data.

We could set the scale factor extremely low for this table, but that seems like a cheap solution that could cause issues in the future.

Solution

If you have a constant data modifications rate, setting autovacuum_analyze_scale_factor to 0 for that table and only using autovacuum_analyze_threshold is a good idea.