Search code examples
postgresql

Vacuum analyze all tables in a schema postgres


I have a very large postgres database that has one particular schema in it which is dropped in and recreated nightly. After all of the tables in that schema are created I want to vacuum analyze them, however the database is so large that if a do a full db VACUUM ANALYZE; it takes about a half hour.

How can I go about vacuum analyzing each of the tables in this schema only without writing a separate SQL command for each table?


Solution

  • The bash function below utilizes the CLI tool psql to vacuum analyze tables in a single schema which can be identified by either passing the name of the schema as the first parameter to the function or setting the environment variable PG_SCHEMA:

    vacuum_analyze_schema() {
        # vacuum analyze only the tables in the specified schema
    
        # postgres info can be supplied by either passing it as parameters to this
        # function, setting environment variables or a combination of the two
        local pg_schema="${1:-${PG_SCHEMA}}"
        local pg_db="${2:-${PG_DB}}"
        local pg_user="${3:-${PG_USER}}"
        local pg_host="${4:-${PG_HOST}}"
    
        echo "Vacuuming schema \`${pg_schema}\`:"
    
        # extract schema table names from psql output and put them in a bash array
        local psql_tbls="\dt ${pg_schema}.*"
        local sed_str="s/${pg_schema}\s+\|\s+(\w+)\s+\|.*/\1/p"
        local table_names=$( echo "${psql_tbls}" | psql -d "${pg_db}" -U "${pg_user}" -h "${pg_host}"  | sed -nr "${sed_str}" )
        local tables_array=( $( echo "${table_names}" | tr '\n' ' ' ) )
    
        # loop through the table names creating and executing a vacuum
        # command for each one
        for t in "${tables_array[@]}"; do
            echo "doing table \`${t}\`..."
            psql -d "${pg_db}" -U "${pg_user}" -h "${pg_host}" \
                -c "VACUUM (ANALYZE) ${pg_schema}.${t};"
        done
    }
    

    This function can be added to your .bashrc to provide the ability to invoke it from the command line at any time. Like the schema, Postgres connection and database values can be set by either supplying them as function parameters:

    # params must be in this order
    vacuum_analyze_schema '<your-pg-schema>' '<your-pg-db>' '<your-pg-user>' '<your-pg-host>'
    

    or by setting environment variables:

    PG_SCHEMA='<your-pg-schema>'
    PG_USER='<your-pg-user>'
    PG_HOST='<your-pg-host>'
    PG_DB='<your-pg-db>'
    
    vacuum_analyze_schema
    

    or by a combination of both. Values passed as params will take precedence over corresponding environment vars.