python, postgresql, prefect

Cleaning ~/.prefect/pg_data/ when using Prefect


I'm using Prefect to automate my flows (Python scripts). Once the flows are running, some data gets persisted to a PostgreSQL database. The problem is that the size of pg_data rapidly gets out of hand (~20 GB), so I was wondering whether there is a way to reduce the amount of data stored in pg_data when running an agent, or a way to clean the directory automatically.

Thanks in advance for your help,

Best,

Christian


Solution

  • I assume you are running Prefect Server and want to clean up its underlying Postgres database to save space. If so, there are a few ways to do it:

    • you can manually delete old records, especially old logs and old flow runs, using DELETE FROM in SQL,
    • you can do the same in an automated fashion: for example, some users have a dedicated flow that runs on a schedule and purges old data from the database,
    • alternatively, you can use the open-source pg_cron job scheduler for Postgres to schedule such DB administration tasks,
    • you can also do the same using GraphQL: query for the IDs of "old" flow runs with the flow_run query, and then execute the delete_flow_run mutation for each of them,
    • lastly, to be more proactive, you can reduce the number of logs you generate by logging less in general (only what's needed) and by choosing a less verbose log level: for example, switching your agent from DEBUG to INFO logs should significantly reduce the space that logs consume in the database.
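A minimal sketch of the manual SQL cleanup, written in Python against any DB-API 2.0 connection (for Postgres, for example one opened with psycopg2). The table names `log` and `flow_run` and the `created` column are assumptions about the Prefect Server schema, so verify them (e.g. with `\dt` in psql) before running anything destructive. `purge_old_records` is a hypothetical helper, not part of Prefect.

```python
from datetime import datetime, timedelta, timezone

# ASSUMPTION: table/column names are our reading of the Prefect Server (1.x)
# schema: a `log` table and a `flow_run` table, each with a `created`
# timestamp. Check them against your own database first.
# ASSUMPTION: deleting from flow_run relies on foreign keys that cascade (or on
# no newer rows still referencing old runs); otherwise Postgres rejects it.
PURGE_STATEMENTS = (
    "DELETE FROM log WHERE created < {p}",
    "DELETE FROM flow_run WHERE created < {p}",
)


def purge_old_records(conn, days=30, placeholder="%s"):
    """Delete log and flow-run rows older than `days` days (hypothetical helper).

    `conn` is any DB-API 2.0 connection; psycopg2 uses the "%s" placeholder.
    Returns the number of rows deleted by each statement, in order.
    """
    cutoff = datetime.now(timezone.utc) - timedelta(days=days)
    deleted = []
    cur = conn.cursor()
    try:
        for template in PURGE_STATEMENTS:
            # The cutoff is passed as a bound parameter, never formatted in.
            cur.execute(template.format(p=placeholder), (cutoff,))
            deleted.append(cur.rowcount)
        conn.commit()  # all deletes succeed together...
    except Exception:
        conn.rollback()  # ...or none of them are kept
        raise
    finally:
        cur.close()
    return deleted
```

Calling `purge_old_records(conn, days=30)` from a flow that runs on a schedule is one way to automate it. Note that a DELETE alone does not shrink pg_data: Postgres only marks the rows as dead, `VACUUM` (or autovacuum) makes that space reusable, and only `VACUUM FULL` rewrites the table and returns the space to the operating system, at the cost of an exclusive lock on the table while it runs.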
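For the pg_cron route, a job is registered with `cron.schedule(job_name, schedule, command)`. The hypothetical helper below only builds that statement as a string, to be executed with psql or any Postgres client. The `log` table and `created` column are assumptions about the Prefect Server schema, and the extension must already be installed with `CREATE EXTENSION pg_cron`.

```python
# ASSUMPTION: pg_cron is installed, and Prefect Server stores logs in a `log`
# table with a `created` timestamp column. Verify both before scheduling.
def pg_cron_purge_sql(job_name="purge-prefect-logs", schedule="0 3 * * *", days=30):
    """Build a pg_cron statement that purges old log rows (hypothetical helper).

    `schedule` uses standard cron syntax; the default runs daily at 03:00.
    `job_name` and `schedule` are trusted, hard-coded values, not user input.
    """
    # int() guarantees only a number ends up inside the interval literal.
    command = f"DELETE FROM log WHERE created < now() - interval '{int(days)} days'"
    # Dollar-quoting ($$...$$) avoids escaping the single quotes in `command`.
    return f"SELECT cron.schedule('{job_name}', '{schedule}', $${command}$$);"
```

pg_cron runs its jobs in the database named by its `cron.database_name` setting, so check that it points at the database Prefect Server writes to.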
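A sketch of the GraphQL route using only the standard library. The endpoint `http://localhost:4200/graphql` and the exact query and mutation shapes are assumptions based on the Prefect Server 1.x schema; try them in the Interactive API / GraphQL playground of your server before relying on this. All function names are hypothetical.

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone

# ASSUMPTION: address of a local Prefect Server GraphQL API; adjust as needed.
GRAPHQL_URL = "http://localhost:4200/graphql"


def build_old_runs_query(days):
    # ASSUMPTION: Hasura-style `where` filter on the flow_run `created` field.
    cutoff = (datetime.now(timezone.utc) - timedelta(days=days)).isoformat()
    # json.dumps produces a quoted string literal that GraphQL also accepts.
    return "query { flow_run(where: {created: {_lt: %s}}) { id } }" % json.dumps(cutoff)


def build_delete_mutation(flow_run_id):
    # ASSUMPTION: mutation signature delete_flow_run(input: {flow_run_id}).
    return (
        "mutation { delete_flow_run(input: {flow_run_id: %s}) { success } }"
        % json.dumps(flow_run_id)
    )


def graphql(query, url=GRAPHQL_URL):
    """POST one GraphQL document and return its `data`, raising on errors."""
    body = json.dumps({"query": query}).encode()
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        payload = json.load(resp)
    if payload.get("errors"):
        raise RuntimeError(payload["errors"])
    return payload["data"]


def delete_old_flow_runs(days=30, url=GRAPHQL_URL):
    """Find flow runs older than `days` days and delete them one by one."""
    runs = graphql(build_old_runs_query(days), url)["flow_run"]
    for run in runs:
        result = graphql(build_delete_mutation(run["id"]), url)
        if not result["delete_flow_run"]["success"]:
            raise RuntimeError(f"could not delete flow run {run['id']}")
    return [run["id"] for run in runs]
```

Deleting runs one mutation at a time is slow for very large backlogs, but it keeps each call small and stops at the first failure.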
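To see why the log level matters, here is a small standard-library demonstration, assuming Prefect's loggers behave like ordinary Python loggers: records below a logger's level are dropped before any handler sees them, so they never reach a handler that would ship them to the database. `CountingHandler` is a hypothetical stand-in for such a database-backed handler, and the loop of DEBUG messages is purely illustrative.

```python
import logging


class CountingHandler(logging.Handler):
    """Stand-in for a handler that would persist each record to a database."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def emit(self, record):
        self.count += 1  # a real handler would write the record somewhere


def emitted_records(level):
    """Return how many records reach the handler when the logger uses `level`."""
    logger = logging.getLogger(f"demo.{logging.getLevelName(level)}")
    logger.propagate = False  # keep the demo isolated from the root logger
    logger.setLevel(level)
    handler = CountingHandler()
    logger.addHandler(handler)
    for i in range(100):
        logger.debug("step %d: verbose detail", i)  # filtered out at INFO
    logger.info("run finished")
    logger.removeHandler(handler)
    return handler.count
```

Here `emitted_records(logging.DEBUG)` returns 101 while `emitted_records(logging.INFO)` returns 1. In Prefect 1.x the level is usually set through the `logging.level` config key, for example with the environment variable `PREFECT__LOGGING__LEVEL=INFO` set where the agent and flow runs execute.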