Context
For local development and testing within a CI pipeline, I want a postgres docker image that contains some data sampled from production (a few tens of MBs). I will periodically rebuild this image to ensure the sampled data stays fresh.
I don't care at all about data integrity, but I care quite a bit about image size and container disk/memory usage when run. Startup time should be at most a couple of mins.
What I've built
I have a docker file that builds on top of one of the official postgres (postgis) docker images, but it actually initializes the database and uses pg_restore
to insert my sample data.
Attempted optimising
I use a mutlistage build, just copying the postgres directory into the final image (this helps as I used node during the build).
I notice that the pg_xlog
directory is quite large, and logically seems redundant here since I would happily checkpoint and ditch any WAL before sealing the image. I can't figure out how to get rid of it. I tried starting postgres with the following flags:
-min_wal_size=2 --max_wal_size=3 --archive_mode=off --wal_keep_segments
and running Checkpoint
and waiting for a few seconds, but it doesn't seem to change anything. I also tried deleting the contents of the directory, but that seemed to break the database on its next startup.
Rather than put the actual database within the image, I could just leave a pg_dump
file in the image and have the image entrypoint build the database from that. I think this would improve the image size (though I'm not clear why the database should take up much more space than the dump, unless indexes are especially big - I actually thought the dump format was less compact than the database itself, so this might offset the index size). This would obviously impact on startup time (but not prohibitively so).
Summary/Questions
Am I going about this the right way? If so, what kind of disk/memory optimizations can I use? In particular can I remove/shrink pg_xlog
?
I'm using Postgres 9.5
and Postgis 2.X
.
Was the server ever run with a larger max_wal_size than 3? If so, it could have "recycled" ahead a lot of wal files by renaming old ones for future use. Once those are renamed, they will never be removed until after they are used, even if max_wal_size is later reduced.
I also tried deleting the contents of the directory, but that seemed to break the database on its next startup.
You can fix that by using pg_resetxlog. Just don't get in the habit of running that blindly, it is very dangerous to run outside of a test environment.