Tags: wordpress, docker, amazon-ec2

How to run Wordpress served in a subdirectory via a Docker cluster?


I needed to migrate our aging, single-server, host-level Wordpress installation into our production cluster, which uses EC2 autoscaled hosts and Docker. Although I found some articles that helped, there were a lot of gaps and inconsistencies, and I ran into some unexpected issues that I haven't seen documented but that I believe will be common in clustered, complex environments. So I'm leaving this self-answered Q&A:

  1. How do I migrate Wordpress from a Linux server to a clustered Docker environment?
  2. How do I configure the standard Wordpress Docker image to serve from a subdirectory routed from my reverse proxy?
  3. Ok, it's up. I can log into wp-admin in our dev cluster, but not in production!
  4. Argh! Why is the REST API broken?

I'm going to assume you're more "devops" than "blog admin", and are already reasonably familiar with Docker, NFS, ssh, your strategy for running containers on multiple hosts, and how you might set up your reverse proxy to route to a particular group of Wordpress backends given a path prefix like /blog.


Solution

  • Migration

    I won't dive into this too deeply, as this is the one part that is well documented elsewhere. The TL;DR is: read the wp-config.php file to get the database credentials and settings, then all you need is an archive of the wp-content directory and a dump of the database, made using those credentials. (For a future import into RDS, I needed to add some command-line options not found in simpler mysqldump examples, or else I got permission errors around GTID issues.)

    # tar czvf wp-content.tgz wp-content
    # mysqldump -h <bloghost> -u <bloguser> -p \
      --column-statistics=0 --no-tablespaces --set-gtid-purged=OFF \
      --databases <dbname> | gzip > blog.sql.gz
    

    For hosting, I have a /shared EFS mount that is available to my Docker containers. You'll want something similar in a clustered environment.

    (For the purposes of this answer, I'll use /shared as a shared network file system, and I'll use /blog as my subdirectory, and various <VAR> values for you to replace with your own values. I'm using Docker/EC2/EFS/RDS, but the instructions should still be relevant to other similar cloud clustered environments.)

    I created a /shared/wordpress/blog directory and extracted the tarball into this hierarchy, yielding /shared/wordpress/blog/wp-content. Note that this is down a level, inside a directory that will later host the entire subdirectory I want to serve. This is important!

    % sudo -s
    # mkdir -p /shared/wordpress/blog
    # cd /shared/wordpress/blog
    # scp <cluster-user>@<old-host-of-blog>:/var/www/html/blog/blog.sql.gz .
    # scp <cluster-user>@<old-host-of-blog>:/var/www/html/blog/wp-content.tgz .
    # tar xzvf wp-content.tgz
    # zcat blog.sql.gz | mysql -u <bloguser> -p -h <new-mysql-address>
    

    Getting the basics running, but broken:

    Note: putting authentication info directly into your compose files is a bad practice; use more rigorous secrets management once you have things working!

    You'll want to follow any of the standard recipes for running wordpress:latest using docker-compose or whatever your preferred toolchain is. This image dynamically generates a wp-config.php file internally, reading some configuration options from named environment variables, but it notably lacks named variables for some of the configuration we'll need to fully set things up. Set the obvious environment variables: WORDPRESS_DB_HOST, WORDPRESS_DB_USER, WORDPRESS_DB_PASSWORD, and then add one big yucky override for the settings that don't have named environment vars:

    WORDPRESS_CONFIG_EXTRA="define('WP_HOME', 'https://www.yoursite.com/blog/'); define('WP_SITEURL', 'https://www.yoursite.com/blog/');"
    

    The above settings are key to serving out of a subdirectory!
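
    For concreteness, here's a minimal docker-compose.yml sketch of the service definition. The service name blog and the bracketed values are placeholders of mine; substitute your own (and again, move the credentials into real secrets management once things work):

    services:
      blog:
        image: wordpress:latest
        environment:
          WORDPRESS_DB_HOST: <new-mysql-address>
          WORDPRESS_DB_USER: <bloguser>
          WORDPRESS_DB_PASSWORD: <blogpassword>
          WORDPRESS_DB_NAME: <dbname>
          # One folded string carrying the extra PHP config from above:
          WORDPRESS_CONFIG_EXTRA: >-
            define('WP_HOME', 'https://www.yoursite.com/blog/');
            define('WP_SITEURL', 'https://www.yoursite.com/blog/');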

    You'll now want to mount the wp-content directory you're sharing over NFS into the container, under the /blog subdirectory rather than directly under /var/www/html. In a docker-compose.yml you'd add something like:

    volumes:
    - /shared/wordpress/blog/wp-content:/var/www/html/blog/wp-content
    

    When you run this (and assuming that you have your /blog directory routed to your containerized backends) you should now be able to visit your Wordpress instance. Sort of. Perhaps you have a redirect loop. Maybe you have some partially broken content. (It depends a bit on what you had set up previously.)

    If you ssh to the host and look inside the running container (i.e. via docker exec -t -i <running_container_name> /bin/bash) you'll discover that the /var/www/html directory has been populated with all sorts of stuff. This isn't quite right: we want the /var/www/html/blog directory to hold the site, because the proxy won't ever route to the parent directory!

    Subdirectory

    Unfortunately, there's no runtime-configurable way to make this happen.

    We need to make a custom Docker image. It's annoying, but at least the Dockerfile is trivial:

    FROM wordpress:latest

    # The entrypoint script unpacks WordPress into the current working
    # directory, so pointing WORKDIR at the subdirectory relocates the site.
    WORKDIR /var/www/html/blog

    ENTRYPOINT ["docker-entrypoint.sh"]
    CMD ["apache2-foreground"]
    

    Changing the WORKDIR causes the entrypoint script to write the Wordpress files into our subdirectory instead of the default /var/www/html.

    Build the image, update your docker-compose.yml or k8s or whatever to use it, and BOOM you're up and running!
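
    As a sketch (the registry and image names are placeholders), pointing your compose service at the custom image looks something like:

    services:
      blog:
        build: .    # directory containing the Dockerfile above
        # or, if you push the built image to a registry first:
        # image: <registry>/wordpress-blog:latest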

    Mostly.

    I can't log into /blog/wp-admin!

    Maybe it works when you have only one server, but your production cluster with multiple instances doesn't let you log in.

    So here's the thing: the generated wp-config.php file contains a block of unique generated keys/salts that are used for cookie generation. If multiple independent Wordpress instances are each running with their own generated keys, each will issue login cookies that the others reject!

    Fortunately, the config allows you to specify environment variables for all of them!

    Here's the snippet from wp-config.php where you can see their names:

    define( 'AUTH_KEY',         getenv_docker('WORDPRESS_AUTH_KEY',         'xxx') );
    define( 'SECURE_AUTH_KEY',  getenv_docker('WORDPRESS_SECURE_AUTH_KEY',  'xxx') );
    define( 'LOGGED_IN_KEY',    getenv_docker('WORDPRESS_LOGGED_IN_KEY',    'xxx') );
    define( 'NONCE_KEY',        getenv_docker('WORDPRESS_NONCE_KEY',        'xxx') );
    define( 'AUTH_SALT',        getenv_docker('WORDPRESS_AUTH_SALT',        'xxx') );
    define( 'SECURE_AUTH_SALT', getenv_docker('WORDPRESS_SECURE_AUTH_SALT', 'xxx') );
    define( 'LOGGED_IN_SALT',   getenv_docker('WORDPRESS_LOGGED_IN_SALT',   'xxx') );
    define( 'NONCE_SALT',       getenv_docker('WORDPRESS_NONCE_SALT',       'xxx') );
    

    These values need to be identical across all running Wordpress instances in your cluster, so add them to your docker configuration and restart.
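
    As a sketch of what that looks like in docker-compose.yml (the values are placeholders; generate your own long random strings, e.g. from the wordpress.org secret-key service at https://api.wordpress.org/secret-key/1.1/salt/):

    services:
      blog:
        environment:
          # All eight values must be identical on every instance in the cluster.
          WORDPRESS_AUTH_KEY: '<generated>'
          WORDPRESS_SECURE_AUTH_KEY: '<generated>'
          WORDPRESS_LOGGED_IN_KEY: '<generated>'
          WORDPRESS_NONCE_KEY: '<generated>'
          WORDPRESS_AUTH_SALT: '<generated>'
          WORDPRESS_SECURE_AUTH_SALT: '<generated>'
          WORDPRESS_LOGGED_IN_SALT: '<generated>'
          WORDPRESS_NONCE_SALT: '<generated>'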

    The REST API doesn't work!

    The simple answer most people cite here is to log into the /blog/wp-admin console and go to Settings > Permalinks. Assuming you already had this set up and working, all you need to do is click "Save", which will (handwave handwave) "update the .htaccess file".

    The problem with this solution is that whatever it does to .htaccess only modifies that file for the single instance you happened to be connected to for administration! It won't work in a clustered environment.

    Let's look at what it actually does. When you first start up, the /var/www/html/blog/.htaccess file contains:

    # BEGIN WordPress
    
    RewriteEngine On
    RewriteRule .* - [E=HTTP_AUTHORIZATION:%{HTTP:Authorization}]
    RewriteBase /
    RewriteRule ^index\.php$ - [L]
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteRule . /index.php [L]
    

    After you visit the permalinks settings and save, it turns into:

    # BEGIN WordPress
    # The directives (lines) between "BEGIN WordPress" and "END WordPress" are
    # dynamically generated, and should only be modified via WordPress filters.
    # Any changes to the directives between these markers will be overwritten.
    <IfModule mod_rewrite.c>
    RewriteEngine On
    RewriteRule .* - [E=HTTP_AUTHORIZATION:%{HTTP:Authorization}]
    RewriteBase /blog/
    RewriteRule ^index\.php$ - [L]
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteRule . /blog/index.php [L]
    </IfModule>
    
    # END WordPress
    

    Never mind the REST API, this is alarming! It seemed to be pre-configured to assume no subdirectory, and none of the prior steps for getting the subdirectory working changed that.

    So, we definitely need to ensure this file is configured correctly. The easy answer is to keep it outside the container in our shared directory and mount it. Save the corrected version above (tweaking /blog to your own subdirectory if necessary) as /shared/wordpress/blog/.htaccess and add a mount to the docker configuration:

    volumes:
    - /shared/wordpress/blog/wp-content:/var/www/html/blog/wp-content
    - /shared/wordpress/blog/.htaccess:/var/www/html/blog/.htaccess
    

    Restart everything, and celebrate your new Wordpress cluster!

    Alternative to the last two steps

    Since our goal was to share the keys/salts from wp-config.php and also the .htaccess file, an alternative approach is simply to mount one directory level up. Replace the mount points with just:

    volumes:
    - /shared/wordpress/blog:/var/www/html/blog
    

    This causes the entire generated Wordpress hierarchy to be written to the network share, rather than living inside each container with selected overrides mounted in.

    In practice, this is the approach I took, but I may consider migrating to the "many vars + .htaccess" approach documented above.

    Note that if multiple instances sharing the entire tree start up simultaneously and try to bootstrap this hierarchy, they do interfere with each other! In my case some simply exited and then restarted successfully, but to avoid the risk of file corruption or other weirdness, you may want to babysit the first startup and ensure that only one instance initializes the hierarchy.
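
    One low-tech way to arrange that, if your orchestrator supports a declarative replica count (this is a Swarm-style compose sketch; adapt to your own tooling):

    services:
      blog:
        deploy:
          replicas: 1    # scale back up once the shared tree is initialized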