Search code examples
pythondjangopostgresqldockeramazon-ecs

PostgreSQL on AWS ECS: psycopg2.OperationalError invalid port number 5432


I'm having a problem with database connection via psycopg2 on AWS ECS. I have an App container and a DB container. Containers are linked.

The App has an entrypoint script that checks if the DB is up before starting an app server.

$ until psql -h "$DB_HOST" -U "$DB_USER" -c '' && >&2 echo "Postgres is up"; do
    >&2 echo "Postgres is unavailable - sleeping"
    sleep 1
done

> Is the server running on host "db" (172.17.0.3) and accepting
> TCP/IP connections on port 5432?
> Postgres is unavailable - sleeping
> Postgres is up

This part works fine, but as soon as the app server starts and tries to connect to the DB I get the following error:

psycopg2.OperationalError: invalid port number: "tcp://172.17.0.3:5432"

I have no idea what could be the case. This works fine when running locally with Docker.

Any hints would be appreciated. Thanks!


Solution

  • I came across the same problem with Ruby on Rails. I had pretty much the same database config and I too used two linked containers for app and database (not directly, but through the Gitlab CI; under the hood it creates containers and links them). My environment variables had different names though: POSTGRES_HOST, POSTGRES_PORT, etc. Yet, your solution of explicitly defining POSTGRES_PORT worked for me too! But I couldn't leave it like that, I wanted to figure out why that helped and what caused the problem in the first place. So here's what I found.

    The error says: invalid port number: "tcp://172.17.0.3:5432". At first it may seem like a valid port 5432, but actually it is the whole string "tcp://172.17.0.3:5432" that isn't a valid port number. Something passed this URI instead of a port number to PostgreSQL, that's what the error says. You connect via psycopg, I used pg gem, but both of them are wrappers around libpq C library, a part of PostgreSQL. Let's take a look at it to see how we're getting this error. There is a file fe-connect.c which contains functions that parse connection options. And here's the relevant code (from PostgreSQL 10, that's the version I used):

    /* Figure out the port number we're going to use. */
    if (ch->port == NULL || ch->port[0] == '\0')
            thisport = DEF_PGPORT;
    else
    {
            thisport = atoi(ch->port);
            if (thisport < 1 || thisport > 65535)
            {
                    appendPQExpBuffer(&conn->errorMessage,
                                      libpq_gettext("invalid port number: \"%s\"\n"),
                                      ch->port);
                    goto keep_going;
            }
    }
    

    It says: if ch->port is NULL or an empty string it means no port was provided with connection options, then let's use DEF_PGPORT, the precompiled default port, usually 5432; and if ch->port is present, let's convert it to int with atoi and check if it's between 1 and 65535.

    If ch->port is "tcp://172.17.0.3:5432", atoi(ch->port) returns 0, it's less than 1, so that's how we get this error.

    By the way, in recent PostgreSQL versions one would get a more informative error: invalid integer value "tcp://172.17.0.3:5432" for keyword "port". That's because this commit replaced the above atoi with a custom error-checking string conversion function.

    Okay, this URI appears in place of a port number in libpq connection options. But how it gets there? Turns out, because of Docker.

    Docker containers can have names, either auto generated or provided with --name option to the run command. When you use --link option to link two containers, you specify the other container's name and optionally an alias. By default the alias is the same as the name. Probably, your database container had a name/alias db, mine was named postgres (Gitlab by default names a container with its image name, in my case: postgres).

    When you link containers, Docker defines a bunch of environment variables, these variables are named based on container names/aliases. One of the variables is <alias>_PORT, it contains the URI of the container’s exposed port. Not just port number, but a complete URI (like the one you'd get from docker port <alias> command). Here's where you get "tcp://172.17.0.3:5432" from, it is written by Docker to the DB_PORT variable, because your database container happens to be named db.

    After all, possible solutions are:

    • redefine DB_PORT variable after Docker links the containers (as you did),
    • rename DB_PORT variable in the config,
    • set another alias for the db container.