How to enable full query logging on a Cassandra 4.0 Docker container?


I'd like to run Cassandra 4 in a Docker container with Full Query Logging (FQL) enabled. So far I've tried building the following Dockerfile:

FROM cassandra:4.0
RUN nodetool enablefullquerylog

but this fails with the following error:

nodetool: Failed to connect to '127.0.0.1:7199' - ConnectException: 'Connection refused (Connection refused)'.

I've also tried uncommenting the full_query_logging_options in the cassandra.yaml located at /etc/cassandra/cassandra.yaml in the Docker container:

# default options for full query logging - these can be overridden from command line when executing
# nodetool enablefullquerylog
full_query_logging_options:
    log_dir: /var/log/cassandra/fql.log
    roll_cycle: HOURLY
    block: true
    max_queue_weight: 268435456 # 256 MiB
    max_log_size: 17179869184 # 16 GiB
    # archive command is "/path/to/script.sh %path" where %path is replaced with the file being rolled:
    archive_command:
    max_archive_retries: 10

Ideally, I would like to enable FQL in the cassandra.yaml without having to use a nodetool command, but it seems that this is not possible: the yaml only appears to configure FQL's options, which apply once it has been enabled using nodetool. Is that right?

I'm also unsure how to change the cassandra.yaml to allow nodetool to connect. I've noticed that in the cassandra Docker image that runs Cassandra 3, the nodetool command works; it just doesn't work in the cassandra:4.0 image. From the question "Cassandra failed to connect", it seems that what is needed is to configure the listen_address and broadcast_address in the cassandra.yaml. In the Cassandra 3 Docker container, I can see this is configured by default as follows:

# Address or interface to bind to and tell other Cassandra nodes to connect to.
# You _must_ change this if you want multiple nodes to be able to communicate!
#
# Set listen_address OR listen_interface, not both.
#
# Leaving it blank leaves it up to InetAddress.getLocalHost(). This
# will always do the Right Thing _if_ the node is properly configured
# (hostname, name resolution, etc), and the Right Thing is to use the
# address associated with the hostname (it might not be).
#
# Setting listen_address to 0.0.0.0 is always wrong.
#
listen_address: 172.17.0.5

# Set listen_address OR listen_interface, not both. Interfaces must correspond
# to a single address, IP aliasing is not supported.
# listen_interface: eth0

# If you choose to specify the interface by name and the interface has an ipv4 and an ipv6 address
# you can specify which should be chosen using listen_interface_prefer_ipv6. If false the first ipv4
# address will be used. If true the first ipv6 address will be used. Defaults to false preferring
# ipv4. If there is only one address it will be selected regardless of ipv4/ipv6.
# listen_interface_prefer_ipv6: false

# Address to broadcast to other Cassandra nodes
# Leaving this blank will set it to the same value as listen_address
broadcast_address: 172.17.0.5

whereas in the Cassandra 4 container it is

# Address or interface to bind to and tell other Cassandra nodes to connect to.
# You _must_ change this if you want multiple nodes to be able to communicate!
#
# Set listen_address OR listen_interface, not both.
#
# Leaving it blank leaves it up to InetAddress.getLocalHost(). This
# will always do the Right Thing _if_ the node is properly configured
# (hostname, name resolution, etc), and the Right Thing is to use the
# address associated with the hostname (it might not be). If unresolvable
# it will fall back to InetAddress.getLoopbackAddress(), which is wrong for production systems.
#
# Setting listen_address to 0.0.0.0 is always wrong.
#
listen_address: localhost

# Set listen_address OR listen_interface, not both. Interfaces must correspond
# to a single address, IP aliasing is not supported.
# listen_interface: eth0

# If you choose to specify the interface by name and the interface has an ipv4 and an ipv6 address
# you can specify which should be chosen using listen_interface_prefer_ipv6. If false the first ipv4
# address will be used. If true the first ipv6 address will be used. Defaults to false preferring
# ipv4. If there is only one address it will be selected regardless of ipv4/ipv6.
# listen_interface_prefer_ipv6: false

# Address to broadcast to other Cassandra nodes
# Leaving this blank will set it to the same value as listen_address
# broadcast_address: 1.2.3.4

I don't quite understand where the 172.17.0.5 'comes from' and why setting it to that value in the cassandra.yaml would allow nodetool to work on the container. Any ideas how to use nodetool on the Cassandra 4 container to enable FQL?


Solution

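  • It turns out that you cannot run nodetool commands in the Dockerfile when building the image: RUN instructions execute at build time, when no Cassandra process is running, so nothing is listening on the JMX port 7199 and the connection is refused. Instead, nodetool commands have to be run 'manually' in the running container. So I adapted the Dockerfile to the following: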

    FROM cassandra:4.0
    RUN mkdir /cassandra-fql && chmod 777 /cassandra-fql
    COPY cassandra.yaml /etc/cassandra/cassandra.yaml
    

    with the cassandra.yaml the same as the default one except for the following full_query_logging_options:

    # default options for full query logging - these can be overridden from command line when executing
    # nodetool enablefullquerylog
    full_query_logging_options:
        log_dir: /cassandra-fql
        roll_cycle: HOURLY
        block: true
        max_queue_weight: 268435456 # 256 MiB
        max_log_size: 17179869184 # 16 GiB
        # archive command is "/path/to/script.sh %path" where %path is replaced with the file being rolled:
        # archive_command:
        max_archive_retries: 10
    

    Then, after building the image with the tag cassandra-fql and running the container like so,

    docker run --name cassandra-fql -p 127.0.0.1:9042:9042 cassandra-fql
    

    and opening a shell in it with docker exec, running nodetool enablefullquerylog was successful.
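The manual steps above could be scripted roughly as follows. This is only a sketch under a few assumptions that are not part of the original answer: the image is built from the current directory and tagged cassandra-fql, the container is started detached with -d so the script can continue, and nodetool status is used as a readiness probe before enabling FQL. The function names and the DOCKER and SLEEP_SECONDS variables are hypothetical helpers introduced for illustration.

```shell
#!/bin/sh
# Sketch: build the image, start the container, wait for the node, enable FQL.
# DOCKER is a hypothetical override (e.g. DOCKER=echo for a dry run).
DOCKER="${DOCKER:-docker}"
NAME="cassandra-fql"   # used as both image tag and container name, as in the answer

# Build the image from the Dockerfile in the current directory and start it.
# Assumption: -d (detached) is added here; the answer ran it in the foreground.
build_and_start() {
    $DOCKER build -t "$NAME" . &&
    $DOCKER run -d --name "$NAME" -p 127.0.0.1:9042:9042 "$NAME"
}

# Poll until nodetool can reach the node, trying up to $1 times (default 30).
# Returns 0 once "nodetool status" succeeds, 1 if it never does.
wait_for_node() {
    attempts="${1:-30}"
    i=0
    while [ "$i" -lt "$attempts" ]; do
        if $DOCKER exec "$NAME" nodetool status >/dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        sleep "${SLEEP_SECONDS:-5}"
    done
    return 1
}

# Enable full query logging in the running container.
enable_fql() {
    $DOCKER exec "$NAME" nodetool enablefullquerylog
}
```

With these functions defined, something like `build_and_start && wait_for_node 30 && enable_fql` would run the whole sequence. The wait step is there because nodetool gets the same "Connection refused" error if it is invoked before Cassandra has finished starting inside the container.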