
Apache Nutch doesn't expose its API


I'm trying to use the Apache Nutch 1.x REST API. I use Docker images to set up Nutch and Solr. You can see the demo repo here.

Apache Nutch uses Solr as a dependency. Solr works great: I can reach its GUI at localhost:8983.

However, I cannot reach Apache Nutch's API at localhost:8081, and this is where the problem starts. According to the Apache Nutch 1.x REST API documentation, the server is started like this (if the port option is omitted, the server starts on port 8081 by default):

    :~$ bin/nutch startserver -port <port_number>

I do this in my docker-compose.yml file, and I also publish the ports to the outside:

    ports:
       - "8080:8080"
       - "8081:8081"

But I can't call the API from my computer.

The REST API documentation says that sending a GET request to the /admin endpoint should return a response:

    GET /admin

When I try this with Postman or a browser, the request never reaches the server and I get a 500 error back.

However, when I open a shell inside the container with docker exec -it and run curl localhost:8081/admin, I get the correct response. So inside the container the API is up and running, but it is not exposed to the outside.

In one attempt, I added a frontend application in another container and sent REST requests to the Solr and Nutch containers. Solr worked; Nutch failed with a 500. This tells me that the Nutch container is unreachable not only from the outside world but also from other containers on the same network.

Any idea how to work around this problem?


Solution

  • By default, Nutch only accepts requests from localhost, i.e. from inside its own container:

    bash-5.1# /root/nutch/bin/nutch startserver -help
    usage: NutchServer [-help] [-host <host>] [-port <port>]
     -help          Show this help
     -host <host>   The host to bind the Nutch Server to. Default is
                    localhost.
    

    So you need to start it with -host 0.0.0.0, which binds it to all network interfaces, to be able to reach it from the host machine or another container:

    services:
      nutch:
        image: 'apache/nutch:latest'
        command: '/root/nutch/bin/nutch startserver -port 8081 -host 0.0.0.0'
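The -host fix and the port mapping from the question both need to be in place for the API to be reachable from the host. Below is a sketch of a docker-compose.yml that combines them. The nutch service and its command come from the snippet above and the 8081 mapping from the question; the solr service (official solr image on port 8983) is an assumption standing in for your own Solr configuration, and this file is not taken from the demo repo. Only the REST API port is published here.

```yaml
services:
  nutch:
    image: 'apache/nutch:latest'
    # Bind to all interfaces (not just localhost) so traffic arriving
    # through the published port or from other containers is accepted
    command: '/root/nutch/bin/nutch startserver -port 8081 -host 0.0.0.0'
    ports:
      # Publish the REST API port to the host, as in the question
      - "8081:8081"
  # Assumption: a stand-in Solr service; replace with your own setup
  solr:
    image: 'solr:latest'
    ports:
      - "8983:8983"
```

With this in place, GET http://localhost:8081/admin should answer from the host. From another container in the same Compose project, use the service name instead of localhost (for example http://nutch:8081/admin), because Compose attaches the services to a shared default network on which each service is resolvable by its name.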