I am trying to run a spider with Portia in its Docker version, but I don't want to execute the spider using a terminal command like docker exec ... portiacrawl ...
Is there any way I can run a spider that is already created by making a request to its localhost port, and save the output in a specific folder?
Something like: https://localhost:9001/execute/spider_name/folder_path
Example of my own usage:
First, I run the container and leave it running, because I can't stop it for other reasons:
docker run -i -t -d --rm -v <PROJECTS_FOLDER>:/app/data/projects:rw -p 9001:9001 scrapinghub/portia
Next I execute the portiacrawl:
docker exec <CONTAINER_ID> portiacrawl <PROJECT_NAME_PATH> <SPIDER_NAME> -o /some/path/in/my/pc/<SPIDER_NAME>.json
Now, what I want is to replace the docker exec step with some HTTP request to the localhost server that is running.
Thanks very much for your time
Yes, you can, by doing a port mapping. When you start a Docker container, no ports are published publicly or exposed internally unless you tell Docker to do so.
For example:
If you wish to expose a port internally (inside the Docker network itself), you can declare it with EXPOSE in the Dockerfile. Note that EXPOSE only documents the port; it does not publish it to the host.
If you wish to publish a port so that it can be accessed through localhost or the public IP, use the -p option and pass the ports. In your case it will look like this:
docker run -p 9001:9001 imagename
The command above tells Docker to map port 9001 on the host (reachable through localhost or any other interface) to port 9001 inside the container. You can change the ports according to your actual setup.
If you wish to expose it to localhost only you can change the command to something like this:
docker run -p 127.0.0.1:9001:9001 imagename
For more information check the following docs
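To confirm that the mapping works, you can check from the host whether something accepts TCP connections on the published port. This is only a minimal sketch: the helper name port_is_open is made up for illustration, and it assumes the container from the commands above is running with 9001 published.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds within the timeout."""
    try:
        # create_connection raises OSError (refused, unreachable, timed out) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, port_is_open("127.0.0.1", 9001) should return True once the container is up with -p 9001:9001, and False if nothing is listening on that port.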
According to the updated question, the other and safest way to accomplish this is to implement an API inside portiacrawl that can be called over HTTP to do the needed tasks, instead of using docker exec.
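As far as I know, Portia does not ship such an endpoint, so you would have to write it yourself. One possible shape is a small wrapper service running inside the container that launches portiacrawl when it receives a request. The following is a rough sketch, not a tested Portia integration: it assumes Python 3 is available in the container, and PROJECT_PATH, OUTPUT_DIR, the port 9002 and the /execute/<spider_name> route are all hypothetical values chosen for illustration. Only the portiacrawl arguments are taken from the question.

```python
import re
import subprocess
from http.server import BaseHTTPRequestHandler, HTTPServer

# Assumptions for illustration only; adjust to your setup.
PROJECT_PATH = "/app/data/projects/my_project"  # hypothetical project path in the container
OUTPUT_DIR = "/app/data/projects/output"        # hypothetical folder on the mounted volume
PORT = 9002                                     # hypothetical port; publish it with -p 9002:9002

# The name must start with a letter or digit, so it cannot look like a CLI option or a path.
SPIDER_ROUTE = re.compile(r"^/execute/([A-Za-z0-9][A-Za-z0-9_.-]*)$")


def parse_spider_name(path):
    """Return the spider name from a path like /execute/<spider_name>, else None."""
    match = SPIDER_ROUTE.match(path)
    return match.group(1) if match else None


def build_command(spider_name):
    """Build the same portiacrawl invocation that the question runs through docker exec."""
    output_file = OUTPUT_DIR + "/" + spider_name + ".json"
    return ["portiacrawl", PROJECT_PATH, spider_name, "-o", output_file]


class CrawlHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        spider_name = parse_spider_name(self.path)
        if spider_name is None:
            self.send_error(404, "Expected /execute/<spider_name>")
            return
        # Blocks until the crawl finishes. The command is a list, so no shell is involved.
        result = subprocess.run(
            build_command(spider_name),
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
        )
        body = result.stdout
        self.send_response(200 if result.returncode == 0 else 500)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve():
    """Start the wrapper; call this from inside the container."""
    HTTPServer(("0.0.0.0", PORT), CrawlHandler).serve_forever()
```

With serve() running in the container and the extra port published, a POST to http://localhost:9002/execute/<SPIDER_NAME> would start the crawl and write <SPIDER_NAME>.json into OUTPUT_DIR. The output folder is fixed on the server side rather than taken from the URL, so a caller cannot make the container write to arbitrary paths. The handler blocks while the crawl runs, so for long crawls you would probably want to start the process in the background and return immediately.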