I am working on a research project that involves large video datasets (hundreds of GB now, possibly multiple TB in the near future). I am fairly new to Linux, sysadmin work, and setting up servers, so please bear with me. I've provided quite a bit of info below; let me know if there is anything else that would be helpful.
I am using Ubuntu, Docker (with docker-compose), nginx, Python 3.5, and Django 1.10.
Uploading a large-ish (60GB) dataset leads to the following error:
$ sudo docker-compose build
postgres uses an image, skipping
Building django
Step 1 : FROM python:3.5-onbuild
# Executing 3 build triggers...
Step 1 : COPY requirements.txt /usr/src/app/
---> Using cache
Step 1 : RUN pip install --no-cache-dir -r requirements.txt
---> Using cache
Step 1 : COPY . /usr/src/app
ERROR: Service 'django' failed to build: Error processing tar file(exit status 1): write /usr/src/app/media/packages/video_3/video/video_3.mkv: no space left on device
My files are on a drive with 500GB free, and the current dataset is only ~60GB.
I found this discussion on container size. Perhaps I am misunderstanding Docker, but I believe I just want my volumes to be larger, not the containers themselves, so this doesn't seem appropriate. It also doesn't use docker-compose, so I'm unclear how to implement it in my current setup.
Just to be clear: with help from this question, I am able to serve static and media files with a small test set of data. (It's unclear to me whether they're served from the django container or the nginx container, as the data appears in both containers when I ssh in.)
How can I get my setup to handle this large amount of data? I would like to be able to upload additional data later, so if a solution exists that can do this without having to rebuild volumes all the time, that'd be swell.
Directory Structure
film_web
├── docker-compose.yml
├── Dockerfile
├── film_grammar
│   ├── # django code lives here
├── gunicorn_conf.py
├── media
│   ├── # media files live here
├── nginx
│   ├── Dockerfile
│   └── nginx.conf
├── requirements.txt
└── static
    ├── # static files live here
docker-compose.yml
nginx:
  build: ./nginx
  volumes:
    - ./media:/usr/src/app/film_grammar/media
    - ./static:/usr/src/app/film_grammar/static
  links:
    - django
  ports:
    - "80:80"
  volumes_from:
    - django

django:
  build: .
  volumes:
    - ./film_grammar:/usr/src/app/film_grammar
  expose:
    - "8000"
  links:
    - postgres

postgres:
  image: postgres:9.3
film_web Dockerfile
FROM python:3.5-onbuild
ENV DJANGO_CONFIGURATION Docker
CMD ["gunicorn", "-c", "gunicorn_conf.py", "--chdir", "film_grammar", "fg.wsgi:application", "--reload"]
VOLUME /home/alexhall/www/film_web/static
VOLUME /home/alexhall/www/film_web/media
nginx Dockerfile:
FROM nginx
COPY nginx.conf /etc/nginx/nginx.conf
nginx.conf
worker_processes 1;

events {
    worker_connections 1024;
}

http {
    include /etc/nginx/mime.types;

    server {
        listen 80;
        server_name film_grammar_server;

        access_log /dev/stdout;
        error_log /dev/stdout info;

        location /static {
            alias /usr/src/app/film_grammar/static/;
        }

        location /media {
            alias /usr/src/app/film_grammar/media/;
        }

        location / {
            proxy_pass http://django:8000;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Host $server_name;
        }
    }
}
Thanks in advance for your help!
A build starts by creating a tarball from the context directory (in your case .) and sending that tarball to the server. The tarball is created in the tmp directory, I believe, which is probably why you're running out of space when trying to build.
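If you want to confirm where the space is going, you can compare the size of the build context with the free space on the partition where Docker keeps its data (the paths below assume a default Linux install with Docker data under /var/lib/docker):

$ du -sh .                 # size of the build context that gets tarred up
$ df -h /var/lib/docker    # free space on the partition Docker uses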
When you're working with large datasets, the recommended approach is to use a volume. You can use a bind-mount volume to mount the files from the host.
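As a sketch of what that could look like in your docker-compose.yml, you could bind mount ./media into the django service as well (the container path here is an assumption; it should match wherever your Django MEDIA_ROOT points, which the build error suggests is /usr/src/app/media):

django:
  build: .
  volumes:
    - ./film_grammar:/usr/src/app/film_grammar
    - ./media:/usr/src/app/media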
Since you're providing the data using a volume, you'll want to exclude it from the image build context. To do this, create a .dockerignore in the . directory. In that file, add all the paths with large data (.git, media, static).
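A minimal .dockerignore for the film_web directory, based on those paths, would look like:

.git
media
static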
Once you've ignored the large directories the build should work.
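Because the media directory is bind-mounted from the host rather than baked into the image, you can also add more datasets later without rebuilding anything; new files are visible inside the containers as soon as they land on the host.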