Cron job in a Docker container cannot connect to another container


I want to use a cron job to run a script that fetches data from a news API and writes it to Postgres, which runs in another container.

So the simplified architecture is:

 app (in container) -> postgres (in container)

The cron job script lives inside the app container; it fetches the data and then sends it to Postgres.

My crontab entry is:

* * * * * cd /tourMamaRoot/tourMama/cronjob && fetch_news.py >> /var/log/cron.log 2>&1

The script runs successfully when I run it manually, but when I put it in the crontab, it fails with this error:

 File "/usr/local/lib/python3.6/dist-packages/django/db/backends/base/base.py", line 195, in connect
    self.connection = self.get_new_connection(conn_params)
  File "/usr/local/lib/python3.6/dist-packages/django/db/backends/postgresql/base.py", line 178, in get_new_connection
    connection = Database.connect(**conn_params)
  File "/usr/local/lib/python3.6/dist-packages/psycopg2/__init__.py", line 126, in connect
    conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
django.db.utils.OperationalError: could not connect to server: No such file or directory
    Is the server running locally and accepting
    connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?

It seems that it only looks for the database locally when run from crontab. How can I make it write to the other container, as it does when I run the script manually?

Info:

My app container runs Ubuntu 18.04, and the following is my Dockerfile for the app:

FROM ubuntu:18.04
MAINTAINER Eson

ENV PYTHONUNBUFFERED 1
ENV DEBIAN_FRONTEND=noninteractive

EXPOSE 8000

# Setup directory structure
RUN mkdir /tourMamaRoot
WORKDIR /tourMamaRoot/tourMama/

COPY tourMama/requirements/base.txt /tourMamaRoot/base.txt
COPY tourMama/requirements/dev.txt /tourMamaRoot/requirements.txt

# install Python 3
RUN apt-get update && apt-get install -y \
        software-properties-common
RUN add-apt-repository ppa:deadsnakes/ppa
RUN apt-get update && apt-get install -y \
    python3.7 \
    python3-pip
RUN python3.7 -m pip install pip
RUN apt-get update && apt-get install -y \
    python3-distutils \
    python3-setuptools

# install Postgresql
RUN apt-get -y install wget ca-certificates
RUN wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | apt-key add -
RUN sh -c 'echo "deb http://apt.postgresql.org/pub/repos/apt/ `lsb_release -cs`-pgdg main" >> /etc/apt/sources.list.d/pgdg.list'
RUN apt-get update
RUN apt-get install -y postgresql postgresql-contrib

# Install some dep
RUN apt-get install -y net-tools
RUN apt-get install -y libpq-dev python-dev

RUN pip3 install -r /tourMamaRoot/requirements.txt

# Copy application
COPY ./tourMama/ /tourMamaRoot/tourMama/

Docker Compose file:

version: '3'

services:
  app:
    build:
      # current directory
      # if for dev, need to have Dockerfile.dev in folder
      dockerfile: docker/dev/Dockerfile
      context: .
    ports:
      #host to image
      - "8000:8000"
    volumes:
      # map directory to image, which means if something changed in
      # current directory, it will automatically reflect on image,
      # don't need to restart docker to get the changes into effect
      - ./tourMama:/tourMamaRoot/tourMama
    command: >
      sh -c "python3 manage.py wait_for_db &&
             python3 manage.py makemigrations &&
             python3 manage.py migrate &&
             python3 manage.py runserver 0.0.0.0:8000 &&
             sh initial_all.sh"
    environment:
      - DB_HOST=db
      - DB_NAME=app
      - DB_USER=postgres
      - DB_PASS=supersecretpassword

    depends_on:
      - db
      - redis

  db:
    image: postgres:11-alpine
    ports:
      #host to image
      - "5432:5432"
    environment:
      - POSTGRES_DB=app
      - POSTGRES_USER=postgres
      - POSTGRES_PASSWORD=supersecretpassword

  redis:
    image: redis:5.0.5-alpine
    ports:
      #host to image
      - "6379:6379"

#    command: ["redis-server", "--appendonly", "yes"]
#    hostname: redis
#    networks:
#      - redis-net
#    volumes:
#      - redis-data:/data

And my cron job script is:

import os
import sys
import django
from django.db import IntegrityError
from newsapi.newsapi_client import NewsApiClient
sys.path.append("../")
os.environ.setdefault("DJANGO_SETTINGS_MODULE", "tourMama.settings")
django.setup()
from news.models import News
from tourMama_app.models import Category
from config.script import categorization_loader

load_category = categorization_loader.load_category_data("catagorization.yml")
categories = list(load_category.keys())
countries = ["us", "gb"]

# Init
newsapi = NewsApiClient(api_key='secret')

for category in categories:
    for country in countries:
        category_lower = category.lower()
        category_obj = Category.objects.filter(
            category=category,
        ).get()

        top_headlines = newsapi.get_top_headlines(q='',
                                                  # sources=object'bbc-news,the-verge',
                                                  category=category_lower,
                                                  language='en',
                                                  page_size=100,
                                                  country=country
                                                  )

        for article in top_headlines.get("articles"):
            try:
                News.objects.create(
                    source=article["source"].get("name") if article["source"] else None,
                    title=article.get("title"),
                    author=article.get("author"),
                    description=article.get("description"),
                    url=article.get("url"),
                    urlToImage=article.get("urlToImage"),
                    published_at=article.get("publishedAt"),
                    content=article.get("content"),
                    category=category_obj
                )

            except IntegrityError:
                print("data already exist")

            else:
                print("data insert successfully")

And, if needed, my Django settings file is as follows:

"""
Django settings for tourMama project.

Generated by 'django-admin startproject' using Django 2.2.1.

For more information on this file, see
https://docs.djangoproject.com/en/2.2/topics/settings/

For the full list of settings and their values, see
https://docs.djangoproject.com/en/2.2/ref/settings/
"""

import os

# Build paths inside the project like this: os.path.join(BASE_DIR, ...)
BASE_DIR = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
TEMPLATE_DIR = os.path.join(BASE_DIR,"templates")

# Quick-start development settings - unsuitable for production
# See https://docs.djangoproject.com/en/2.2/howto/deployment/checklist/

# SECURITY WARNING: keep the secret key used in production secret!
SECRET_KEY = 'd084cm20*x*&s&w)vq+7*teea540yny+fyi^dh57nxiff&a#25'

# SECURITY WARNING: don't run with debug turned on in production!
DEBUG = True

COMPRESS_ENABLED = False
COMPRESS_CSS_HASHING_METHOD = 'content'
COMPRESS_FILTERS = {
    'css':[
        'compressor.filters.css_default.CssAbsoluteFilter',
        'compressor.filters.cssmin.rCSSMinFilter',
    ],
    'js':[
        'compressor.filters.jsmin.JSMinFilter',
    ]
}
HTML_MINIFY = False
KEEP_COMMENTS_ON_MINIFYING = False

ALLOWED_HOSTS = ['0.0.0.0', "127.0.0.1"]


# Application definition

INSTALLED_APPS = [
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
    'channels',
    'bootstrap3',
    'tourMama_app',
    'account',
    'posts',
    'group',
    'news',
    'statistics',
    'compressor',
]

AUTH_USER_MODEL = "account.UserProfile"

MIDDLEWARE = [
    'django.middleware.gzip.GZipMiddleware',
    'htmlmin.middleware.HtmlMinifyMiddleware',
    'htmlmin.middleware.MarkRequestMiddleware',

    'django.middleware.security.SecurityMiddleware',
    'django.contrib.sessions.middleware.SessionMiddleware',
    'django.middleware.common.CommonMiddleware',
    'django.middleware.csrf.CsrfViewMiddleware',
    'django.contrib.auth.middleware.AuthenticationMiddleware',
    'django.contrib.messages.middleware.MessageMiddleware',
    'django.middleware.clickjacking.XFrameOptionsMiddleware',
]

ROOT_URLCONF = 'tourMama.urls'

TEMPLATES = [
    {
        'BACKEND': 'django.template.backends.django.DjangoTemplates',
        'DIRS': [TEMPLATE_DIR,],
        'APP_DIRS': True,
        'OPTIONS': {
            'context_processors': [
                'django.template.context_processors.debug',
                'django.template.context_processors.request',
                'django.contrib.auth.context_processors.auth',
                'django.contrib.messages.context_processors.messages'   ,
            ],
        },
    },
]

WSGI_APPLICATION = 'tourMama.wsgi.application'
ASGI_APPLICATION = 'tourMama.routing.application'

# https://stackoverflow.com/questions/56480472/cannot-connect-to-redis-container-from-app-container/56480746#56480746
CHANNEL_LAYERS = {
    'default': {
        'BACKEND': 'channels_redis.core.RedisChannelLayer',
        'CONFIG': {
            "hosts": [('redis', 6379)],
        },
    },
}


# Database
# https://docs.djangoproject.com/en/2.2/ref/settings/#databases

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql',
        'HOST': os.environ.get('DB_HOST'),
        'NAME': os.environ.get('DB_NAME'),
        'USER': os.environ.get('DB_USER'),
        'PASSWORD': os.environ.get('DB_PASS')
    }
}


# Password validation
# https://docs.djangoproject.com/en/2.2/ref/settings/#auth-password-validators

AUTH_PASSWORD_VALIDATORS = [
    {
        'NAME': 'django.contrib.auth.password_validation.UserAttributeSimilarityValidator',
    },
    {
        'NAME': 'django.contrib.auth.password_validation.MinimumLengthValidator',
    },
    {
        'NAME': 'django.contrib.auth.password_validation.CommonPasswordValidator',
    },
    {
        'NAME': 'django.contrib.auth.password_validation.NumericPasswordValidator',
    },
]

STATICFILES_FINDERS = (
    'django.contrib.staticfiles.finders.FileSystemFinder',
    'django.contrib.staticfiles.finders.AppDirectoriesFinder',
    # other finders..
    'compressor.finders.CompressorFinder',
)

CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.memcached.MemcachedCache',
        'LOCATION': '127.0.0.1:11211',
    }
}

# Internationalization
# https://docs.djangoproject.com/en/2.2/topics/i18n/

LANGUAGE_CODE = 'en-us'

TIME_ZONE = 'UTC'

USE_I18N = True

USE_L10N = True

USE_TZ = True


# Static files (CSS, JavaScript, Images)
# https://docs.djangoproject.com/en/2.2/howto/static-files/


STATIC_URL = '/static/'
STATICFILES_DIRS = [os.path.join(BASE_DIR, 'static'),]
STATIC_ROOT = os.path.join(BASE_DIR,"static_root")

MEDIA_URL = '/media/'
MEDIA_ROOT = os.path.join(BASE_DIR, 'media')

LOGIN_REDIRECT_URL = "home:index"
LOGOUT_REDIRECT_URL = "home:index"

Solution

  • environment:
      - DB_HOST=db
      - DB_NAME=app
      - DB_USER=postgres
      - DB_PASS=supersecretpassword
    

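    You are passing the database settings as environment variables via docker-compose, as quoted here. This works when you run the script by hand from a shell in the container, because that shell inherits these variables.

    However, cron runs each job in a fresh shell with only a minimal environment, so none of these variables reach your script. With DB_HOST unset, os.environ.get('DB_HOST') returns None, and the Postgres client falls back to the local Unix domain socket, which is exactly the error you see.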

    To work around this problem, you can create a separate shell script:

    cat <<EOF > /temp/script.sh
    #!/bin/bash
    export DB_HOST=db
    export DB_NAME=app
    export DB_USER=postgres
    export DB_PASS=supersecretpassword
    
    cd /tourMamaRoot/tourMama/cronjob && fetch_news.py >> /var/log/cron.log 2>&1
    EOF
    
    chmod +x /temp/script.sh
    

    Then edit your crontab like this:

    * * * * * /temp/script.sh
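
If you would rather not hard-code the credentials in the wrapper script, one option is to snapshot the container's environment once at startup and let the wrapper source that file. The following is only a minimal sketch under stated assumptions: the variable names are taken from the compose file in the question, while the helper names (render_env_file, missing_db_settings, write_env_file) and any output path you pass in are hypothetical and not part of Django or cron.

```python
import os
import shlex

# Variable names taken from the compose file in the question.
DB_VARS = ("DB_HOST", "DB_NAME", "DB_USER", "DB_PASS")


def render_env_file(environ, names=DB_VARS):
    """Return shell `export` lines for the selected variables that are set."""
    lines = []
    for name in names:
        value = environ.get(name)
        if value is not None:
            # shlex.quote keeps values with spaces or symbols safe for the shell.
            lines.append("export {}={}".format(name, shlex.quote(value)))
    return "\n".join(lines) + "\n"


def missing_db_settings(environ, names=DB_VARS):
    """List the variables that are unset or empty, e.g. when run from cron."""
    return [name for name in names if not environ.get(name)]


def write_env_file(path, environ=None):
    """Write the export lines to `path`, created readable by the owner only."""
    environ = os.environ if environ is None else environ
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write(render_env_file(environ))
```

A possible use: call write_env_file with a path of your choosing from the container's start command, before cron starts, and replace the export lines in the wrapper script with a line that sources that file. Calling missing_db_settings(os.environ) at the top of fetch_news.py, before django.setup(), would let the script stop with a clear message instead of the misleading Unix socket error.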