I am having a problem scheduling a cron job that scrapes a website and stores the results as instances of a model (Movie) in the database.
The problem is that the model seems to get loaded before the Procfile is executed.
How should I create a cron job that runs in the background and stores the scraped information in the database? Here is my code:
web: python manage.py runserver$PORT
scheduler: python cinemas/scheduler.py
# More code above
from cinemas.models import Movie
from apscheduler.schedulers.blocking import BlockingScheduler
sched = BlockingScheduler()
@sched.scheduled_job('cron', day_of_week='mon-fri', hour=0, minute=26)
def get_movies_playing_now():
    global url_movies_playing_now
    title = []
    description = []
    # Create BeautifulSoup object with url link
    s = requests.get(url_movies_playing_now, headers=headers)
    soup = bs4.BeautifulSoup(s.text, "html.parser")
    movies = soup.find_all('ul', class_='w462')[0]
    # Find movie's title
    for movie_title in movies.find_all('h3'):
    # Find movie's description
    for movie_description in soup.find_all('ul',
        description.append(movie_description.text.replace(" [More]","."))
    for t, d in zip(title, description):
        m = Movie(movie_title=t, movie_description=d)
    # Go to the next page to find more movies
    paging = soup.find( class_='pagenating').find_all('a', class_=lambda x:
                                                       x != "inactive")
    href = ""
    for p in paging:
        if "next" in p.text.lower():
            href = p['href']
    url_movies_playing_now = href
# More code below
from django.db import models

# Create your models here.
class Movie(models.Model):
    movie_title = models.CharField(max_length=200)
    movie_description = models.CharField(max_length=20200)
This is the error I get when the job runs:
2016-11-17T17:57:06.074914+00:00 app[scheduler.1]: Traceback (most recent call last):
2016-11-17T17:57:06.074931+00:00 app[scheduler.1]:   File "cinemas/scheduler.py", line 2, in <module>
2016-11-17T17:57:06.075058+00:00 app[scheduler.1]:     import cineplex
2016-11-17T17:57:06.075060+00:00 app[scheduler.1]:   File "/app/cinemas/cineplex.py", line 1, in <module>
2016-11-17T17:57:06.075173+00:00 app[scheduler.1]:     from cinemas.models import Movie
2016-11-17T17:57:06.075196+00:00 app[scheduler.1]:   File "/app/cinemas/models.py", line 5, in <module>
2016-11-17T17:57:06.075295+00:00 app[scheduler.1]:     class Movie(models.Model):
2016-11-17T17:57:06.075297+00:00 app[scheduler.1]:   File "/app/.heroku/python/lib/python3.5/site-packages/django/db/models/base.py", line 105, in __new__
2016-11-17T17:57:06.075414+00:00 app[scheduler.1]:     app_config = apps.get_containing_app_config(module)
2016-11-17T17:57:06.075440+00:00 app[scheduler.1]:   File "/app/.heroku/python/lib/python3.5/site-packages/django/apps/registry.py", line 237, in get_containing_app_config
2016-11-17T17:57:06.075585+00:00 app[scheduler.1]:     self.check_apps_ready()
2016-11-17T17:57:06.075586+00:00 app[scheduler.1]:   File "/app/.heroku/python/lib/python3.5/site-packages/django/apps/registry.py", line 124, in check_apps_ready
2016-11-17T17:57:06.075703+00:00 app[scheduler.1]:     raise AppRegistryNotReady("Apps aren't loaded yet.")
2016-11-17T17:57:06.075726+00:00 app[scheduler.1]: django.core.exceptions.AppRegistryNotReady: Apps aren't loaded yet.
The cron job works fine if I do not include model objects. How can I run this job every day using model objects without it failing?
That's because you can't just import the Django packages, models, etc. from a standalone script.
In order to work properly, the Django internals require initialization, which is triggered from manage.py.
Rather than try and re-create all that myself, I always write long-running, non-web commands as a custom management command.
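For example, if your app is cinemas, you would:
1. Create cinemas/management/commands/scheduler.py.
2. In that file, define a sub-class of django.core.management.base.BaseCommand (that sub-class must be called Command).
3. In that sub-class, override the handle() method. In your case, that's where you'd call sched.start().
4. Your Procfile would then have scheduler: python manage.py scheduler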
Hope that helps.