Tags: python, django, sqlite, django-views, rss

How to update my Django database with an RSS feed every X minutes?


I am new to working with RSS feeds.

Every X minutes, I want to add any new items from an RSS feed to my database. I have written the code that fetches the feed and updates the database, but how do I make that code run every X minutes? If I put it inside the view function that renders the home page, it slows down page loading. I want it to happen automatically every X minutes without affecting my website's functionality.

views.py

from django.shortcuts import render
from .models import Article,Slide
import feedparser

rss = feedparser.parse('url am passing')
already_updated = False

first_entry = rss.entries[0]
for slide in Slide.objects.all():
    if first_entry.title == slide.title:
        already_updated = True

if not already_updated:
    for entry in rss.entries:
        new = Slide(title = entry.title, article_name = Article.objects.last())
        new.save()
        print(entry['title'])


def test(request):
    articles = Article.objects.all()
    slides = Slide.objects.all()
    return render(request, 'sample/test_amp.html', {'articles':articles, 'slides':slides})

Solution

  • A simple approach is to use the APScheduler library. Once it is installed, start the scheduler from the app's config file (apps.py) so that it comes up when the manage.py runserver command is run. Once started this way, the scheduler runs your job at the interval you define. Here is a working example, assuming you have an app called Home.

    Directory structure:

    Basedir
    | - ProjectName
    | - Home
    | - - __init__.py
    | - - admin.py
    | - - apps.py
    | - - models.py
    | - - test.py
    | - - views.py
    | - - jobs.py
    | - - BackgroundClass.py
    

    In BackgroundClass.py, define a function that does the processing: it fetches the RSS feed and updates the DB with the results.

    Home/BackgroundClass.py

    class BackgroundClass:
    
        @staticmethod
        def update_db():
            # Do your update db from RSS task here
            pass  # placeholder so the stub is valid Python; replace with the real update logic
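
    The update_db body has to decide which feed entries are new. The code in the question compares only the first entry against the stored slides and then inserts every entry, which could store duplicates when a single new item appears. Below is a minimal sketch of a per-entry check, written without Django so it can run on its own. The helper name select_new_titles and the FEED_URL placeholder are hypothetical, and the commented wiring assumes the Slide and Article models from the question.

```python
def select_new_titles(feed_titles, existing_titles):
    """Return feed titles that are not stored yet, preserving feed order."""
    seen = set(existing_titles)
    new_titles = []
    for title in feed_titles:
        if title not in seen:
            seen.add(title)  # also skips duplicates inside the feed itself
            new_titles.append(title)
    return new_titles


# Inside update_db() this could be wired up roughly as follows
# (assumption: requires Django, feedparser, and the question's models):
#   rss = feedparser.parse(FEED_URL)  # FEED_URL is a placeholder
#   existing = Slide.objects.values_list('title', flat=True)
#   for title in select_new_titles([e.title for e in rss.entries], existing):
#       Slide.objects.create(title=title, article_name=Article.objects.last())

print(select_new_titles(["a", "b", "c"], ["b"]))  # prints ['a', 'c']
```

    Matching on title alone follows the question's code; if the feed provides a stable entry id or link, that is probably a safer key.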
    

    Now in jobs.py, define a function that creates an instance of APScheduler's BackgroundScheduler, which keeps running in the background and calls the update_db function from BackgroundClass at the interval you define.

    Home/jobs.py

    from apscheduler.schedulers.background import BackgroundScheduler
    from .BackgroundClass import BackgroundClass
    
    
    def start():
        scheduler = BackgroundScheduler()
        scheduler.add_job(BackgroundClass.update_db, 'interval', minutes=1)
        scheduler.start()
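
    To illustrate why a background scheduler does not slow down page loads, here is a conceptual sketch using only the standard library. It is not how APScheduler is implemented; the IntervalRunner class is hypothetical and only shows the idea of running a job periodically on a separate thread while the caller carries on.

```python
import threading
import time


class IntervalRunner:
    """Calls `func` every `seconds` on a daemon thread until stop() is called."""

    def __init__(self, func, seconds):
        self._func = func
        self._seconds = seconds
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._loop, daemon=True)

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()

    def _loop(self):
        # Event.wait returns False on timeout (run the job) and True after stop().
        while not self._stop.wait(self._seconds):
            self._func()


calls = []
runner = IntervalRunner(lambda: calls.append(time.monotonic()), seconds=0.05)
runner.start()
time.sleep(0.3)  # the caller is free to do other work in the meantime
runner.stop()
print(len(calls) >= 1)
```

    In the real project, BackgroundScheduler plays this role and additionally handles things such as job stores and missed runs, so the sketch is for understanding only.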
    

    Now in apps.py, call the function defined in jobs.py from the app config's ready() method, so that the background task starts with the server when manage.py runserver is run and keeps executing at the configured interval for as long as the server is running.

    Home/apps.py

    from django.apps import AppConfig
    
    
    class HomeConfig(AppConfig):
        name = 'Home'
    
        def ready(self):
            import os
            from . import jobs
    
            # With runserver's auto-reloader, ready() runs in two processes on startup.
            # Start the scheduler only where RUN_MAIN is not 'true' so the job is not scheduled twice.
            if os.environ.get('RUN_MAIN') != 'true':
                jobs.start()
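
    The RUN_MAIN check can be looked at in isolation. As far as I understand, Django's auto-reloader sets RUN_MAIN to 'true' in the child process it spawns, while a process started without the reloader has no such variable. The sketch below mirrors the condition used in ready(); the function name should_start_scheduler is hypothetical and exists only to make the check easy to try out.

```python
import os


def should_start_scheduler(environ=None):
    """Mirror the check in ready(): start only when RUN_MAIN is not 'true'."""
    if environ is None:
        environ = os.environ
    return environ.get('RUN_MAIN') != 'true'


print(should_start_scheduler({}))                    # RUN_MAIN unset: prints True
print(should_start_scheduler({'RUN_MAIN': 'true'}))  # reloader child: prints False
```

    With this condition the scheduler also starts when RUN_MAIN is absent altogether, for example under runserver --noreload.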