Django Models - Auto-Refresh field value


In my Django models.py I scrape item prices from Amazon using lxml. When I hit Save in the admin page, the price is stored in a "price" field. Amazon prices sometimes change, though, so I would like to update the price automatically every 2 days. This is my code for now:

import datetime
from decimal import Decimal

import requests
from django.db import models
from lxml import html


class AmazonItem(models.Model):
    amazon_url = models.CharField(max_length=800, null=True, blank=True)
    price = models.DecimalField(max_digits=6, decimal_places=2, editable=False)
    last_update = models.DateTimeField(editable=False)

    def save(self, *args, **kwargs):
        # Only fetch the price when the item is first created
        if not self.id:
            if self.amazon_url:
                url = self.amazon_url
                source_code = requests.get(url)
                code = html.fromstring(source_code.text)
                prices = code.xpath('//span[@id="priceblock_ourprice"]/text()')
                # Turn e.g. "EUR 12,99" into "12.99"
                eur = prices[0].replace("EUR ", "")
                nospace = eur.replace(" ", "")
                nodown = nospace.replace("\n", "")
                end = nodown.replace(",", ".")
                self.price = Decimal(end)
            else:
                self.price = 0

        self.last_update = datetime.datetime.today()
        super(AmazonItem, self).save(*args, **kwargs)

I really have no idea how to do this; I would just like it to happen automatically.


Solution

  • Isolate the sync functionality

    First, I'd move the sync functionality out of save(), e.g. by creating an AmazonItem.sync() method:

    def sync(self):
        # Your HTTP request and HTML parsing here
        # Update self.price, self.last_update, etc.
        ...
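The parsing step that sync() would contain can be kept in a small helper so it is easy to test on its own. This is only a sketch: the name parse_eur_price is hypothetical, and it assumes prices arrive in the "EUR 12,99" form shown in the question, with "." as a thousands separator and "," as the decimal mark.

```python
from decimal import Decimal


def parse_eur_price(raw: str) -> Decimal:
    """Convert a scraped string such as 'EUR 12,99' into a Decimal."""
    # Remove the currency label and surrounding whitespace/newlines.
    cleaned = raw.replace("EUR", "").strip()
    # Drop any whitespace left inside the number.
    cleaned = "".join(cleaned.split())
    # Assumption: '.' is a thousands separator and ',' is the decimal mark.
    cleaned = cleaned.replace(".", "").replace(",", ".")
    return Decimal(cleaned)
```

Returning a Decimal rather than a float avoids rounding surprises when the value is stored in the DecimalField.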
    

    Cron job with management command

    So now, your starting point will be to call .sync() on the model instances that you want to sync. A very crude* way is to:

    for amazon_item in AmazonItem.objects.all():
        amazon_item.sync()
        amazon_item.save()
    

    You can, e.g., put that code inside a Django management command called sync_amazon_items and set up a cron job to run it every 2 days:

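    # app/management/commands/sync_amazon_items.py
    from django.core.management.base import BaseCommand

    from app.models import AmazonItem  # adjust to your app's module path


    class Command(BaseCommand):
        def handle(self, *args, **options):
            for amazon_item in AmazonItem.objects.all():
                amazon_item.sync()
                amazon_item.save()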
    

    Then you can make your OS or job scheduler run it, e.g. using python manage.py sync_amazon_items

    * This will be very slow, as it goes through your list sequentially. Also, an error in any item will stop the whole operation, so you'll want to catch exceptions, log them, and keep going.
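A minimal sketch of the catch, log, and keep going idea follows. The function name sync_all is hypothetical; it only assumes that each item has the sync() and save() methods discussed above, so in the management command you could pass it AmazonItem.objects.all().

```python
import logging

logger = logging.getLogger(__name__)


def sync_all(items):
    """Sync every item, logging failures instead of aborting the run."""
    failed = []
    for item in items:
        try:
            item.sync()
            item.save()
        except Exception:
            # logger.exception records the traceback along with the message.
            logger.exception("Could not sync item %r", item)
            failed.append(item)
    return failed
```

The returned list lets the caller report or retry the items that could not be updated.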

    Use a task queue / scheduler

    A more performant and reliable (isolated failures) way is to queue up sync jobs (e.g. a job for each amazon_item, or a batch of N amazon_items) in a task queue like Celery, then set Celery's concurrency so that a few sync jobs run concurrently.
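The "batch of N amazon_items" option can be sketched with a small generator that splits primary keys into fixed-size groups. The helper name batched is hypothetical, and the queueing itself is left out: with Celery, each batch would be handed to a task you define yourself, for example a hypothetical sync_batch task called as sync_batch.delay(batch).

```python
from itertools import islice


def batched(ids, size):
    """Yield lists of at most `size` ids, in their original order."""
    iterator = iter(ids)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch
```

Passing primary keys rather than model instances keeps the queued messages small and lets each job load fresh data when it runs.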

    To schedule periodic tasks with Celery, have a look at Periodic Tasks in the Celery documentation.
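As a rough illustration, a Celery beat schedule for the 2-day interval could look like the settings fragment below. It assumes the Celery app loads Django settings with the CELERY namespace, and that a task exists at the hypothetical path app.tasks.sync_all_amazon_items; neither comes from the original post.

```python
from datetime import timedelta

# Django settings fragment, read by Celery when the app is configured with
# app.config_from_object("django.conf:settings", namespace="CELERY").
CELERY_BEAT_SCHEDULE = {
    "sync-amazon-items-every-2-days": {
        # Hypothetical task path; point this at your own task.
        "task": "app.tasks.sync_all_amazon_items",
        # Run once every 2 days.
        "schedule": timedelta(days=2),
    },
}
```

The schedule only takes effect while a celery beat process is running alongside the workers.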