django django-orm

Updating a Django table, hashing a specific field in the table


I have a model that looks something like this:

class PodUsage(models.Model):
    pod_id = models.CharField(max_length=256, db_index=True)
    pod_name = models.CharField(max_length=256)
    start_time = models.DateTimeField(blank=True, null=True, default=timezone.now)
    end_time = models.DateTimeField(blank=True, null=True)
    anonymised = models.BooleanField(default=False)
    user = models.ForeignKey("accounts.ServiceUser", null=True, blank=True, on_delete=models.CASCADE)

As part of our GDPR requirement, we need to anonymize data after a certain period, which I could absolutely do as a loop:

  import hashlib
  from datetime import timedelta

  from django.conf import settings
  from django.utils import timezone

  count = 0
  records = PodUsage.objects.filter(
      anonymised=False,
      start_time__lte=timezone.now() - timedelta(weeks=settings.DATA_ANONYMISING_PERIOD_WEEKS),
  )
  for record in records:
      record.pod_name = hashlib.sha256(record.pod_name.encode('utf-8')).hexdigest()
      record.user = None
      record.anonymised = True
      record.save()
      count += 1
  # Log count somewhere

However, I think I should be able to do it with a single update() call:

  count = PodUsage.objects.filter(
        anonymised=False,
        start_time__lte=timezone.now() - timedelta(weeks=settings.DATA_ANONYMISING_PERIOD_WEEKS)
        ).update(
            pod_name = hashlib.sha256(pod_name.encode('utf-8')).hexdigest(),
            user = None,
            anonymised = True
        )
  # Log count somewhere

... but I can't figure out the correct incantation to reference the field inside the update() call:

  • as given, pod_name is not defined
  • sha256("pod_name".encode('utf-8')) obviously just encodes the string "pod_name"
  • sha256(F("pod_name").encode('utf-8')) breaks the code with 'F' object has no attribute 'encode'

Any suggestions?


Solution

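  • You can use the SHA256 database function from django.db.models.functions (available since Django 3.0) to let the database compute the hash as part of the UPDATE. It takes a field name or expression, so no F() object or .encode() call is needed. On PostgreSQL it requires the pgcrypto extension to be installed: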

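    from datetime import timedelta

    from django.conf import settings
    from django.db.models.functions import SHA256
    from django.utils import timezone

    # update() runs a single UPDATE statement and returns the number of rows matched
    count = PodUsage.objects.filter(
        anonymised=False,
        start_time__lte=timezone.now()
        - timedelta(weeks=settings.DATA_ANONYMISING_PERIOD_WEEKS),
    ).update(pod_name=SHA256('pod_name'), user=None, anonymised=True)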
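To see what "letting the database hash" amounts to, here is a minimal sketch using only the standard-library sqlite3 module in place of the Django ORM. The table layout is a simplified, hypothetical stand-in for PodUsage (the date filter is left out), and the SHA256 SQL function is registered by hand because plain SQLite does not ship one. It is an illustration of the idea, not what Django necessarily emits for your database.

```python
import hashlib
import sqlite3


def sha256_hex(text):
    """Hash a text value the same way the loop in the question does; NULL stays NULL."""
    if text is None:
        return None
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


conn = sqlite3.connect(":memory:")
# Plain SQLite has no SHA256 function, so register a Python one under that name.
conn.create_function("SHA256", 1, sha256_hex)

# Hypothetical, simplified stand-in for the PodUsage table.
conn.execute(
    "CREATE TABLE pod_usage ("
    "id INTEGER PRIMARY KEY, pod_name TEXT, user_id INTEGER, anonymised INTEGER)"
)
conn.executemany(
    "INSERT INTO pod_usage (pod_name, user_id, anonymised) VALUES (?, ?, ?)",
    [("alpha", 1, 0), ("beta", 2, 0), ("gamma", 3, 1)],
)

# One statement: the column is referenced by name inside the SQL function,
# so no row is ever loaded into Python application code.
cursor = conn.execute(
    "UPDATE pod_usage "
    "SET pod_name = SHA256(pod_name), user_id = NULL, anonymised = 1 "
    "WHERE anonymised = 0"
)
count = cursor.rowcount  # number of rows changed by the UPDATE

rows = conn.execute(
    "SELECT pod_name, user_id, anonymised FROM pod_usage ORDER BY id"
).fetchall()
```

The hashed values here equal hashlib.sha256(name.encode('utf-8')).hexdigest() because the registered function is that exact call. With a real database backend the hash is computed over the bytes the database stores, so it should match the Python loop only if the column is stored as UTF-8; it is worth checking one row by hand before relying on that.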