Search code examples
djangomodelfilterunique

Query all rows and return most recent of each duplicate


I have a model that has an id that isn't unique. Each model also has a date. I would like to return all results but only the most recent of each row that shares ids. The model looks something like this:

class MyModel(models.Model):
    my_id = models.PositiveIntegerField()
    date  = models.DateTimeField()
    title = models.CharField(max_length=36)


## Add some entries
m1 = MyModel(my_id=1, date=yesterday, title='stop')
m1.save()

m2 = MyModel(my_id=1, date=today, title='go')
m2.save()

m3 = MyModel(my_id=2, date=today, title='hello')
m3.save()

Now try to retrieve these results:

MyModel.objects.all()... # then limit duplicate my_id's by most recent

Results should be only m2 and m3


Solution

  • You won't be able to do this with just the ORM, you'll need to get all the records, and then discard the duplicates in Python.

    For example:

    objs = MyModel.objects.all().order_by("-date")
    seen = set()
    keep = []
    for o in objs:
        if o.id not in seen:
            keep.append(o)
            seen.add(o.id)
    

    Here's some custom SQL that can get what you want from the database:

    select * from mymodel where (id, date) in (select id, max(date) from mymodel group by id)
    

    You should be able to adapt this to use in the ORM.