Search code examples
pythondjangoalgorithmsortingreddit

How to recreate reddit sorting in django queryset?


As a learning project, I thought it’d be fun to create a kind of reddit clone in Django. This is where I‘m at right now. My next step is to implement the reddit sorting. I found the algorithm here. If it helps, this is the sorting in sql.

I was able to set it up as a property in the model, but that does not get saved to the database, which means you cannot sort on it. So in order to sort on it, I’ve been trying to recreate the sorting algorithm in a query string, using annotate() and aggregate() functions. However, I haven’t been able to get very far.

So my question is: how would one go about recreating the reddit sort algorithm in Django?

This is (part of) my views.py, which doesn't work and is missing a lot of stuff after line 33, though the annotate function for getting the score (upvotes - downvotes) and sorting on it, did work:

import pytz
from pytz import timezone
from datetime import datetime, timedelta
# from math import log

from django.shortcuts import render, get_object_or_404, redirect
from django.core.urlresolvers import reverse
from django.core.paginator import Paginator, EmptyPage, PageNotAnInteger
from django.http import HttpResponseRedirect
from django.contrib.auth.decorators import login_required
from django.contrib.contenttypes.models import ContentType
from django.db.models import F

from .models import Link, Comment
from .forms import LinkForm, CommentForm

# the reddit sorting algorithm
# epoch = datetime(1970, 1, 1, tzinfo=pytz.utc)
#
# def get_epoch_seconds(date):
#     """Returns the number of seconds from the epoch to date."""
#     td = date - epoch
#     return td.days * 86400 + td.seconds + (float(td.microseconds) * 1000000)
#
# def reddit_sort(score, date):
#     order = log(max(abs(score), 1), 10)
#     sign = 1 if score > 0 else -1 if score < 0 else 0
#     seconds = get_epoch_seconds(date) - 1134028003
#     return round(sign * order + seconds / 45000, 7)

def links(request):
    all_links = Link.objects.annotate(
        total_score = F('upvotes') - F('downvotes'),
    ).aggregate(
        Case(When(score__lt=0, then=Value(-1)),
            When(score__gt=0, then=Value(1)),
            default=0
        ) * Func('LOG', (Max(Abs(total_score), 1) + (EPOCH(F('date')) - 1134028003) / 4500)
    ).order_by('total_score')

    paginator = Paginator(all_links, 10) #show 10 links per page
    page = request.GET.get('page')

    if request.method == 'POST':
        if request.user.is_authenticated():
            form = LinkForm(data=request.POST,auto_id=True)
            if form.is_valid():
                form.full_clean()
                form.save()
                return HttpResponseRedirect('/')
        else:
            return HttpResponseRedirect('/accounts/login/?next=/')
    else:
        form = LinkForm(auto_id=True)
        try:
            links = paginator.page(page)
        except PageNotAnInteger:
            # in case of invalid page, serve homepage, should probably add error message
            links = paginator.page(1)
        except EmtyPage:
            # If page is out of range (e.g. 9999), deliver last page of results.
            links = paginator.page(paginator.num_pages)
    return render(request, 'posts/links.html', {'links': links, 'form': form})

Solution

  • Are you trying to implement the "hot" sort (there are several other reddit-specific sorts)?

    reddit stores sort values for links (and comments) in a cache of sorts; the vote processor then calculate the new sort value and stores that. This ends up being a lot easier than trying to do on-the-fly sort calculations in the query, like you're doing. It's also generally more efficient, since you'll only have one calculation for each vote, whereas your approach will recalculate every post's sort value on each read.