java, google-app-engine, transactions, google-cloud-datastore, distributed-transactions

Find the flaws! Performing a long task reliably with the task queue


I'm making a gradebook on Google App Engine. I keep track of each student's grade per grading period. The grading periods can overlap. Since I may display hundreds of these grades at a time, I precalculate the grades on the server. So, for any one student, I may have many calculated grades, one for each grading period.

Now, the teacher enters a new score from a quiz. That score may affect many of the calculated grades, because it may fall into many grading periods. I need to recalculate all of the affected grades. This could take a long time, since for each grading period I need to fetch all relevant scores and run a complex routine over those scores. I don't think the 30 seconds a request gets is enough, especially if the datastore is feeling slow today. Furthermore, failure is not an option. It is unacceptable for some grades to update and others to fall silently out of date.

So I think to myself, what a wonderful time to learn about the task queue!

I'm not an expert in DB structure or anything, but here's an outline of what I want to do:

public ReturnCode addNewScore(Float score, Date date, Long studentId)
{
    List<CalculatedGrade> existingGrades = getAllRelevantGradesForStudent(studentId, date);

    for (CalculatedGrade grade : existingGrades)
    {
        grade.markDirty(); //leaves a record that this grade is no longer up to date
    }

    persistenceManager.makePersistentAll(existingGrades);
    //DANGER ZONE?
    persistenceManager.makePersistent(new IndividualScore(score, date, studentId));

    tellTheTaskQueueToStartCalculating();

    return OMG_IT_WORKED;
}

This seems like a fast way to mark all of the relevant grades dirty. If it fails half-way through, then failure is returned and the client will know to try again. If a client later tries to fetch a dirty grade, we can return an error there.

Then, the task queue code would look something like this:

public void calculateThemGrades()
{
    List<CalculatedGrade> dirtyGrades = getAllDirtyGrades();

    try
    {
        for (CalculatedGrade grade : dirtyGrades)
        {
            List<Score> relevantScores = getAllRelevantScores(grade); //scores that fall in this grade's grading period
            Float cleanGrade = calculateGrade(relevantScores);
            grade.setGrade(cleanGrade);
            grade.markClean();

            persistenceManager.flush();
        }
    }
    catch(Throwable anything)
    {
        //if there was any problem (we ran out of time, the datastore is down, or whatever), just try again
        tellTheTaskQueueToStartCalculating();
    }
}

Here's my question: does this guarantee that there will never be a calculated grade that is marked clean after a new score has been added?

Specific areas of concern:

  • Will the existingGrades always be persisted before the new IndividualScore in the first snippet, around the danger zone?
  • Is it possible that another thread will start the task queue code in the danger zone so that those existingGrades might be marked clean again before the new IndividualScore is really entered? If so, how can I make sure that won't happen (transactions across all of the grades are out)?
  • Is persistenceManager.flush() enough to save partially-done calculations, even though the pm is not closed?

This must be a common sort of problem. I'd appreciate any links to tutorials, especially those for App Engine. Thanks for reading so much!


Solution

  • If you're worried about race conditions, don't use a boolean dirty flag - instead, use a pair of timestamps. When you want to mark a record dirty, update the 'dirty' timestamp.

    When you start calculating the grade, make a note of what the 'dirty' timestamp was.

    When you finish calculating the grade, update the 'clean' timestamp to be equal to the value of the 'dirty' timestamp you read when you began, signifying that you've synchronized that grade with the new data as of that timestamp.

    Any record with a 'dirty' timestamp greater than its 'clean' timestamp is dirty. Any record where the two match is clean. Simple and effective. If another request adds new data that would affect a given grade while your task queue task is already in the middle of calculating it, the 'dirty' timestamp will have moved past the value you noted at the start. The 'clean' timestamp you then write will be older than the current 'dirty' timestamp, so the task queue will consider the record still dirty and process it again.
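
The timestamp pair described above could be sketched in Java roughly as follows. This is only an illustration: the class TimestampedGrade and all of its method names are hypothetical, and it is an in-memory stand-in rather than a JDO entity. In the datastore, the write in markClean would presumably have to happen inside a transaction on that single grade entity, so that it cannot overwrite a 'dirty' update made at the same moment. The rule in markDirty that always moves the timestamp forward by at least one is an added assumption, there to cope with two scores arriving in the same millisecond.

```java
/**
 * Hypothetical in-memory stand-in for a persisted grade record.
 * Names are illustrative only; nothing here is App Engine or JDO API.
 */
class TimestampedGrade {
    private long dirtyAt = 0L; // moves forward whenever a relevant score is added
    private long cleanAt = 0L; // the dirtyAt value the last finished calculation started from
    private float grade = 0f;

    /** Called when a new score affects this grade. */
    synchronized void markDirty(long nowMillis) {
        // Assumption: force a strictly larger value so that two scores in the
        // same millisecond are still distinguishable.
        dirtyAt = Math.max(dirtyAt + 1, nowMillis);
    }

    /** The task reads this before it starts fetching scores. */
    synchronized long readDirtyAt() {
        return dirtyAt;
    }

    /** The task calls this when done, passing the value it read at the start. */
    synchronized void markClean(float newGrade, long dirtyAtSeenAtStart) {
        grade = newGrade;
        cleanAt = dirtyAtSeenAtStart;
    }

    /** Dirty means some score arrived after the last calculation began. */
    synchronized boolean isDirty() {
        return dirtyAt > cleanAt;
    }

    synchronized float getGrade() {
        return grade;
    }

    public static void main(String[] args) {
        TimestampedGrade g = new TimestampedGrade();

        g.markDirty(1000L);            // first score arrives
        long seen = g.readDirtyAt();   // task starts calculating
        g.markDirty(1000L);            // second score arrives mid-calculation
        g.markClean(90f, seen);        // task finishes with stale inputs
        System.out.println(g.isDirty()); // prints "true": must be recalculated

        seen = g.readDirtyAt();        // task runs again
        g.markClean(85f, seen);        // nothing changed in the meantime
        System.out.println(g.isDirty()); // prints "false"
    }
}
```

The point of passing the earlier value into markClean, rather than the current time, is that a calculation based on stale scores can never make the record look clean.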