Tags: python, django, multithreading, heroku, python-multithreading

Trying to run task in a separate thread on Heroku, but no new thread seems to open


I have a Django app with an admin view that lets a staff user upload a CSV file, which is then passed to a script that builds and updates items in the database from the data. The view runs the script in a new thread and then returns an "Upload started" success message.

apps/products/admin.py

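from threading import Thread

from django.contrib import admin, messages
from django.http import HttpResponseRedirect
from django.urls import reverse
# ...
from apps.products.scripts import update_products_from_csv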

@admin.register(Product)
class ProductAdmin(admin.ModelAdmin):
    # normal ModelAdmin stuff

    def upload_csv(self, request):
        if request.method == 'POST':
            csv_file = request.FILES['csv_file']
            t = Thread(target=update_products_from_csv.run, args=[csv_file])
            t.start()
            messages.success(request, 'Upload started')
            return HttpResponseRedirect(reverse('admin:products_product_changelist'))

apps/products/scripts/update_products_from_csv.py

import csv
import threading
from time import time
# ...

def run(upload_file):
    # print statements just here for debugging

    print('Update script running', threading.currentThread())

    start_time = time()
    print(start_time)

    decoded_file = upload_file.read().decode('utf-8').splitlines()
    csv_data = [d for d in csv.DictReader(decoded_file)]
    print(len(csv_data))

    for i, row in enumerate(csv_data):
        if i % 500 == 0:
            print(i, time() - start_time)
        # code that checks if item needs to be created or updated and logs accordingly

    print('Finished', time() - start_time)

In development this works fine. The "Upload started" message appears almost immediately in the browser, and in the console it prints that it started on Thread-3 or Thread-5 or whatever, and then all the other print statements execute. When it's done I can query the EntryLog model and confirm that it made its changes.

When I push this up to Heroku, I still get the "Upload started" message immediately in the browser, but the logs show Thread-1 instead of Thread-[any other number]. The start_time print statement executes, but then the response starts and none of the other print statements run. After a while I query the EntryLog model, but no changes have been made.

From what I've read it sounds like I should be able to use threading on Heroku the same as I am locally, but it seems as though it's executing the script in the main thread and then just silently killing it when the response starts.
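A quick way to test that suspicion is to compare the running thread against threading.main_thread() instead of relying on the printed name. The auto-generated Thread-N names come from a per-process counter, so the number alone does not show whether the code is on the main thread. The following is a minimal sketch; describe_thread is a hypothetical helper, not part of the app above.

```python
import threading


def describe_thread(info):
    """Record the current thread's name and whether it is the main thread."""
    current = threading.current_thread()
    info['name'] = current.name
    # The main thread is a distinct object, whatever the worker is named.
    info['is_main'] = current is threading.main_thread()


if __name__ == '__main__':
    info = {}
    worker = threading.Thread(target=describe_thread, args=(info,))
    worker.start()
    worker.join()
    print(info)
```

If is_main comes back False, the function really did run in a separate thread, even when the name printed is Thread-1.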


Solution

  • Turns out Heroku was actually opening a new thread just fine. Seeing Thread-1 when I called print(threading.currentThread()) was a red herring. In my dev environment (Windows) the spawned thread always prints Thread-[number greater than 1], but further tests on Heroku with simple threads that I was sure were running to completion always printed Thread-1. Maybe a difference in how the library behaves on Windows vs. Linux? (This was my first time using the threading library, so apologies if I'm missing something obvious.)

    The actual issue seemed to be where I was reading the CSV file. Many print statements later, I narrowed it down to the exact line where everything stopped running. A simple test that just read a txt file in a new thread gave the same result: no errors thrown, just nothing after that point ran. I moved the code that reads and decodes the file into the view, in the main thread, and passed only the extracted data to the new thread. Everything worked perfectly after that.
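The fix described above can be sketched roughly as follows, using only the standard library so it runs outside Django. This is an illustration under assumptions, not the exact code from the app: parse_csv_upload and start_update are hypothetical names, the results list stands in for the database work, and io.BytesIO stands in for the uploaded file from request.FILES.

```python
import csv
import io
import threading


def parse_csv_upload(upload_file):
    """Read and decode the upload in the calling (request) thread."""
    decoded_lines = upload_file.read().decode('utf-8').splitlines()
    return list(csv.DictReader(decoded_lines))


def run(csv_data, results):
    """Worker: receives plain row dicts and never touches the file object."""
    for row in csv_data:
        # Placeholder for the create-or-update logic in the real script.
        results.append(row)


def start_update(upload_file, results):
    """Parse in the current thread, then hand only the rows to a new thread."""
    csv_data = parse_csv_upload(upload_file)
    worker = threading.Thread(target=run, args=(csv_data, results))
    worker.start()
    return worker


if __name__ == '__main__':
    fake_upload = io.BytesIO(b'sku,name\n1,Widget\n2,Gadget\n')
    results = []
    start_update(fake_upload, results).join()
    print(results)
```

In the admin view, the equivalent change would be to build csv_data from request.FILES['csv_file'] before creating the Thread and to pass csv_data, not the file, as the thread's argument.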