Tags: heroku, python-3.6, apscheduler, django-2.1

Unable to update GeoJSON files in an application using APScheduler on Heroku


I have two GeoJSON files in my application, and a Python job written with APScheduler that updates them based on changes in the database. The job is configured to run once every 24 hours. Currently, I get the confirmation message that the new GeoJSON file was created, but the job crashes immediately after printing that log statement. I am not sure whether we can write into the Heroku container; is that the reason the job crashes?

What alternatives do I have to make this work? One option I am considering is writing the output of the APScheduler job to Amazon S3. Any suggestions in this regard would be of great help.

I have another job that updates a couple of fields in the database, and it works fine.

Also, this works fine locally: it replaces the existing GeoJSON files based on the changes in the database.

from apscheduler.schedulers.blocking import BlockingScheduler
from apscheduler.schedulers.background import BackgroundScheduler

import psycopg2
from UnoCPI import sqlfiles
import os

import Project_GEOJSON, Partner_GEOJSON

sched = BlockingScheduler()
sched1 = BackgroundScheduler()
# Alias for the module containing the SQL statements
sql = sqlfiles

# Example: schedule a function to run on the third Friday of June, July,
# August, November and December at 00:00, 01:00, 02:00 and 03:00:
# sched.add_job(your_function_name, 'cron', month='6-8,11-12', day='3rd fri', hour='0-3')


# Register the GeoJSON job on the background scheduler. Note that add_job()
# is not a decorator, and a function cannot be referenced before it is
# defined, so scheduled_job() is used here instead.
@sched1.scheduled_job('cron', day_of_week='mon-sun', hour=20)
def generateGEOJSON():
    # os.system() expects a shell command string, not a module object,
    # so run each generator script by its file path.
    os.system(f'python "{Partner_GEOJSON.__file__}"')
    os.system(f'python "{Project_GEOJSON.__file__}"')


@sched.scheduled_job('cron', day_of_week='mon-sun', hour=23)
# @sched.scheduled_job('cron', month='1,6,8', day='1', hour='0')
# @sched.scheduled_job('interval', minutes=5)
def scheduled_job():
    print('This job runs every day at 11 PM.')
    # print('This job runs on the 1st day of January, June and August at 12 AM.')
    # print('This job runs every minute.')

    # Initialize to None so the finally block can safely check them
    # even if the connection attempt fails.
    connection = None
    cursor = None

    try:
        # CAT STAGING
        connection = psycopg2.connect(user="heroku cred",
                                      password="postgres password from heroku",
                                      host="heroku host",
                                      port="5432",
                                      database="heroku db",
                                      sslmode="require")

        if connection:
            print("PostgreSQL database connection successful")

        cursor = connection.cursor()

        # create a temp table with all projects start and end dates
        cursor.execute(sql.start_and_end_dates_temp_table_sql)

        # fetch all community partners to be set to inactive
        cursor.execute(sql.comm_partners_to_be_set_to_inactive)

        inactive_comm_partners = cursor.fetchall()
        print("Here is the list of all projects to be set to inactive", "\n")
        # loop to print all the data
        for i in inactive_comm_partners:
            print(i)

        # fetch all community partners to be set to active
        cursor.execute(sql.comm_partners_to_be_set_to_active)

        active_comm_partners = cursor.fetchall()
        print("Here is the list of all projects to be set to active", "\n")
        # loop to print all the data
        for i in active_comm_partners:
            print(i)

        # UPDATE PROJECT STATUS TO ACTIVE
        cursor.execute(sql.update_project_to_active_sql)

        # UPDATE PROJECT STATUS TO COMPLETED
        cursor.execute(sql.update_project_to_inactive_sql)

        # UPDATE COMMUNITY PARTNER TO FALSE (INACTIVE) WHEN TIED TO INACTIVE PROJECTS ONLY
        cursor.execute(sql.update_comm_partner_to_inactive_sql)

        # UPDATE COMMUNITY PARTNER TO TRUE (ACTIVE) WHEN TIED TO BOTH ACTIVE
        # AND INACTIVE PROJECTS, OR TO ACTIVE PROJECTS ONLY
        cursor.execute(sql.update_comm_partner_to_active_sql)

        # drop all_projects_start_and_end_date temp table
        cursor.execute(sql.drop_temp_table_all_projects_start_and_end_dates_sql)

    except (Exception, psycopg2.Error) as error:
        print("Error while connecting to PostgreSQL", error)
    finally:
        # Commit and close the database connection.
        if connection:
            connection.commit()
            if cursor:
                cursor.close()
            connection.close()
            print("PostgreSQL connection is closed")


# Start the background scheduler first: BlockingScheduler.start() blocks,
# so any statement after it would never run.
sched1.start()
sched.start()

Solution

  • I am not sure if we can write into the Heroku container

    You can, but your changes will be periodically lost. Heroku's filesystem is dyno-local and ephemeral. Every time your dyno restarts, changes made to the filesystem will be lost. This happens frequently (at least once per day) and unpredictably.

    One of the things that I would be trying is to write the output of APScheduler to Amazon S3

    That's exactly what Heroku recommends doing with generated files and user uploads:

    AWS Simple Storage Service, e.g. S3, is a “highly durable and available store” and can be used to reliably store application content such as media files, static assets and user uploads. It allows you to offload your entire storage infrastructure and offers better scalability, reliability, and speed than just storing files on the filesystem.

    AWS S3, or similar storage services, are important when architecting applications for scale and are a perfect complement to Heroku's ephemeral filesystem.
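
    As a rough sketch of that approach (assuming boto3 is installed and AWS credentials are supplied via Heroku config vars; the bucket name, object keys, and the upload_geojson helper below are hypothetical, not part of the original code), the scheduled job can write the generated files to a temporary path and then push them to S3, rather than relying on the dyno's filesystem:

    import boto3

    # Hypothetical sketch: push freshly generated GeoJSON files to S3 instead
    # of keeping them on Heroku's ephemeral filesystem. Assumes AWS credentials
    # come from environment/config vars and that the bucket already exists.
    BUCKET_NAME = "my-geojson-bucket"  # placeholder, use your own bucket

    s3 = boto3.client("s3")

    def upload_geojson(local_path, key):
        # Setting the content type lets browsers and mapping libraries
        # consume the object directly.
        s3.upload_file(local_path, BUCKET_NAME, key,
                       ExtraArgs={"ContentType": "application/geo+json"})

    # e.g. at the end of generateGEOJSON():
    # upload_geojson("/tmp/partners.geojson", "geojson/partners.geojson")
    # upload_geojson("/tmp/projects.geojson", "geojson/projects.geojson")

    The application then serves the S3 URLs (or reads the objects back on request), so a dyno restart no longer discards the generated files.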