
Process CSV in the background hourly with Pandas and APScheduler


I have a CSV file (ZN_15M) that I want to load with read_csv once an hour. I have APScheduler installed and am trying to use it to read the file every hour (plus some other processing not shown here; if I can get the read_csv part working, the rest should work too):

import sys
from time import sleep
from apscheduler.schedulers.background import BackgroundScheduler


scheduler = BackgroundScheduler()
scheduler.start() 

def Run():
    f2 = open('C:\Users\cost9\OneDrive\Documents\PYTHON\Exported_Data\ZN_ES\ZN_15M.csv')
    ZN = pd.read_csv(f2)
    #Do stuff to the CSV File/DataFrame
    ZN.tocsv(path_or_buf = 'path')

def main():
    job = scheduler.add_interval_job(Run, minutes=60, args=())
    while True:
        sleep(60)
        sys.stdout.write('.'); sys.stdout.flush()

I don't get any errors when I run the script manually, but nothing runs hourly as I intended. I'm not sure what I'm doing wrong here.

Update: I'm getting an error with the code below:

def process_csv(path_to_csv):
    ZN_ES_comb = pd.read_csv(path_to_csv)
    # Insert your CSV processing here
    ZN_ES_comb = pd.DataFrame(ZN_ES_comb)
    ZN_ES_comb.to_csv(path_to_csv.replace('.csv', '_modified_{timestamp}.csv').format(
        timestamp=time.strftime("%Y%m%d-%H%M%S")), index=False)

if __name__ == '__main__':
    # Create CSV for demonstrating purposes
    path_to_csv = 'C:\Users\cost9\OneDrive\Documents\PYTHON\Daily Tasks\ZN_ES\ZN_ES_15M\CSV\ZN_ES_comb.csv'
    pd.DataFrame(ZN_ES_comb).to_csv(path_to_csv, index=False)
    # Start scheduler
    scheduler = BackgroundScheduler()
    scheduler.start()
    scheduler.add_job(func=process_csv,
                      args=[path_to_csv],
                      trigger=IntervalTrigger(seconds=2))
    # Wait for 7 seconds so that scheduler can call process_csv 3 times
    time.sleep(7)

The error is for the line pd.DataFrame(ZN_ES_comb).to_csv(path_to_csv, index=False) - it says:

NameError: name 'ZN_ES_comb' is not defined

Solution

  • There are two issues in your code:

    1. In Run(), the call should be ZN.to_csv(), not ZN.tocsv().
    2. The argument to time.sleep() is measured in seconds, not minutes as you seem to have assumed. Thus, Run() was not run at all during the sleep.

    Below is a working solution that works with Python 3.5 and APScheduler 3.3.1. IntervalTrigger() also has an hours parameter, which you may want to use instead of seconds.

    import time
    
    import pandas as pd
    from apscheduler.schedulers.background import BackgroundScheduler
    from apscheduler.triggers.interval import IntervalTrigger
    
    
    def process_csv(path_to_csv):
        df = pd.read_csv(path_to_csv)
        # Insert your CSV processing here
        df.to_csv(path_to_csv.replace('.csv', '_modified_{timestamp}.csv').format(
            timestamp=time.strftime("%Y%m%d-%H%M%S")), index=False)
    
    if __name__ == '__main__':
        # Create CSV for demonstrating purposes
        path_to_csv = 'made_up.csv'
        pd.DataFrame({'fruit': ['apple', 'banana'],
                      'number': [1, 2]}).to_csv(path_to_csv, index=False)
        # Start scheduler
        scheduler = BackgroundScheduler()
        scheduler.start()
        scheduler.add_job(func=process_csv,
                          args=[path_to_csv],
                          trigger=IntervalTrigger(seconds=2))
        # Wait for 7 seconds so that scheduler can call process_csv 3 times
        time.sleep(7)