Search code examples
pythoncsveventspython-watchdog

How to monitor a CSV file for changes?


I'm trying to monitor a CSV file that is being written to by a separate program. Around every 10 seconds, the CSV file is updated with a couple more lines. Each time the file is updated, I want to be able to detect the file has been changed (will always be the same file), take the new lines, and write them to console (just for a test).

I have looked around the website, and have found numerous ways of watching a file to see if its updated (like so http://thepythoncorner.com/dev/how-to-create-a-watchdog-in-python-to-look-for-filesystem-changes/), but I can't seem to find anything that will allow me to get to the changes made in the file to print out to console.

Current code:

import time
from watchdog.observers import Observer
from watchdog.events import PatternMatchingEventHandler

def on_created(event):
    print(f"hey, {event.src_path} has been created!")

def on_deleted(event):
    print(f"Someone deleted {event.src_path}!")

def on_modified(event):
    print(f"{event.src_path} has been modified")

def on_moved(event):
    print(f"ok ok ok, someone moved {event.src_path} to {event.dest_path}")

if __name__ == "__main__":
    patterns = "*"
    ignore_patterns = ""
    ignore_directories = False
    case_sensitive = True
    my_event_handler = PatternMatchingEventHandler(patterns, ignore_patterns, ignore_directories, case_sensitive)
    my_event_handler.on_created = on_created
    my_event_handler.on_deleted = on_deleted
    my_event_handler.on_modified = on_modified
    my_event_handler.on_moved = on_moved
    path = "."
    go_recursively = True
    my_observer = Observer()
    my_observer.schedule(my_event_handler, path, recursive=go_recursively)
    my_observer.start()
    try:
        while True:
            time.sleep(1)
    except KeyboardInterrupt:
        my_observer.stop()
        my_observer.join()

This runs, but looks for changes in files all over the place. How do I make it listen for changes from one single file?


Solution

  • If you're more or less happy with the script other than it tracking a bunch of files then you could change the patterns = "*" part which is a wildcard matching string which tells the PatternMatchingEventHandler to look for any file. You could change that to paterns = 'my_file.csv' and also change the path variable to the directory that the file is in to save some time recursively scanning all the directories in '.'. Then you don't need recursive set to True for a single file either.

    Print new lines to console part (one option):

    import pandas as pd
    
    ...
    
    def on_modified(event):
        print(f"{event.src_path} has been modified")
        # You said "a couple more lines" I'm going to take that
        # as two:
        df = pd.read_csv(event.src_path)
        print("Newest 2 lines:")
        print(df[-2:])
    

    If it's not two lines you'll want to track the length of the file and pass that to the function which opens the CSV so it knows how many lines are new.