Look for specific file and execute stored procedure


One of my processes writes data to a text file, and a SQL stored procedure then stages that data in a SQL table. I don't know when the file will arrive, so I need a file watcher that looks for the file and, once it is available, stages the data into the SQL table.

I have tried the code below, but I can't work out how to stop watching and execute the SQL stored procedure once the file arrives. For example, the filename is Process1_Timestamp.txt.

So far I have:

  1. Created a function to return the files in a directory.

  2. Created a function to compare two lists.

And then this:

import time

def fileWatcher(my_dir: str, pollTime: int):
    # Take the initial snapshot once, before polling starts
    previousFileList = fileInDirectory(my_dir)
    print('First attempt')
    print(previousFileList)

    while True:
        time.sleep(pollTime)

        newFileList = fileInDirectory(my_dir)

        # Files present now that were not in the previous snapshot
        fileDiff = listComparison(previousFileList, newFileList)

        previousFileList = newFileList
        if len(fileDiff) == 0: continue
        doThingsWithNewFiles(fileDiff)

How can I stop watching once the file arrives and trigger the next SQL process?


Solution

  • Have you looked at Watchdog (https://pythonhosted.org/watchdog/)?

    The code below is largely taken from the example at https://pythonhosted.org/watchdog/quickstart.html#a-simple-example and uses FileSystemEventHandler (https://pythonhosted.org/watchdog/api.html#watchdog.events.FileSystemEventHandler):

    import sys
    import time
    
    from watchdog.events import FileSystemEventHandler
    from watchdog.observers import Observer
    
    
    class CustomHandler(FileSystemEventHandler):
        def on_created(self, event):
            print(f'File or directory name: {event.src_path}')
            # do stuff
    
    
    if __name__ == "__main__":
        # Use the current directory if no path is supplied as the first argument;
        # you could provide the path in any other way you like.
        path = sys.argv[1] if len(sys.argv) > 1 else '.'
        event_handler = CustomHandler()
        observer = Observer()
        observer.schedule(event_handler, path, recursive=True)
        observer.start()
        try:
            while True:
                time.sleep(1)
        except KeyboardInterrupt:
            observer.stop()
        observer.join()
    

    In a more minimal form:

    import time
    from watchdog.events import FileSystemEventHandler
    from watchdog.observers import Observer
    
    
    class CustomHandler(FileSystemEventHandler):
        def on_created(self, event):
            print(f'File or directory name: {event.src_path}')
            # do stuff
    
    
    if __name__ == "__main__":
        observer = Observer()
        observer.schedule(CustomHandler(), '/path/to/my/directory', recursive=True)
        observer.start()
        try:
            while True:
                time.sleep(1)
        except KeyboardInterrupt:
            observer.stop()
        observer.join()
    

    FileSystemEventHandler has other methods besides on_created() that you can override if you want your code called for events other than file or directory creation.

    If you are not interested in events for directories, you can check the event's attributes, e.g. event.is_directory:

    class CustomHandler(FileSystemEventHandler):
        def on_created(self, event):
            print(f'Created file or directory name: {event.src_path}')
            # do stuff
        
        def on_modified(self, event):
            if not event.is_directory:
                print(f'Modified file name: {event.src_path}')
                # do other stuff
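
The question is about one particular file (Process1_Timestamp.txt), so the handler would probably also need to filter on the file name. A minimal sketch of such a filter, using only the standard library; the pattern `Process1_*.txt` and the helper name `is_target_file` are assumptions for illustration and do not come from the original post:

```python
from fnmatch import fnmatch
from pathlib import Path

# Assumed pattern, based on the example name Process1_Timestamp.txt
TARGET_PATTERN = "Process1_*.txt"


def is_target_file(src_path: str, pattern: str = TARGET_PATTERN) -> bool:
    """Return True if the file name (not the full path) matches the pattern."""
    return fnmatch(Path(src_path).name, pattern)
```

Inside on_created(), the check could then read `if not event.is_directory and is_target_file(event.src_path):` before doing any work.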
    

    If you want to stop after finding the first file:

    import time
    from watchdog.events import FileSystemEventHandler
    from watchdog.observers import Observer
    
    
    class CustomHandler(FileSystemEventHandler):
        def on_created(self, event):
            if not event.is_directory:
                print(f'File name: {event.src_path}')
                # do stuff
                observer.stop()  # 'observer' is the module-level variable created below
    
    
    if __name__ == "__main__":
        observer = Observer()
        observer.schedule(CustomHandler(), '.', recursive=True)
        observer.start()
        try:
            while observer.should_keep_running():
                time.sleep(1)
        except KeyboardInterrupt:
            observer.stop()
        observer.join()
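
If you would rather keep the polling approach from the question, the same "stop at the first match" idea can be sketched without Watchdog: return from the loop as soon as a matching file exists, then run the next step. The names `wait_for_file` and `watch_then_stage`, the `stage` callback and the glob-style pattern are all hypothetical. Unlike the diff-based loop in the question, this sketch also matches a file that is already present when watching starts.

```python
import time
from fnmatch import fnmatch
from pathlib import Path
from typing import Callable, Optional


def wait_for_file(
    watch_dir: str,
    pattern: str,
    poll_seconds: float = 5.0,
    timeout: Optional[float] = None,
) -> Optional[Path]:
    """Poll watch_dir until a file matching pattern exists; return its path.

    Returns None if timeout (in seconds) elapses first. With timeout=None
    the loop keeps polling until a match appears.
    """
    deadline = None if timeout is None else time.monotonic() + timeout
    while True:
        matches = sorted(
            p for p in Path(watch_dir).iterdir()
            if p.is_file() and fnmatch(p.name, pattern)
        )
        if matches:
            return matches[0]  # returning ends the loop, so watching stops here
        if deadline is not None and time.monotonic() >= deadline:
            return None
        time.sleep(poll_seconds)


def watch_then_stage(watch_dir: str, pattern: str, stage: Callable[[Path], None]) -> None:
    """Block until the file shows up, then hand it to the staging step."""
    found = wait_for_file(watch_dir, pattern)
    if found is not None:
        stage(found)  # e.g. a function that executes the stored procedure
```

Here `stage` would be whatever function runs the stored procedure with your database driver; a call might look like `watch_then_stage('/path/to/my/directory', 'Process1_*.txt', stage_file)`, where `stage_file` is your own function.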
    

    Make sure you read and understand the caveats at https://pythonhosted.org/watchdog/installation.html#supported-platforms-and-caveats. They may apply to all solutions, not just Watchdog.
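
One further practical point, which is general experience rather than something from the linked page: a file can show up in the directory before the writing process has finished with it, so staging it immediately may read partial data. A rough heuristic, sketched below, is to wait until the file size stops changing. The helper name `wait_until_stable` and its defaults are assumptions, and a stable size is not a guarantee that the writer is done.

```python
import os
import time


def wait_until_stable(path: str, interval: float = 1.0, checks: int = 3) -> bool:
    """Return True once the size is unchanged for `checks` consecutive polls.

    Returns False if the file cannot be read (for example, it was removed).
    """
    last_size = -1
    stable = 0
    while True:
        try:
            size = os.path.getsize(path)
        except OSError:
            return False
        if size == last_size:
            stable += 1
            if stable >= checks:
                return True
        else:
            # Size changed (or first poll): restart the count
            stable = 0
            last_size = size
        time.sleep(interval)
```

If the writing process is under your control, having it write to a temporary name and rename the file when finished is usually more reliable than watching the size.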