Search code examples
postgresqlstreamingdata-access-layernotifylisten

importing updated files into a database


I have files that are updated every 2 hours. I have to detect the files automatically and insert the extracted information from them into a database.

Our DBMS is Postgresql and programming language is Python. How would you suggest I do that?

I want to make use of DAL (Database Abstraction Layer) to make connection between the files and database and use postgresql LISTEN/NOTIFY techniques to detect the new files. If you agree with me please tell me how I can use LISTEN/NOTIFY functions to detect the files.

Thank you


Solution

  • What you need is to write a script that stays running as a dæmon, using a file system notify API to run a callback function when the files change. When the script is notified that the files change it should connect to PostgreSQL and do the required work, then go back to sleep waiting for the next change.

    The only truly cross platform way to watch a directory for changes is to use a delay loop to poll os.listdir and os.stat to check for new files and updated modification times. This is a waste of power and disk I/O; it also gets slow for big sets of files. If your OS reliably changes the directory modification time whenever files within the directory change you can just os.stat the directory in a delay-loop, which helps.

    It's much better to use an operating system specific notification API. Were you using Java I'd tell you to use the NIO2 watch service, which handles all the platform specifics for you. It looks like Watchdog may offer something similar for Python, but I haven't needed to do directory change notification in my Python coding so I haven't tested it. If it doesn't work out you can use platform-specific techniques like inotify/dnotify for Linux, and the various watcher APIs for Windows.

    See also: