triggers, dependencies, etl, data-warehouse

Linux tool for triggering data updates and transformations (low-end ETL/data warehousing tool)


I have a bunch of scripts on Linux, written in bash, Python, SQL, ...: some collect data from the internet and local services and write it to disk, some transform that data and write it into a database, some read from the database and generate new data, and so on.

Apart from a few time-triggered scripts, the glue between the scripts is currently me, running the scripts now and then in a particular order to update everything.

What is the simplest way to replace me with a tool that tracks the dependencies and triggers the next step as soon as its preconditions are met?

I've found many ETL and data warehousing tools, but they seem too heavyweight for my simple setup. I'd prefer a CLI solution with text-based configuration, ideally one that can visualise the dependency graph. Any suggestions?


Solution

  • Try Apache Airflow: airflow.apache.org
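
To get a feel for what a dependency-driven runner does at its core, here is a minimal sketch that uses only the Python standard library (`graphlib`, Python 3.9+). It is not Airflow's API and not a replacement for it: the step names, the `DEPENDENCIES` and `COMMANDS` tables, and the `echo` commands are all hypothetical placeholders for the real scripts.

```python
import subprocess
from graphlib import TopologicalSorter

# Hypothetical pipeline: each step maps to the set of steps it depends on.
DEPENDENCIES = {
    "fetch": set(),
    "transform": {"fetch"},
    "load": {"transform"},
    "report": {"load"},
}

# Hypothetical commands; replace these with the real bash/Python/SQL scripts.
COMMANDS = {
    "fetch": ["echo", "collecting data"],
    "transform": ["echo", "transforming data"],
    "load": ["echo", "writing to database"],
    "report": ["echo", "generating derived data"],
}


def execution_order(dependencies):
    """Return the steps in an order that respects every dependency."""
    return list(TopologicalSorter(dependencies).static_order())


def run_pipeline(dependencies, commands, runner=subprocess.run):
    """Run each step after its prerequisites; stop at the first failure."""
    completed = []
    for step in execution_order(dependencies):
        result = runner(commands[step], check=False)
        if result.returncode != 0:
            raise RuntimeError(
                f"step {step!r} failed with exit code {result.returncode}"
            )
        completed.append(step)
    return completed


if __name__ == "__main__":
    print(run_pipeline(DEPENDENCIES, COMMANDS))
```

This sketch only orders the steps and aborts on the first failing one. Scheduling, retries, logging and a visual view of the graph are the parts a tool like Airflow adds on top.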