Search code examples
sqletldata-warehousedsl

Domain specific language to perform data extraction and transformation in ETL pipeline


Does anyone of domain specific languages (DSL) that facilitate data extraction and transformation as part of an Extract-Transform-Load (ETL) pipeline?

I'd like to extract data from a 3rd party SQL database and transform the data to an already defined JSON format to store it into my application. There are many different possible database schemata to extract data from, so I was wondering whether there is already a way to configure this through the help of a (commonly used) extraction language (ideally that language is also agnostic to other data sources such as web services, etc).

I had a look around, but other than a few research papers I couldn't find much in terms of agreed standards for ETL (minus the 'L' which I've got covered) and I don't want to reinvent the wheel.

I'd appreciate any pointers in the right direction.


Solution

  • Creating a good, all-encompassing DSL for ETL is I believe not just hard, it's a bit of a fool's errand. To handle the many real-world ETL complexities, you end up re-creating a general-purpose language.

    And ETL "without programming skill" as this research paper attempts will struggle with the messiness of cleaning and conforming disparate source systems.

    Using a general-purpose language by itself is of course possible but very time consuming due to the low abstraction level, and all the infrastructure code you'd have to implement.

    Graphical ETL tools and some ETL DSLs address this by adding scripts or calling out to external programs. While useful and essential, this does have the disadvantage of employing multiple different programming models, with associated mental and technical friction when moving between them.

    A different and I believe a better approach is to instead add ETL capabilities to a general-purpose language. Done well, you combine the benefits of ETL specific functionality and a high abstraction level with the power of a general-purpose language and its large eco-system, all delivered via a single programming model.

    As one example of this latter approach, my company provides actionETL, a cross-platform .NET ETL library that combines an ETL mindset with the advantages of modern application development. For example, it provides familiar control flow and dataflow ETL capabilities, and uses internal DSLs in several places to simplify configuration. Do try it out if it sounds like a good fit.

    actionETL now also has a free Community edition.

    Cheers, Kristian