ETL architecture


I've been asked to build an ETL-style application that transfers data from one data source to another. For now I've settled on a three-layer architecture, but I'd like to learn more about best practices and about the life cycle described on this Wikipedia page:

http://en.wikipedia.org/wiki/Extract,_transform,_load

Four-layered approach for ETL architecture design

  • Functional layer: Core functional ETL processing (extract, transform, and load).
  • Operational management layer: Job-stream definition and management, parameters, scheduling, monitoring, communication and alerting.
  • Audit, balance and control (ABC) layer: Job-execution statistics, balancing and controls, rejects- and error-handling, codes management.
  • Utility layer: Common components supporting all other layers.
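
One way to picture how the four layers relate is the minimal Java sketch below. It is only an illustration under stated assumptions: the class names (`Texts`, `AuditLog`, `CopyJob`, `JobRunner`) are hypothetical, records are plain strings, and the "data sources" are in-memory lists rather than real systems.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

// Utility layer: small helpers shared by the other layers (hypothetical helper).
final class Texts {
    static boolean notBlank(String s) {
        return s != null && !s.trim().isEmpty();
    }
}

// Audit, balance and control (ABC) layer: per-run counts and rejected records.
final class AuditLog {
    int read;
    int written;
    final List<String> rejects = new ArrayList<>();

    // Balance check: every record read was either written or rejected.
    boolean balanced() {
        return read == written + rejects.size();
    }
}

// Functional layer: the extract/transform/load logic of one job.
final class CopyJob {
    private final Predicate<String> isValid;
    private final Function<String, String> transform;

    CopyJob(Predicate<String> isValid, Function<String, String> transform) {
        this.isValid = isValid;
        this.transform = transform;
    }

    List<String> run(List<String> source, AuditLog audit) {
        List<String> output = new ArrayList<>();
        for (String record : source) {
            audit.read++;
            if (!isValid.test(record)) {
                audit.rejects.add(record); // keep rejects for later diagnosis
                continue;
            }
            output.add(transform.apply(record));
            audit.written++;
        }
        return output;
    }
}

// Operational management layer: runs a named job and reports the outcome.
final class JobRunner {
    static String runAndReport(String name, CopyJob job, List<String> source, List<String> sink) {
        AuditLog audit = new AuditLog();
        sink.addAll(job.run(source, audit));
        return String.format("%s: read=%d written=%d rejected=%d balanced=%b",
                name, audit.read, audit.written, audit.rejects.size(), audit.balanced());
    }
}
```

In a real system the operational layer would also cover scheduling and alerting; here it is reduced to a single method so the separation of concerns stays visible.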

Real-life ETL cycle

The typical real-life ETL cycle consists of the following execution steps:

  1. Cycle initiation
  2. Build reference data
  3. Extract (from sources)
  4. Validate
  5. Transform (clean, apply business rules, check for data integrity, create aggregates or disaggregates)
  6. Stage (load into staging tables, if used)
  7. Audit reports (for example, on compliance with business rules; in case of failure, they also help to diagnose and repair)
  8. Publish (to target tables)
  9. Archive
  10. Clean up
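
The cycle above is essentially an ordered list of named steps. A minimal Java sketch of that idea follows, assuming hypothetical names (`EtlCycle`, `EtlCycleDemo`), in-memory lists in place of real sources and targets, and only a subset of the ten steps. It stops at the first failing step and reports which steps completed; a real cycle would also log the failure and would probably still run clean-up.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Runs named steps in the order they were registered (step names are assumed unique).
final class EtlCycle {
    private final Map<String, Runnable> steps = new LinkedHashMap<>();

    EtlCycle step(String name, Runnable action) {
        steps.put(name, action);
        return this;
    }

    // Returns the names of the steps that completed; stops at the first failure.
    List<String> run() {
        List<String> completed = new ArrayList<>();
        for (Map.Entry<String, Runnable> entry : steps.entrySet()) {
            try {
                entry.getValue().run();
            } catch (RuntimeException ex) {
                break; // later steps (including publish) are skipped
            }
            completed.add(entry.getKey());
        }
        return completed;
    }
}

final class EtlCycleDemo {
    // Wires a subset of the cycle: extract, validate, transform, publish, clean up.
    static List<String> runDemo(List<String> source, List<String> target) {
        List<String> staging = new ArrayList<>();
        return new EtlCycle()
                .step("extract", () -> staging.addAll(source))
                .step("validate", () -> {
                    if (staging.contains(null)) {
                        throw new IllegalStateException("null record in extract");
                    }
                })
                .step("transform", () -> staging.replaceAll(String::toUpperCase))
                .step("publish", () -> target.addAll(staging))
                .step("clean up", staging::clear)
                .run();
    }
}
```

Because validation runs before publishing, a bad extract never reaches the target list in this sketch.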

Solution

  • I don't know your situation or your requirements, but you're likely overthinking the problem.

    The name alone is "the" architecture:

    • Extract
    • Transform
    • Load

    Exporting a DB table to a CSV can be considered "ET" while loading the CSV is the "L". Most ETL problems are simply not complicated.

    Beyond that, you should look at the many ETL and ESB packages already available for Java, free and commercial, from libraries to full-blown processing systems, and simply adopt whichever one you like best.

    Get a whiteboard, string some bubbles together with lines, and turn that into code.
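
To show how little the core needs, here is a rough Java sketch of the "bubbles and lines" idea for the table-to-CSV case. It is an illustration, not a recommendation of a particular design: the `Etl` class and its method names are made up for this example, and the extract step is any `Supplier` of rows, which in practice would wrap a database query.

```java
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Supplier;
import java.util.stream.Collectors;

final class Etl {
    // Extract records, transform each one, then hand the batch to the loader.
    static <A, B> void run(Supplier<List<A>> extract,
                           Function<A, B> transform,
                           Consumer<List<B>> load) {
        List<B> transformed = extract.get().stream()
                .map(transform)
                .collect(Collectors.toList());
        load.accept(transformed);
    }

    // Transform step for the table-to-CSV case: one row of fields becomes one CSV line.
    static String toCsvLine(List<String> fields) {
        return fields.stream()
                .map(Etl::quoteIfNeeded)
                .collect(Collectors.joining(","));
    }

    // Fields containing a comma, quote, or line break are quoted, with quotes doubled.
    private static String quoteIfNeeded(String field) {
        boolean needsQuotes = field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r");
        return needsQuotes ? "\"" + field.replace("\"", "\"\"") + "\"" : field;
    }
}
```

With `rows` standing in for a query result, a call such as `Etl.run(() -> rows, Etl::toCsvLine, lines -> lines.forEach(System.out::println))` covers the "ET" half; the "L" half is the same call with a CSV reader as the extract step and a database writer as the load step.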