
What is the difference between data integration software and an ESB?


I have been working on a project that collects data from various third-party data sources and loads it into our data stores (data integration, DI). We have been using Pentaho for this.

I want to know whether this can also be done with an ESB (Camel or Mule). And what features does an ESB bring that DI does not offer?

I have read a lot of articles on both ESB and DI, but none of them answered this question. I have also read about Mule data connectors for third-party data sources.


Solution

  • DI (data integration, not dependency injection) or ETL approaches tend to solve the problem of moving data from System A to System B with long-running, batch-style jobs. The ESB or lightweight integration approach generally breaks the task into smaller pieces (blocks of data, or a single event per data item) and lets other systems subscribe to the data stream, generally over an enterprise messaging system, without impacting System A, System B, or the existing code. This also removes a cross-team dependency from the project plan: if System C comes along, its team does not necessarily need resources from the System B team to access the data stream.

    There are suitable use cases for both in any given environment. However, in my experience (and Big Data/MDM best practices tend to agree), if you have an originating stream of data, some other system will want to access it at some point as well. If being able to access the data stream without changing existing code, systems, or other teams within your organization sounds useful in your use case, then it is a good idea to design for that up front and go with the ESB approach. New consumers can then come in without anyone rewriting the process used by the existing systems. ESB/lightweight integration systems tend to support that design pattern more efficiently than DI/ETL tools.
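
To make the "System C comes along later" point concrete, here is a rough sketch in plain Python. It is not Camel or Mule code: the `MessageBus` class, the `customer.records` topic name, and the sample records are all made up for illustration, and a real ESB would put a durable messaging system behind this instead of an in-memory dictionary.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Handler = Callable[[dict], None]


class MessageBus:
    """Hypothetical in-memory stand-in for an enterprise messaging system."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        # Registering a consumer does not touch the publisher or other consumers.
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: dict) -> None:
        # Every current subscriber receives its own delivery of the event.
        for handler in list(self._subscribers[topic]):
            handler(message)


bus = MessageBus()
system_b_store: List[dict] = []
system_c_store: List[int] = []

# System B is the original consumer of System A's stream.
bus.subscribe("customer.records", system_b_store.append)

# System A publishes one event per data item instead of one big batch.
bus.publish("customer.records", {"id": 1, "name": "Ada"})

# Later, System C shows up and subscribes. Nothing in A or B changes.
bus.subscribe("customer.records", lambda msg: system_c_store.append(msg["id"]))
bus.publish("customer.records", {"id": 2, "name": "Grace"})
```

System B ends up with both records, and System C sees everything published after it subscribed. With a batch ETL job, adding System C would usually mean editing or cloning the existing job.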

    Some random thoughts:

    • ESBs handle the "one bad record" problem by letting you route the bad record to an error queue, where a person can look at it and then republish it
    • ETL/DI tend to have a straight-line happy-path speed advantage
    • ETL/DI start getting complicated once you go past the simple point-to-point integration use case
    • IMHO: ESBs are better at supporting versioning of data sets, services and data models.
    • ETL/DI tend to have more mature UIs for non-technical users to perform data mapping tasks
    • ESBs are really strong at supporting runtime decoupling of systems. If System B is down, the data just sits in a queue until it comes back up. No long-running blocking thread or risk of having to restart a job
    • ESBs have a slightly steeper learning curve
    • ETL/DI generally leads to ESB eventually (most vendors offer both a DI and ESB product)
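
Two of the points above (the error queue for a bad record, and data waiting in a queue while System B is down) can be sketched with nothing but in-memory queues. This is only an illustration under assumptions: `validate`, `drain`, the `amount` field, and the `system_b_up` flag are hypothetical names, and a real ESB would use durable broker queues and its own error-handling configuration.

```python
from collections import deque

inbound = deque()       # records waiting to be delivered to System B
error_queue = deque()   # bad records parked for a human to inspect
delivered = []          # what System B has actually received


def validate(record: dict) -> dict:
    # Hypothetical rule: the amount field must be numeric.
    if not isinstance(record.get("amount"), (int, float)):
        raise ValueError("amount must be numeric")
    return record


def drain(system_b_up: bool) -> None:
    """Deliver queued records; if System B is down, leave them queued."""
    if not system_b_up:
        return  # nothing is lost, the records simply wait
    while inbound:
        record = inbound.popleft()
        try:
            delivered.append(validate(record))
        except ValueError as exc:
            # One bad record does not stop the rest of the stream.
            error_queue.append({"record": record, "reason": str(exc)})


inbound.extend([
    {"id": 1, "amount": 10},
    {"id": 2, "amount": "oops"},
    {"id": 3, "amount": 7.5},
])

drain(system_b_up=False)            # System B is down
waiting_while_down = len(inbound)   # all three records are still queued

drain(system_b_up=True)             # System B is back: 1 and 3 go through

# A person fixes the parked record and republishes it.
fixed = error_queue.popleft()["record"]
fixed["amount"] = 12
inbound.append(fixed)
drain(system_b_up=True)
```

In a batch job the equivalent of record 2 often fails the whole run, and the equivalent of the outage means restarting the job. Here the good records flow through and the bad one is handled separately.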