Tags: events, architecture, cloud, analytics, eda

Should event-driven architecture be targeted for all data & analytics platforms?


For example,

  • You have an IT estate with a mix of batch and real-time data sources from multiple systems, e.g. ERP, project management, asset management, website, monitoring, etc.
  • The aim is to integrate the data sources into a cloud environment (cloud-agnostic).
  • There is a need for reporting and analytics on combinations of all data sources.
  • Inevitably, some source systems are not capable of streaming, hence batch loading is required.
  • There are potential use cases for triggering functionality/changes/updates based on the ingested data.

Given the steer is to create a future-proofed platform, how would you approach the architecture?


Solution

  • It's a very open-ended question, but there are some good principles you can adopt to help point you in the right direction:

    Avoid point-to-point integration, and get everything going through a few common points - ideally one. Using an API gateway can be a good place to start; the big players (Azure, AWS, GCP) all have their own options, plus there are plenty of decent independent ones like Tyk or Kong.
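
    As a rough illustration of that principle (the gateway host and route names here are made up), the idea is that every producing and consuming system talks to one well-known entry point rather than to each other directly:

    ```python
    import requests  # plain HTTP client, purely for illustration

    # Hypothetical single entry point: all systems integrate via the gateway,
    # never directly with each other (no point-to-point links).
    GATEWAY_BASE = "https://api-gateway.internal.example.com"

    def push_asset_update(asset: dict) -> None:
        # ERP, project management, asset systems etc. all post to gateway routes,
        # so auth, throttling, logging and alerting live in one place.
        resp = requests.post(f"{GATEWAY_BASE}/assets/v1/updates", json=asset, timeout=10)
        resp.raise_for_status()

    def fetch_project(project_id: str) -> dict:
        # Consumers read through the same gateway rather than calling the source system.
        resp = requests.get(f"{GATEWAY_BASE}/projects/v1/{project_id}", timeout=10)
        resp.raise_for_status()
        return resp.json()
    ```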

    Batches and event streams are totally different, but even then you can still potentially route them all through the gateway so that you get centralised observability (reporting, analytics, alerting, etc.).
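
    One way to make that concrete (a sketch only - the envelope fields are invented) is to wrap every payload, whether it arrived in a batch file or as a single event, in a common envelope before it goes through the gateway, so reporting and alerting can treat both the same way:

    ```python
    import uuid
    from dataclasses import dataclass, field, asdict
    from datetime import datetime, timezone

    @dataclass
    class IngestEnvelope:
        """Common metadata wrapper for both batch records and streamed events."""
        source_system: str   # e.g. "erp", "project-mgmt", "website"
        mode: str            # "batch" or "stream"
        payload: dict        # the actual record/event body
        correlation_id: str = field(default_factory=lambda: str(uuid.uuid4()))
        ingested_at: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())

    def envelope_batch(source_system: str, records: list) -> list:
        # A batch load is just many envelopes sharing the same source and mode.
        return [asdict(IngestEnvelope(source_system, "batch", r)) for r in records]

    def envelope_event(source_system: str, event: dict) -> dict:
        # A streamed event gets exactly the same shape, so downstream observability
        # (counts, latencies, error rates per source) doesn't care how data arrived.
        return asdict(IngestEnvelope(source_system, "stream", event))
    ```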

    Use standards-based API specifications where possible. A good REST-based API built on a proper resource model is a non-trivial undertaking, and it may not fit if you are dealing with lots of disparate legacy integration. If you do adopt REST, use OpenAPI to specify the APIs. Using this standard not only makes it easier for consumers, but also gets you better tooling, as many design, build and test tools support OpenAPI. There's also AsyncAPI for event/async APIs.
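
    As a small example of the tooling benefit (FastAPI is used here purely as an illustration; the resource model is made up), some frameworks will generate the OpenAPI document for you straight from the code:

    ```python
    from typing import Optional

    from fastapi import FastAPI
    from pydantic import BaseModel

    app = FastAPI(title="Asset Integration API", version="1.0.0")

    class Asset(BaseModel):
        # A minimal, made-up resource model for illustration.
        id: str
        name: str
        site: Optional[str] = None

    @app.get("/assets/{asset_id}", response_model=Asset)
    def get_asset(asset_id: str) -> Asset:
        # Placeholder lookup; a real implementation would call the source system.
        return Asset(id=asset_id, name="Pump 42", site="Plant A")

    # Serving this app (e.g. `uvicorn main:app`) exposes the generated OpenAPI
    # spec at /openapi.json and interactive docs at /docs, so consumers and test
    # tools can work from the spec rather than tribal knowledge.
    ```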

    Do some architecture. Moving sh*t to the cloud doesn't remove the sh*t - it just moves it to the cloud. Don't recreate old problems in a new place.

    • Work out the logical components in your new solution: what does each of them do (what's its reason to exist)? Don't forget ancillary components like API catalogues, etc.
    • Think about layering the integration (usually depending on how they will be consumed and what role they need to play, e.g. system interface, orchestration, experience APIs, etc).
    • Want to handle data in a consistent way regardless of source (your 'agnostic' comment)? You'll need to think through how data is ingested and processed. This might lead you into more data/ETL-centric considerations rather than integration ones (a rough sketch of the idea follows this list).
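
    To make the 'consistent regardless of source' point concrete, here is a rough sketch (the interface and class names are invented) of a thin connector layer where each system, batch or streaming, yields records in the same shape for the downstream processing layer:

    ```python
    from abc import ABC, abstractmethod
    from typing import Iterable, Iterator, List

    class SourceConnector(ABC):
        """Every source system, batch or streaming, sits behind the same interface."""

        @abstractmethod
        def extract(self) -> Iterator[dict]:
            """Yield records one at a time, however the source actually delivers them."""

    class ErpBatchConnector(SourceConnector):
        def __init__(self, export_path: str):
            self.export_path = export_path

        def extract(self) -> Iterator[dict]:
            # Illustrative only: read a nightly export file line by line.
            with open(self.export_path) as f:
                for line in f:
                    yield {"source": "erp", "raw": line.strip()}

    class WebsiteEventConnector(SourceConnector):
        def __init__(self, events: Iterable[dict]):
            self.events = events  # e.g. a consumer iterating over a broker topic

        def extract(self) -> Iterator[dict]:
            # Streamed events come out looking exactly like batch records.
            for event in self.events:
                yield {"source": "website", "raw": event}

    def ingest(connectors: List[SourceConnector]) -> Iterator[dict]:
        # The processing/ETL layer only ever sees one record shape.
        for connector in connectors:
            yield from connector.extract()
    ```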

    Co-design. Is the integration mainly data coming in or going out? Is the integration with 3rd parties or strictly internal?

    If you are designing for external / 3rd party consumers then a co-design process is advised, since you're essentially designing the API for them.

    If the APIs are for internal use, consider designing them as if they were for external use, so that if/when you decide to open them up later it's not so hard.

    Take a step back:

    • Continually ask yourselves "what problem are we trying to solve?". Usually, a technology initiative is successful when there's a well-understood reason for doing it, with solid buy-in from the business (non-IT).
    • Who wants the reporting, and why - what problem are they trying to solve?