Tags: web-services, data-mining, soa

Service Oriented Architecture and Loose Coupling vs SQL JOINS


Suppose we have an SOA infrastructure like the one pictured below, in which every service can run on a different host (this is especially true for the two extranet services, "web site" and "payment system").

[Figure: SOA infrastructure]

Clearly we have a data (persistence) layer. Suppose it is implemented with EJB + JPA or something similar.

If we want to join data across the different services (e.g. in the user UI), I see at least two alternatives:

  • We want efficient JOINs at the RDBMS level, so we have a single package (e.g. persistence.package) containing all the entities and session facades (the CRUD implementation), which somehow has to be shared (how?) or deployed with every service. However, if I change something in the order schema I must redeploy this package, which introduces tight coupling between pretty much everything. Moreover, there must be a single, shared database.

  • To avoid these issues, we keep a separate entity package for each service (e.g. order.package) and let the services communicate through some protocol (SOAP, REST, an ESB, etc.). This way each host keeps its data locally (a shared-nothing architecture) and we never need to redeploy a common entity package. But this approach is terrible for data mining: a query that must search and return correlated data across multiple services will be very inefficient, because we cannot use SQL JOINs.
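To make the trade-off between the two options concrete, here is a minimal sketch in Python using the standard-library sqlite3 module as a stand-in for the RDBMS. Everything in it is hypothetical: the orders/payments tables, their columns, the OrderService and PaymentService classes, and the local method calls that stand in for remote SOAP/REST calls. With one shared database the correlation is a single SQL JOIN; with share-nothing services the caller has to fetch from each service and join in application code.

```python
import sqlite3

# Hypothetical schemas: the order service owns `orders`, the payment service owns `payments`.
ORDERS_DDL = "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)"
PAYMENTS_DDL = "CREATE TABLE payments (order_id INTEGER, status TEXT)"
ORDERS = [(1, "alice", 30.0), (2, "bob", 12.5), (3, "alice", 7.0)]
PAYMENTS = [(1, "paid"), (3, "pending")]


def shared_db_join():
    """Option 1: one shared database, so the RDBMS performs the JOIN."""
    db = sqlite3.connect(":memory:")
    db.execute(ORDERS_DDL)
    db.execute(PAYMENTS_DDL)
    db.executemany("INSERT INTO orders VALUES (?, ?, ?)", ORDERS)
    db.executemany("INSERT INTO payments VALUES (?, ?)", PAYMENTS)
    rows = db.execute(
        "SELECT o.id, o.customer, p.status FROM orders o "
        "JOIN payments p ON p.order_id = o.id ORDER BY o.id"
    ).fetchall()
    db.close()
    return rows


class OrderService:
    """Option 2: the order service owns a private database (share nothing)."""

    def __init__(self):
        self._db = sqlite3.connect(":memory:")
        self._db.execute(ORDERS_DDL)
        self._db.executemany("INSERT INTO orders VALUES (?, ?, ?)", ORDERS)

    def list_orders(self):
        # Stands in for one remote call returning all orders.
        return self._db.execute("SELECT id, customer FROM orders ORDER BY id").fetchall()


class PaymentService:
    """Option 2: the payment service owns its own private database."""

    def __init__(self):
        self._db = sqlite3.connect(":memory:")
        self._db.execute(PAYMENTS_DDL)
        self._db.executemany("INSERT INTO payments VALUES (?, ?)", PAYMENTS)

    def status_for(self, order_id):
        # Stands in for one remote call per order.
        row = self._db.execute(
            "SELECT status FROM payments WHERE order_id = ?", (order_id,)
        ).fetchone()
        return row[0] if row else None


def application_side_join(orders, payments):
    """Correlate in the caller; returns the joined rows and the number of service calls."""
    result = []
    calls = 1  # the list_orders() call
    for order_id, customer in orders.list_orders():
        status = payments.status_for(order_id)
        calls += 1
        if status is not None:  # keep only matches, like an inner JOIN
            result.append((order_id, customer, status))
    return result, calls


if __name__ == "__main__":
    print(shared_db_join())  # [(1, 'alice', 'paid'), (3, 'alice', 'pending')]
    rows, calls = application_side_join(OrderService(), PaymentService())
    print(rows, calls)  # [(1, 'alice', 'paid'), (3, 'alice', 'pending')] 4
```

Both paths return the same rows, but the application-side version makes one call per order (an N+1 pattern). Batching the lookups would cut the number of calls, yet the join would still happen outside the database, which is the data-mining cost the question describes.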

Is there a better or standard approach to the issues described above?


Solution

  • The main motivation for SOA is having independent components that can change separately. A secondary motivation, as Marco mentioned, is breaking a system down into smaller problems that are easier to solve. The upside of separate services is flexibility; the downside is more management and overhead, and that overhead should be justified by what you get back. See, for example, an SOA anti-pattern I published called Nanoservices, which discusses this balance.

    Another thing to keep in mind is that a web-service API does not automatically mark a service boundary. Several APIs that belong to a larger service can still connect to the same database underneath. So, for example, if payments and orders belong together in your system, you shouldn't separate them just because they are exposed through different APIs. (In many systems these are indeed different concerns but, again, that is not automatic.)

    If and when you do find the separation into services logical, you should follow Marco's advice and make sure the services are isolated and don't share databases. Isolating services this way preserves their ability to change independently. You can then integrate them in the UI with a composite front end. Note that this works well for the operational side of the application, since there you only need a few items from each service. For reporting you'd want something like aggregated reporting, i.e. exporting immutable copies of the data into a central database optimized for reporting (e.g. a denormalized star schema).
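The aggregated-reporting idea can be sketched in Python, again with sqlite3 standing in for the central reporting database. This is a simplified illustration under stated assumptions, not a prescribed design: the export_snapshot and revenue_by_customer functions, the order_facts table, the "unpaid" default status, and the idea that each service hands over a read-only snapshot of its rows (in practice perhaps a batch export, an event feed, or an ETL job) are all hypothetical. A full star schema would also split entities such as customers into dimension tables; here a single flat fact table keeps the example short.

```python
import sqlite3

# Hypothetical snapshots exported by each isolated service. Tuples are
# immutable, echoing the "immutable copies" of data being exported.
ORDER_SNAPSHOT = ((1, "alice", 30.0), (2, "bob", 12.5), (3, "alice", 7.0))
PAYMENT_SNAPSHOT = ((1, "paid"), (3, "pending"))


def export_snapshot(reporting_db, orders, payments):
    """Copy service data into one denormalized fact table for reporting."""
    reporting_db.execute(
        "CREATE TABLE IF NOT EXISTS order_facts ("
        " order_id INTEGER PRIMARY KEY, customer TEXT, total REAL, payment_status TEXT)"
    )
    status_by_order = dict(payments)
    # INSERT OR REPLACE keeps a repeated export idempotent for the same order IDs.
    reporting_db.executemany(
        "INSERT OR REPLACE INTO order_facts VALUES (?, ?, ?, ?)",
        [
            (order_id, customer, total, status_by_order.get(order_id, "unpaid"))
            for order_id, customer, total in orders
        ],
    )
    reporting_db.commit()


def revenue_by_customer(reporting_db, status="paid"):
    """A reporting query over the flat table: no JOIN and no calls to live services."""
    return reporting_db.execute(
        "SELECT customer, SUM(total) FROM order_facts"
        " WHERE payment_status = ? GROUP BY customer ORDER BY customer",
        (status,),
    ).fetchall()


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    export_snapshot(db, ORDER_SNAPSHOT, PAYMENT_SNAPSHOT)
    print(revenue_by_customer(db))  # [('alice', 30.0)]
    print(revenue_by_customer(db, "unpaid"))  # [('bob', 12.5)]
```

The operational services keep their private databases; only the reporting side sees the combined, read-only copy, so heavy analytical queries never touch the live services.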