I need to develop application that can be getting data from multiple data sources ( Oracle, Excel, Microsoft Sql Server
, and so on) using one SQL query
. For example:
SELECT o.employeeId, count(o.orderId)
FROM employees@excel e. customers@microsoftsql c, orders@oracle o
WHERE o.employeeId = e.employeeId and o.customerId = c.customerId
GROUP BY o.employeeId;
This sql and data sources must be changes dynamically by java program. My customers want to write and run sql-like query
from different database and storage in same time with group by, having, count, sum
and so on in web interface of my aplication. Other requirements is perfomance and light-weight.
I find this way to do it (and what drawbacks I see, please, fix me if I wrong):
Apache Spark (drawbacks: heavy solution, more better for BigData, slow if you need getting up-to-date informations without cached it in Spark),
Distributed queries in SQL server (Database link of Oracle, Linked server of Microsoft SQL Server, Power Query of Excel) - drawbacks: problem with change data sources dynamically by java program and problem with working with Excel,
Prestodb (drawbacks: heavy solution, more better for BigData),
Apache Drill (drawbacks: quite young solution, some problem with not latest odbc drivers and some bugs when working),
Apache Calcite (ligth framework that be used by Apache Drill, drawbacks: quite young solution yet),
Do join from data sources manually (drawbacks: a lot of work to develop correct join, "group by" in result set, find best execution plan and so on)
May be, do you know any other way (using free open-source solutions) or give me any advice from your experience about ways in above? Any help would be greatly appreciated.
One of the appropriate solution is DataNucleus platform which has JDO, JPA and REST APIs. It has support for almost every RDBMS (PostgreSQL, MySQL, SQLServer, Oracle, DB2 etc) and NoSQL datastore like Map based, Graph based, Doc based etc, database web services, LDAP, Documents like XLS, ODF, XML etc.
Alternatively you can use EclipseLink, which also has support for RDBMS, NoSQL, database web services and XML.
By using JDOQL which is part of JDO API, the requirement of having one query to access multiple datastore will be met. Both the solutions are open-source, relatively lightweight and performant.
Why did I suggest this solution ?
To read more about polyglot persistence