api etl saas data-ingestion data-lake

SaaS app data ingestion to DL/DWH - what to include in the NFRs?


We are in the process of buying a SaaS solution for a busy sales operation. We want to ensure that we can access our data and ingest it into our analytics data lake (some of it in real time). What requirements should we set, or at least prefer, for vendors and their solutions?

APIs - most vendors mention that they provide APIs for data access. However, what features does an API need to be suitable for data ingestion into an analytics data lake? For example, Salesforce has a Bulk API; does this mean that if a vendor only offers "lean APIs", they won't work for the DL use case?
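To make the API question concrete, here is a minimal sketch of the kind of incremental pull an ingestion job typically needs: a "changed since" filter plus cursor pagination. Everything in it is an assumption for illustration, not any vendor's actual API: the `fetch_page` callable, the `records` / `next_cursor` page shape, and the `updated_at` field are hypothetical, and an in-memory fake stands in for the HTTP call.

```python
from typing import Callable, Iterator, List, Optional

# Hypothetical page shape: {"records": [...], "next_cursor": "<opaque>" or None}
FetchPage = Callable[[str, Optional[str]], dict]


def extract_incremental(fetch_page: FetchPage, updated_since: str) -> Iterator[dict]:
    """Yield every record changed after `updated_since`, following cursors."""
    cursor: Optional[str] = None
    while True:
        page = fetch_page(updated_since, cursor)
        yield from page["records"]
        cursor = page.get("next_cursor")
        if cursor is None:
            break


def new_watermark(records: List[dict], previous: str) -> str:
    """Return the highest `updated_at` seen, or the previous watermark if none."""
    return max([r["updated_at"] for r in records] + [previous])


def make_fake_api(rows: List[dict], page_size: int = 2) -> FetchPage:
    """In-memory stand-in for a vendor endpoint (assumption, for the demo only)."""
    def fetch_page(updated_since: str, cursor: Optional[str]) -> dict:
        # ISO-8601 UTC timestamps in one fixed format compare correctly as strings.
        changed = sorted(
            (r for r in rows if r["updated_at"] > updated_since),
            key=lambda r: (r["updated_at"], r["id"]),
        )
        start = int(cursor or 0)
        end = start + page_size
        return {
            "records": changed[start:end],
            "next_cursor": str(end) if end < len(changed) else None,
        }
    return fetch_page


if __name__ == "__main__":
    rows = [
        {"id": 1, "updated_at": "2024-01-01T00:00:00Z"},
        {"id": 2, "updated_at": "2024-01-02T00:00:00Z"},
        {"id": 3, "updated_at": "2024-01-03T00:00:00Z"},
    ]
    watermark = "2024-01-01T12:00:00Z"
    batch = list(extract_incremental(make_fake_api(rows), watermark))
    print([r["id"] for r in batch], new_watermark(batch, watermark))
```

If a vendor's "lean" API cannot filter by modification time or page through large result sets, every run becomes a full extract, which is usually where the DL use case starts to hurt.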

Direct SQL access - should we prefer SaaS solutions that offer single-tenant DBs so that we could obtain direct SQL access? DB replica - should we expect the vendor to provide a DB replica (if it's single-tenant) that we use as a data store for reporting? Obviously, that would be an extra cost for us.

Direct SQL access via ODBC - I also read that if a SaaS app is multi-tenant, ODBC/JDBC drivers could be built to access DB data via SQL, with proper authorization to ensure data security. Would this be a valid request/approach?

Staged tables - should we ask the vendor to stage their DB tables (as files) and load them into our (or their) data lake environment? This would then be a raw data source for analytics and a data archive. My concern is incremental updates.
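On the incremental-update concern, one common pattern is for the vendor to ship a full snapshot once and then periodic change files that get merged by primary key. The sketch below is only an illustration under assumptions: the `id`, `updated_at`, and `op` fields are hypothetical names for a primary key, a modification timestamp, and an upsert/delete flag that a staged file would need to carry.

```python
from typing import Dict, List


def apply_changes(snapshot: Dict[int, dict], changes: List[dict]) -> Dict[int, dict]:
    """Merge a batch of change records into a snapshot keyed by primary key.

    Each change is a dict with `id`, `updated_at`, and an optional `op`
    ("delete" removes the row; anything else is treated as an upsert).
    """
    merged = dict(snapshot)
    # Apply changes oldest first so the newest version of a row wins.
    for change in sorted(changes, key=lambda c: c["updated_at"]):
        key = change["id"]
        current = merged.get(key)
        if current is not None and current["updated_at"] > change["updated_at"]:
            continue  # stale change: the stored row is already newer
        if change.get("op") == "delete":
            merged.pop(key, None)
        else:
            merged[key] = {k: v for k, v in change.items() if k != "op"}
    return merged


if __name__ == "__main__":
    snapshot = {
        1: {"id": 1, "amount": 10, "updated_at": "2024-01-01"},
        2: {"id": 2, "amount": 20, "updated_at": "2024-01-01"},
    }
    changes = [
        {"id": 1, "amount": 15, "updated_at": "2024-01-03"},
        {"id": 2, "op": "delete", "updated_at": "2024-01-02"},
        {"id": 3, "amount": 30, "updated_at": "2024-01-02"},
    ]
    print(apply_changes(snapshot, changes))
```

This sketch drops deleted rows without keeping tombstones, so a late-arriving older upsert for a deleted key would bring the row back. The practical point for the requirements is that staged files are only mergeable if the vendor includes a stable key, a modification timestamp, and some signal for deletes.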

Are there any other options we should consider, look for in vendor solutions, or request?

Thank you!


Solution

  • Provide your requirements (data lake architecture, data latency, etc.) to the vendors and have them propose a solution that works with their product.