Kudu for JDBC replication purposes, but not for offloaded analytics


Given this quote from the official Apache Kudu documentation (https://kudu.apache.org/overview.html):

Kudu isn't designed to be an OLTP system, but if you have some subset of data which fits in memory, it offers competitive random access performance. We've measured 99th percentile latencies of 6ms or below using YCSB with a uniform random access workload over a billion rows. Being able to run low-latency online workloads on the same storage as back-end data analytics can dramatically simplify application architecture.

Does this statement imply that Kudu can be used as the target for replication from a JDBC source, in the simplest form possible?


Solution

  • Elsewhere I have used Kudu as a replication target for SAP and other COTS systems, so that reports could run against the Kudu tables rather than against HANA. That architecture was decided upon by others.

    Pure replication of data, primarily for subsequent extraction from a data lake, is feasible as well for data embellished with history and under 1 TB in size. Cloudera confirmed this after discussion. Kudu stores data in a columnar format, and a row format would be preferable for this use, but it works well regardless.
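As a rough illustration of what "pure replication" from a relational source could look like, here is a minimal sketch of an incremental pull with upsert-by-primary-key semantics. Everything in it is an assumption made for illustration: `sqlite3` stands in for the JDBC source, a plain dict stands in for the Kudu table, and the `customers` table, its `updated_at` column, and the `replicate_increment` helper are hypothetical names. It does not use the Kudu client API.

```python
import sqlite3


def replicate_increment(source, target, watermark):
    """Copy rows changed after `watermark` into `target`; return the new watermark."""
    rows = source.execute(
        "SELECT id, name, updated_at FROM customers "
        "WHERE updated_at > ? ORDER BY updated_at",
        (watermark,),
    ).fetchall()
    for row_id, name, updated_at in rows:
        # Upsert semantics: insert a new key, or overwrite the row already
        # stored under that key.
        target[row_id] = {"name": name, "updated_at": updated_at}
        watermark = max(watermark, updated_at)
    return watermark


# Stand-in for the JDBC source (hypothetical table and columns).
source = sqlite3.connect(":memory:")
source.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, updated_at INTEGER)"
)
source.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Ada", 10), (2, "Bob", 20)],
)

target = {}  # stand-in for a Kudu table keyed by its primary key

# Initial load: everything newer than watermark 0 is copied.
watermark = replicate_increment(source, target, 0)

# A later change on the source is picked up by the next run only.
source.execute("UPDATE customers SET name = 'Bobby', updated_at = 30 WHERE id = 2")
watermark = replicate_increment(source, target, watermark)
```

Note that a watermark-based pull like this does not see rows deleted on the source; those would need separate handling, for example soft-delete flags or change data capture.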