
Kudu auto generated key column


I am trying to create a custom auto-generated (auto-incremented) key in Kudu whose value keeps increasing from a starting seed, which is zero by default.

It's pretty inefficient to go through all records and increment a counter to get a row count.

Does Kudu provide the row count out of the box? If not, what is the best way to get it?


Solution

  • Apache Kudu does not support AUTO_INCREMENT columns at this time. There is a FAQ entry on the Kudu web site that mentions this.

    Kudu is a distributed storage engine that is focused on being a good analytical store (OLAP) as opposed to being a good transactional store (OLTP) and it shows in the features we've prioritized so far. This is a good example of that.

    Because we're not trying to be an OLTP store, Kudu doesn't yet implement multi-row or multi-node transactions, so a simple incrementing primary key counter would be difficult to implement correctly at this time, especially when the table is hash-partitioned on the primary key. We'd need a central transaction coordinator, which doesn't currently exist.

    To answer your second question, getting a row count is currently somewhat expensive in Kudu, as it involves scanning the index column on each tablet and summing the per-tablet counts. Apache Impala and Apache Spark SQL will do this transparently for you if you run SELECT COUNT(*) FROM kudu_table. However, I wouldn't currently rely on that count to assign a new ID: Impala currently allows scanning from a slightly stale Kudu replica, so the row count may be off.

    The best thing to do right now is rely on some external mechanism to assign row IDs.

    Source: I am a PMC member on Apache Kudu.
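
As one illustration of what an "external mechanism" for assigning row IDs could look like, here is a minimal sketch of a Snowflake-style generator that runs in the client application. It is not part of Kudu or its client API; the class name RowIdGenerator, the bit widths, the custom epoch, and the worker_id parameter are all assumptions chosen for the example. Each writer process is assumed to be given a distinct worker_id by some means outside this sketch. The IDs are unique across workers and increasing within a worker, but they are not gapless, so this is not a true AUTO_INCREMENT.

```python
import threading
import time


class RowIdGenerator:
    """Hypothetical client-side ID generator (Snowflake-style), not a Kudu API.

    Layout of each ID (an assumption for this sketch):
    milliseconds since EPOCH_MS | worker id (10 bits) | sequence (12 bits)
    """

    EPOCH_MS = 1_500_000_000_000  # arbitrary custom epoch, assumed for the example
    WORKER_BITS = 10
    SEQUENCE_BITS = 12

    def __init__(self, worker_id, clock=None):
        if not 0 <= worker_id < (1 << self.WORKER_BITS):
            raise ValueError("worker_id out of range")
        self._worker_id = worker_id
        # The clock is injectable so the generator can be tested deterministically.
        self._clock = clock or (lambda: int(time.time() * 1000))
        self._lock = threading.Lock()
        self._last_ms = -1
        self._sequence = 0

    def next_id(self):
        """Return an integer that is larger than any ID this instance returned before."""
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                # The clock went backwards (or we ran ahead of it): reuse the
                # last timestamp so that IDs never decrease.
                now = self._last_ms
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & ((1 << self.SEQUENCE_BITS) - 1)
                if self._sequence == 0:
                    # Sequence exhausted for this millisecond: advance the
                    # logical timestamp by one instead of waiting.
                    now = self._last_ms + 1
            else:
                self._sequence = 0
            self._last_ms = now
            shift = self.WORKER_BITS + self.SEQUENCE_BITS
            return (
                ((now - self.EPOCH_MS) << shift)
                | (self._worker_id << self.SEQUENCE_BITS)
                | self._sequence
            )


if __name__ == "__main__":
    gen = RowIdGenerator(worker_id=1)
    print(gen.next_id(), gen.next_id())
```

The resulting value could then be written to an INT64 primary key column by whatever client performs the insert. If ordering does not matter at all, a random UUID stored as a string key is an even simpler alternative.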