amazon-web-services apache-iceberg

Apache Iceberg tables and primary keys


We’re planning to move our data from an on-prem Microsoft SQL Server to AWS and are evaluating table formats such as Hudi, Delta Lake, and Apache Iceberg. Our current SQL Server setup uses auto-increment IDs for most of our primary keys, and Iceberg doesn't seem to have a straightforward equivalent.

I’m trying to figure out the best way to deal with unique identifiers in Iceberg, especially since we rely on these auto-increment IDs a lot. To take a stock market example, you would have a Security table with details like Security Code, Description, and ISIN, and a Price table where each price entry is linked to a security via its ID.

Any suggestions on how to replicate or replace the auto-increment functionality in Iceberg?


Solution

  • Correct, Iceberg doesn't have a built-in auto-increment column, so how you generate keys depends mostly on the SQL engine you use for processing. For example, Trino has a UUID data type that can act as a primary key, and Iceberg supports UUID as well.

    Or, if you use Spark as the processing engine, you can implement a UDF that generates the IDs.

    Or, similar to Oracle sequences (Oracle now supports auto-increment as well), you can implement your own sequence using a table and incrementing the stored value after each insert.
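To make the UUID and sequence-table options a bit more concrete, here is a minimal sketch in plain Python. It is only an illustration of the two patterns, not Iceberg or engine-specific code: SQLite stands in for whatever transactional store would hold the counter, and the table name `id_sequence` and the function names `new_uuid_key`, `create_sequence`, and `allocate_ids` are hypothetical names chosen for this example.

```python
import sqlite3
import uuid


def new_uuid_key() -> str:
    """Return a random (version 4) UUID as a string, usable as a surrogate key."""
    return str(uuid.uuid4())


def create_sequence(conn: sqlite3.Connection, name: str, start: int = 1) -> None:
    """Create the hypothetical id_sequence table if needed and register a counter."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS id_sequence ("
        "name TEXT PRIMARY KEY, next_value INTEGER NOT NULL)"
    )
    # INSERT OR IGNORE leaves an existing counter untouched.
    conn.execute(
        "INSERT OR IGNORE INTO id_sequence (name, next_value) VALUES (?, ?)",
        (name, start),
    )
    conn.commit()


def allocate_ids(conn: sqlite3.Connection, name: str, count: int = 1) -> range:
    """Reserve `count` consecutive IDs from the named counter in one transaction."""
    if count < 1:
        raise ValueError("count must be at least 1")
    with conn:  # commits on success, rolls back if an exception is raised
        # Increment first so the write lock is held before the value is read back.
        cur = conn.execute(
            "UPDATE id_sequence SET next_value = next_value + ? WHERE name = ?",
            (count, name),
        )
        if cur.rowcount == 0:
            raise KeyError(f"unknown sequence: {name}")
        (next_value,) = conn.execute(
            "SELECT next_value FROM id_sequence WHERE name = ?", (name,)
        ).fetchone()
    first = next_value - count
    return range(first, first + count)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_sequence(conn, "security")
    # Reserve a block of three IDs for a batch of new Security rows.
    print(list(allocate_ids(conn, "security", count=3)))
    # Alternatively, generate a UUID key without any shared counter.
    print(new_uuid_key())
```

Reserving IDs in blocks rather than one at a time keeps the counter from becoming a bottleneck on large batch loads, at the cost of gaps when a batch fails. UUIDs avoid the shared counter entirely, but they are larger and not ordered.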