Search code examples
sqlamazon-web-servicesamazon-redshift

What are the alternatives to IDENTITY on Redshift?


I have been trying to keep an ID column in a table as an IDENTITY(1,1) but it keeps incrementing by 2. So, the ids that I get are odd numbers - 1,3,5.... If I change the ID column to IDENTITY(0,1), I get even numbers - 0,2,4...

Is there a way to correct this issue and make sure it gets incremented by only 1?

Are there alternatives to IDENTITY on Redshift?


Solution

  • Redshift is a clustered solution and this has consequences. This happens on a COPY because COPY is a parallel action where each slice reads in data individually. As such to preserved uniqueness there may be gaps in the step.

    Per the Redshift documentation on CREATE TABLE:

    When you load the table using an INSERT INTO [tablename] SELECT * FROM or COPY statement, the data is loaded in parallel and distributed to the node slices. To be sure that the identity values are unique, Amazon Redshift skips a number of values when creating the identity values. Identity values are unique, but the order might not match the order in the source files.

    See: https://docs.aws.amazon.com/redshift/latest/dg/r_CREATE_TABLE_NEW.html

    The alternative is to manage the identity values yourself. This requires some care as making mistakes will cause invalid identity values. Be sure to manage coherency so that no intermediate value can be seen during updates.

    The upside is that you can make this work however you like. The downsides are the complexity, corner-cases, and performance. Bringing all information together for a large table to work out an exact ordering add effort on a cluster and is why Redshift doesn’t do this by default. The performance implications grow as the table size grows.

    You may be better off working with gaps in the identity values for those cases where this matters.