Search code examples
apache-sparkdatabricksdelta-lake

DataBricks: Any way to reset the Generated IDENTITY column?


Delta tables have the capability to generate an identity column, like so:

CREATE TABLE TestMe (
  pKey bigint GENERATED ALWAYS AS IDENTITY (START WITH 1 INCREMENT BY 1),
  id bigint,
  commentary string
  )
 USING DELTA
 OPTIONS (PATH "/mnt/Delta/Testing/TestMe")

However, there seems to be no way to reset the counter without manually editing the changelog files which seems risky.

If I want to truncate the table and place a new set of data in there, how can I do that without slowly accumulating an identity column in the hundreds of millions / billions (every data insert the counter just goes up and never resets)?


Solution

  • Please try this way

    CREATE TABLE TestMe (
      pKey bigint GENERATED ALWAYS AS IDENTITY (START WITH 1 INCREMENT BY 1),
      id bigint,
      commentary string
    );
      
    insert into TestMe (id,commentary) values (1,'a'),(2,'b'),(3,'c');
      
    select * from TestMe;
      
    truncate table TestMe;
    
    describe history TestMe;
      
    Restore Table TestMe to version as of 0;
    
    insert into TestMe (id,commentary) values (1,'a'),(2,'b'),(3,'c');
    

    Alternative Answer works better

    CREATE or REPLACE TABLE TestMe (
      pKey bigint GENERATED ALWAYS AS IDENTITY (START WITH 1 INCREMENT BY 1),
      id bigint,
      commentary string
    );
      
    insert into TestMe (id,commentary) values (1,'a'),(2,'b'),(3,'c');
    
    select * from TestMe;
    
    -- Rerun Create or Replace resets the identity
    
    CREATE or REPLACE TABLE TestMe (
      pKey bigint GENERATED ALWAYS AS IDENTITY (START WITH 1 INCREMENT BY 1),
      id bigint,
      commentary string
    );