Search code examples
sqlteradatadata-masking

Masking sensitive Data in SQL Teradata


Dears,

please I have a column with a SerialID like Below which I need to mask it using a SQL query

|SerialID            |
|--------------------|
|00027083691581342079|
|00027139741580324755|
|00027016011583677218|
|00027016011583677218|
|00027139061577873262|
|00027056351580383150|
|00027143131580371648|
|00027143131580371648|
|00027143131580371648|
|00027147651583526704|

to be the output like the below

|SerialID            |NewID|
|--------------------|-----|
|00027083691581342079|1    |
|00027139741580324755|2    |
|00027016011583677218|3    |
|00027016011583677218|3    |
|00027139061577873262|4    |
|00027056351580383150|5    |
|00027143131580371648|6    |
|00027143131580371648|6    |
|00027143131580371648|6    |
|00027147651583526704|7    |

Solution

  • You can use dense_rank():

    select serialid, dense_rank() over (order by serialid) as newid
    from t;
    

    EDIT:

    I have had experience on some parallel systems that the following is faster:

    select t.*, newid       
    from t join
         (select serialid, 
                 row_number() over (order by serialid) as newid
          from t
          group by serialid
         ) tt
         on t.serialid = tt.serialid;
    

    Some databases need to sort all the data on a single node because there is no partition by using the dense_rank(). The aggregation query reduces the amount of data so it can significantly improve performance -- but I'm not sure if this trick works in Teradata.