Search code examples
sqlverticadata-masking

Random Digit Data Masking on Id_number in Vertica


I need to mask 4 digit of id_number rondomly between 6th and 12th digits.

Example: 555444888777 --> 55544x8x8xx7

I wrote following code but it is rondom for every 2 digit. Is there any solution for rondomly masking with given intervals and how many digits for needed to mask in Vertica ?

SELECT OVERLAYB(OVERLAYB(OVERLAYB(OVERLAYB('555444888777', 'x', 5+RANDOMINT(2)),'x', 7+RANDOMINT(2)),'x', 9+RANDOMINT(2)),'x',11+RANDOMINT(2));


Solution

  • If you really want to randomly replace a digit with an 'x' at , randomly, the first or second digit after positions 5,7,9 and 11, as you coded it, then create a function as I did, so you don't need to re-code the nested OVERLAYB() calls every time.

    You can replace 5,7,9 and 11 with a RANDOMINT() call, too, if you want more variance.

    If, however, you want to vary the number of replacements (from 4 times to another number of times), you will have to re-write the function for a different number of replacements. Or go through the trouble of writing a UDx (User Defined Extension), in C++, Java, R or Python.

    Check the Vertica docu for that; start here: https://www.vertica.com/docs/10.0.x/HTML/Content/Authoring/SQLReferenceManual/Statements/CREATEFUNCTIONUDF.htm?zoom_highlight=create%20function

    Having said that, here goes the function with your same functionality, and its test:

    CREATE OR REPLACE FUNCTION maskrand( 
      s VARCHAR(256)
    ) 
    RETURN VARCHAR(256) 
    AS 
    BEGIN
      RETURN (
        OVERLAYB(
          OVERLAYB(
            OVERLAYB(
              OVERLAYB(
                s
              , 'x'
              , 5+RANDOMINT(2)
              )
              ,'x'
            , 7+RANDOMINT(2)
            )
            ,'x'
          , 9+RANDOMINT(2)
          )
          ,'x'
        , 11+RANDOMINT(2)
        )
      );
    END;
    
    -- test ...
    WITH indata(s) AS (
              SELECT '555444888777'
    UNION ALL SELECT '666333444888'
    ) 
    SELECT
      s, maskrand(s) AS masked
    FROM indata;
    -- out       s       |    masked    
    -- out --------------+--------------
    -- out  555444888777 | 5554x48x8xx7
    -- out  666333444888 | 6663x3x4x88x