Suppose I have the following table
| City | Name |
|--------|--------|
| Paris | Tim |
| Paris | Alice |
| Paris | Sandra |
| Paris | Sandra |
| Paris | Lisa |
| London | Joe |
| London | Joe |
| London | Tim |
| London | Peter |
And I want generate an (pseudo) "anonymized" name per city for each unique user (for instance city
name per city suffixed by _n
where n is a simple count for each unique name per city).
I could do the following
SELECT City
, Name
, CONCAT(City, '_', ROW_NUMBER() OVER (PARTITION BY City ORDER BY Name)) AS anon_name
FROM employees
GROUP BY City
, Name;
which gives me what I want:
| City | Name | anon_name |
|--------|--------|-----------|
| London | Joe | London_1 |
| London | Peter | London_2 |
| London | Tim | London_3 |
| Paris | Alice | Paris_1 |
| Paris | Lisa | Paris_2 |
| Paris | Sandra | Paris_3 |
| Paris | Tim | Paris_4 |
However, I am not very happy with the solution as the users can be derived as the number is simply assigned alphabetically.
Instead of a sequence I could use NEWID()
:
SELECT City
, Name
, CONCAT(City, '_', NEWID()) AS anon_name
FROM employees
GROUP BY City
, Name;
| City | Name | anon_name |
|--------|--------|---------------------------------------------|
| London | Joe | London_A06BFFC9-E122-4CB2-AFDE-B1980EC50A1F |
| London | Peter | London_B153D072-F75A-4004-B922-189D64656AC1 |
| London | Tim | London_AF40F949-5D55-4896-907F-17EE0165C569 |
| Paris | Alice | Paris_9F779CFA-B799-409A-801B-9CC044295B13 |
| Paris | Lisa | Paris_CFA8AB45-CCEF-4223-B86B-8A71A8335D13 |
| Paris | Sandra | Paris_1FEE5266-8FFF-431E-A76D-CB490FE77BB4 |
| Paris | Tim | Paris_A7BAF19F-1B97-470D-BF41-81C5B6BFFCBA |
This generates quite long labels, but it is not possible to relate it to the name.
Hence, my question:
Can I get the short version (that is just a number suffix) which is however randomized such that the name cannot be easily derived from the number, such as in
| City | Name | anon_name |
|--------|--------|-----------|
| London | Joe | London_2 |
| London | Peter | London_1 |
| London | Tim | London_3 |
| Paris | Alice | Paris_1 |
| Paris | Lisa | Paris_2 |
| Paris | Sandra | Paris_4 |
| Paris | Tim | Paris_3 |
Sample Data generated by:
CREATE TABLE employees
([City] varchar(8), [Name] varchar(8))
;
INSERT INTO employees
([City], [Name])
VALUES
('Paris', 'Tim'),
('Paris', 'Alice'),
('Paris', 'Sandra'),
('Paris', 'Sandra'),
('London', 'Joe'),
('London', 'Joe'),
('London', 'Tim'),
('London', 'Peter')
;
You can use newid()
in the first expression:
SELECT City, Name,
CONCAT(City, '_', ROW_NUMBER() OVER (PARTITION BY City ORDER BY NEWID())) AS anon_name
FROM employees
GROUP BY City, Name;
If you just want a randomized suffix, you could use checksum()
:
SELECT City, Name,
CONCAT(City, '_', CHECKSUM(Name)) AS anon_name
FROM employees
GROUP BY City, Name;
However, there is the risk of collision -- where two users with the same name have different checksums. There are ways around this, if you don't want sequential ids.