I have read-only access to a database where I need to pull data from.
One of the columns is for social security numbers(US). The data is not formatted correctly and I am dealing with entries like:
Our system needs the SSN to look like this: 123-45-0000
I was able to figure out how to properly format the 3rd one(123450000) with this:
DECLARE @ssn CHAR(12) = 123450000
SELECT stuff(stuff(@ssn, 4, 0, '-'), 7, 0, '-') AS Social_Security_Number
But is there a way to fit all these other ones into the proper format?
Stop worrying about which ones meet your formatting criteria, which ones are close, and how many different ways they're wrong. Just strip the formatting altogether and store numeric only; apply the formatting elsewhere. Much easier to constrain, too.
-- numeric only:
SELECT
REPLACE(REPLACE(REPLACE(garbage,'-',' '),',',' '),' ','')
FROM dbo.SSNs;
-- with formatting added back:
SELECT STUFF(STUFF(
REPLACE(REPLACE(REPLACE(garbage,'-',' '),',',' '),' ',''),
4,0,'-'),7,0,'-')
FROM dbo.SSNs;
Working example in this fiddle.
Once you've fixed the existing data, add a constraint that says something like LEN(SSN) = 9 AND SSN NOT LIKE '%[^0-9]%'
- strip any non-digits from the input on the way in, fail any write that doesn't do that, and apply the dashes - which are presentation only - in the UI (or in a view or a computed column if it can't be where it belongs).