Search code examples
sqlsql-server

How to pull width & height out of string?


Our product table stores width & height directly in the product description.

I'm trying to break out width & height into their own separate numeric columns, so we don't have to try and mess with long strings every time we need size data.

However, the issue I'm running into is that these descriptions aren't standardized. Most sizes are correctly stored as W" X H", but some can also be W"XH", -W"xH"-, and some can even have a third double quote for corner rounding, length, etc.

Is it possible to create a one-size-fits-all function that can pull width & height out of all types of description formats? Should I break them down into separate functions for each format?

Constants:

  1. All widths & heights are whole numbers
  2. All widths & heights are directly followed by a double quote
  3. There is an "X" between all widths & heights

Variables:

  1. Spacing & surrounding characters (not always a blank space before/after dimensions)
  2. Number of double quotes can be more than 2.
  3. Can be either 1, 2, or 3 digit numbers (1" X 1", 10" X 1", 100" X 1", etc.)

Sample Data:

Product Description Expected Width Expected Height
1000 HIP .100 12" X 18" Sign Digital 12 18
1001 HIP-080-36"X72"-Simple Sign-RS 36 72
1002 DG3 .080 21" x 21" Sign 3" R 21 21
1003 HIP-080-36"X72"-Simple Sign-RS 36 72
1004 DG3 FLO-125-100"x80"-Hand Applied 100 80
1005 DG3 FLUOR Dbl .080 30" X 30" X 30" Yield 30 30

What I've tried:

SELECT
    UPPER(REPLACE(
        REPLACE(
            RIGHT(
                LEFT(Description,CHARINDEX('"',Description,CHARINDEX('"',Description)+1)-1), 
            LEN(
                LEFT(Description,CHARINDEX('"',Description,CHARINDEX('"',Description)+1)-1)
                ) 
            - CHARINDEX('"', 
                LEFT(Description,CHARINDEX('"',Description,CHARINDEX('"',Description)+1)-1)
            ) + 3)
        , '"', '')
    , ' ','')) AS Result, Description

FROM HQMT
WHERE UPPER(Description) LIKE '%"%X%"%'

Result:

Result Description
12X18 HIP .100 12" X 18" Sign Digital
36X72 HIP-080-36"X72"-Simple Sign-RS
21X21 DG3 .080 21" x 21" Sign 3" R
36X72 HIP-080-36"X72"-Simple Sign-RS
00X80 DG3 FLO-125-100"x80"-Hand Applied
30X30 DG3 FLUOR Dbl .080 30" X 30" X 30" Yield

With these results, I plan to pull all characters left of the "X" for width and all characters right of the "X" for height. This is mostly correct, but 3 digit widths are getting cut off. Any ideas on how to fix / make more dynamic?


Solution

  • Use TRANSLATE to remove all chars except numbers and X (applied to 4 chars preceeding ")

    SELECT
        UPPER(
         REPLACE(TRANSLATE(
                RIGHT(
                    LEFT(Description,CHARINDEX('"',Description,CHARINDEX('"',Description)+1)-1), 
                LEN(
                    LEFT(Description,CHARINDEX('"',Description,CHARINDEX('"',Description)+1)-1)
                    ) 
                - CHARINDEX('"', 
                    LEFT(Description,CHARINDEX('"',Description,CHARINDEX('"',Description)+1)-1)
                ) + 4)
                , 'abcdefghijklmnopqrstuvwyz+()- ,#+"', '@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@'), '@', '')  
             ) 
      AS Result, Description
    
    FROM HQMT
    WHERE UPPER(Description) LIKE '%"%X%"%'
    

    fiddle