Search code examples
sql-serverstring-comparisoncalculated-columns

create a calculated column from comparing two columns based off a substring in the columns in sql


I have the following dataset

Drawing Name Line Number Line Details
PL00XXX-0705-1300 2"-MSH-0513-16-C1-1 1/2"A MATCH
PL00XXX-0705-1100 2"-MSH-0513-16-C1-2"AE DUPLICATE / HEAT TRACE
PL00XXX-0705-1300 2"-WWS-0513-15-C1-0" MATCH / NON ISO
PL00XXX-0705-1300 2"-WWS-0513-15-C1-2"AE MATCH / HEAT TRACE
PL00XXX-0705-1100 2"-WWS-0513-15-C1-2"AE DUPLICATE / HEAT TRACE
PL00XXX-0705-1300 2"-WWS-0513-17-C1-2"AE DO NOTHING
PL00XXX-0705-1100 2"-WWS-0513-18-C1-2"AE DO NOTHING

The new calculated column I want to create is Line Details based if there are at LEAST 2 of a line number up to the last dash in the line number. IF there is not at least 2 of the same up to the last dash - do nothing.

The Line Details column shows Match if the drawing number would have 05-13 in it and the line number would have 0513 in it.

The Line Details column would show Duplicate if the drawing number had 05-13 in it and the line number had 0511 in it.

The Line Details column could ALSO show Heat Trace if the line number ends with an E.

The Line Details column could ALSO show Non Iso if the line number ends with 0".

The drawing number up to PL00XXX-07 is always the same per customer. it's what comes after that is important and how it's tied or not tied to the line number. At least 2 of the same line number means up to the - after C1. The amount of characters prior to that could be different, there could be a 2" line or a 1/2" line, but as long as the line number matches up to the C1- part of the line that represents 2 of the same.

IF this makes send PLEASE help. Greatly appreciated.

How can a query be written to only find duplicates up to the last hypen? I have the following line numbers:

2"-MSH-0513-16-S1-**1 1/2"A
2"-MSH-0513-16-S1-**2"AE
2"-MSH-0513-17-S1-**1 1/2"A
2"-MSH-0513-18-S1-**1 1/2"A
2"-FLW-0521-18-S1-**1"A
2"-FLW-0521-18-S1-**1"A

So the line numbers that I want to be shown in the list after the query is as follows:

2"-MSH-0513-16-S1-**1 1/2"A
2"-MSH-0513-16-S1-**2"AE
2"-FLW-0521-18-S1-**1"A
2"-FLW-0521-18-S1-**1"A

I know how to query a specific character count ONLY when the data is the exact same character count in the column. As you can see the character count can be different up the the last hyphen.

I have tried the following script:

select SUBSTRING(LINE_NUM_CONCAT_, 
                 1, 
                 regexp_instr(LINE_NUM_CONCAT_, 
                              '-', 
                              1, 
                              regexp_count(LINE_NUM_CONCAT_, 
                              '-')
                  ) - 1)
FROM  PID_Components_PROCESS_LINES

but regex_count is not a recongnized built-in function name???

PLEASE help.


Solution

  • use CASE expression to evaluate your condition and return the string accordingly and concatenate all as one string

    to check for duplicates use window function count(*) over (partition by ...)

    select [Drawing Name], [Line Number],
       [Line Details] =
       case when count(*) over (partition by line) < 2
            then 'DO NOTHING'
            else
                case when [Drawing Name] like '%05-13%' 
                     and  [Line Number]  like '%0513%'
                     then 'MATCH'
                     else ''
                     end
              + case when [Drawing Name] like '%05-11%' 
                     and  [Line Number]  like '%0513%'
                     then 'DUPLICATE'
                     else ''
                     end            
              + case when right([Line Number], 1) = 'E'
                     then '/ HEAT TRACE'
                     when right([Line Number], 2) = '0"'
                     then '/ NON ISO'
                     else ''
                     end
             end
    from PID_Components_PROCESS_LINES p
         cross apply
         (
            select line = left([Line Number], 
                               len([Line Number]) - charindex('-', reverse([Line Number]))
                              )
         ) l
    where p.[Drawing Name] like 'PL00528%'
    

    I have reorganized the query and using APPLY() operator to compute the various value. You can add the WHERE condition to exclude the NO NOTHING lines.

    select *,
           case when line_count < 2
                then 'DO NOTHING'
                else
            m.match
            + duplicate         
            + case when right([Line Number], 1) = 'E'
                   then '/ HEAT TRACE'
                   when right([Line Number], 2) = '0"'
                   then '/ NON ISO'
                   else ''
                   end
                   end
    from   (
               select *, line_count =  count(*) over (partition by line)
               from   PID_Components_PROCESS_LINES 
                      cross apply
                      (
                          select line = left([Line Number], 
                                             len([Line Number]) - charindex('-', reverse([Line Number])))
                      ) l
               where  [Drawing Name] like 'PL00528%'
           ) p
           cross apply
           (
               select match = case when [Drawing Name] like '%05-13%' 
                                   and  [Line Number] like '%0513%'
                                   then 'MATCH'
                                   else ''
                                   end  
           ) m
           cross apply
           (
               select duplicate = case when [Drawing Name] like '%05-11%' 
                                       and  [Line Number] like '%0513%'
                                       then 'DUPLICATE'
                                       else ''
                                       end 
           ) d
    where  p.line_count >= 2  -- exclude the do nothing lines