Search code examples
pdfssisrowssmscharindex

SQL to determine which rows to retain and which to remove


Good Morning Everyone,

I am having a bit of a mental block and was going to see if anyone can help me. I have a table that inventories PDF Files that our office creates. We have changed the naming convention and I am trying to develop logic that specifies when a PDF with the new naming convention has been created to tag the old one so that I can develop a batch script to move them out of the files location. Below are some examples. Each file is its own row in the table by the way.

     PAR                          ORIGFILENAME

 111100000012         |          1100000012.pdf

 111100000012         |          1100000012_C_1_UB.pdf

 111100000012         |          1100000012_R_1.pdf

The new naming convention contains _R_Number or _C_Number. In the above example The first file is old and I want to retain the second 2.

A second example that has a bit more. Below are 5 files. I want to retain the last two that have the new naming convention and remove the top 3.

    PAR                           ORIGFILENAME

 1100000076                      1100000076-2.pdf

 1100000076                      1100000076-3.pdf

 1100000076                      1100000076.pdf

 1100000076                      1100000076_C_7_BARN.pdf

 1100000076                      1100000076_R_1.pdf

My plans if I can key on these old files when a new one exists is to develop those names into a batch script and incorporate it into a SSIS package that will run weekly to keep our PDF repository clean. I appreciate any help and incite.


Solution

  • The following should work, although a more varied amount of sample data would be useful.

    The following uses an updatable CTE to identify the old/new format names and deletes the old format where the same PAR has a new format

    with f as (
      select *,
        case when OrigFilename like '%*_%' escape '*' then 0 else 1 end del
      from t
    )
    delete from f
    where del=1 
      and exists (select * from f f2 where f2.par=f.par and f2.del=0)