Search code examples
sql-servert-sqlssms

Renaming duplicate files in a table from a version controlled EDM system?


I'm attempting to rename duplicate files in a table that is from an EDM system with 300,000 rows. The issue that I'm running into is that there are thousands of duplicate filenames in hundreds of folders, each with multiple versions of a file. I need to create unique filenames for import into a new system, so I am appending prior to the file extension -1, -2, -3, etc. based on the filename and folder location.

I want to append the initial instance of the filename and all of it's versions in the first folder as "-1", the second instance of a filename and it's versions in another folder as "-2". I have file ID's I can use as the sort order for issuance of the appending, but the order really does not matter. I've include an example of the current table result, along with the desired result nested into it.

What I've come up with so far is:

With DuplicateFileName As
        (
        Select FolderID, FileName
            , Row_Number() Over ( Partition By Filename Order By FolderID ) As Rnk
        From NewFilename
        )
    Update NewFilename
    Set FileModifier = Case
                        When Rnk > 1 Then '-' + Cast(Rnk As varchar(10))
                        Else ''
                        End
    From NewFilename As T
        Join DuplicateFileName As D
            On D.FolderID = T.FolderID

Below is a sample of what I get for one filename. It's close, but I can't get it to append exactly like I need no matter what I try. I'm close, but just can't seem to get it perfect.

 FolderID      FileID   FileName      Version   Current    Desired   
    3506        16875   Gadget.XLSX    1                    -1
    3506        16876   Gadget.XLSX    2                    -1
    3506        16877   Gadget.DOCX    1                    -1        
    3506        16878   Gadget.DOCX    2                    -1        
    3522        17578   Gadget.XLSX    1          -3        -2
    3522        17579   Gadget.DOCX    1          -3        -2
    3525        17602   Gadget.XLSX    1          -4        -3
    3525        17603   Gadget.DOCX    1          -4        -3

Solution

  • Instead of ROW_NUMBER() you needed to use DENSE_RANK() since the same file name in the same folder can have the same file modifier. And that CASE EXPRESSION was not needed. Consider the following

    :
    
    
    CREATE TABLE #NewFilename  (FolderID int, FileID int,   FileName varchar(40), Version int, FileModifier varchar(4) ) 
     INSERT INTO #NewFilename VALUES(3506,16875,'Gadget.XLSX',1,'')
    INSERT INTO #NewFilename VALUES(3506,16876,'Gadget.XLSX',2,'')
    INSERT INTO #NewFilename VALUES(3506,16877,'Gadget.DOCX',1,'')        
    INSERT INTO #NewFilename VALUES(3506,16878,'Gadget.DOCX',2,'')        
    INSERT INTO #NewFilename VALUES(3522,17578,'Gadget.XLSX',1,'')
    INSERT INTO #NewFilename VALUES(3522,17579,'Gadget.DOCX',1,'')
     INSERT INTO #NewFilename VALUES(3525,17602,'Gadget.XLSX',1,'')
     INSERT INTO #NewFilename VALUES(3525,17603,'Gadget.DOCX',1,'') 
    
      With DuplicateFileName As
            (
            Select FolderID, FileName
                , DENSE_RANK() Over ( Partition By Filename Order By FolderID ) As Rnk
            From #NewFilename
            )
        Update #NewFilename
        Set FileModifier =  '-' + Cast(Rnk As varchar(10))
        From #NewFilename As T
            Join DuplicateFileName As D
                On D.FolderID = T.FolderID
    
    
    
    select * from #NewFilename
    order by FolderID