I'm attempting to rename duplicate files in a table that is from an EDM system with 300,000 rows. The issue that I'm running into is that there are thousands of duplicate filenames in hundreds of folders, each with multiple versions of a file. I need to create unique filenames for import into a new system, so I am appending prior to the file extension -1, -2, -3, etc. based on the filename and folder location.
I want to append the initial instance of the filename and all of it's versions in the first folder as "-1", the second instance of a filename and it's versions in another folder as "-2". I have file ID's I can use as the sort order for issuance of the appending, but the order really does not matter. I've include an example of the current table result, along with the desired result nested into it.
What I've come up with so far is:
With DuplicateFileName As
(
Select FolderID, FileName
, Row_Number() Over ( Partition By Filename Order By FolderID ) As Rnk
From NewFilename
)
Update NewFilename
Set FileModifier = Case
When Rnk > 1 Then '-' + Cast(Rnk As varchar(10))
Else ''
End
From NewFilename As T
Join DuplicateFileName As D
On D.FolderID = T.FolderID
Below is a sample of what I get for one filename. It's close, but I can't get it to append exactly like I need no matter what I try. I'm close, but just can't seem to get it perfect.
FolderID FileID FileName Version Current Desired
3506 16875 Gadget.XLSX 1 -1
3506 16876 Gadget.XLSX 2 -1
3506 16877 Gadget.DOCX 1 -1
3506 16878 Gadget.DOCX 2 -1
3522 17578 Gadget.XLSX 1 -3 -2
3522 17579 Gadget.DOCX 1 -3 -2
3525 17602 Gadget.XLSX 1 -4 -3
3525 17603 Gadget.DOCX 1 -4 -3
Instead of ROW_NUMBER() you needed to use DENSE_RANK() since the same file name in the same folder can have the same file modifier. And that CASE EXPRESSION was not needed. Consider the following
:
CREATE TABLE #NewFilename (FolderID int, FileID int, FileName varchar(40), Version int, FileModifier varchar(4) )
INSERT INTO #NewFilename VALUES(3506,16875,'Gadget.XLSX',1,'')
INSERT INTO #NewFilename VALUES(3506,16876,'Gadget.XLSX',2,'')
INSERT INTO #NewFilename VALUES(3506,16877,'Gadget.DOCX',1,'')
INSERT INTO #NewFilename VALUES(3506,16878,'Gadget.DOCX',2,'')
INSERT INTO #NewFilename VALUES(3522,17578,'Gadget.XLSX',1,'')
INSERT INTO #NewFilename VALUES(3522,17579,'Gadget.DOCX',1,'')
INSERT INTO #NewFilename VALUES(3525,17602,'Gadget.XLSX',1,'')
INSERT INTO #NewFilename VALUES(3525,17603,'Gadget.DOCX',1,'')
With DuplicateFileName As
(
Select FolderID, FileName
, DENSE_RANK() Over ( Partition By Filename Order By FolderID ) As Rnk
From #NewFilename
)
Update #NewFilename
Set FileModifier = '-' + Cast(Rnk As varchar(10))
From #NewFilename As T
Join DuplicateFileName As D
On D.FolderID = T.FolderID
select * from #NewFilename
order by FolderID