I'm having troubles optimizing this Levenshtein Distance calculation I'm doing. I need to do the following:
Here's my working version:
DECLARE @Results TABLE
(
ID int,
[Name] nvarchar(200),
Distance int,
Frequency int,
Trimmed bit
)
INSERT INTO @Results
SELECT ID,
[Name],
(dbo.Levenshtein(@Source, [Name])) As Distance,
Frequency,
'False' As Trimmed
FROM
MyTable
INSERT INTO @Results
SELECT ID,
[Name],
(dbo.Levenshtein(@SourceTrimmed, [Name])) As Distance,
Frequency,
'True' As Trimmed
FROM
MyTable
SET @ResultID = (SELECT TOP 1 ID FROM @Results ORDER BY Distance, Trimmed, Frequency)
SET @Result = (SELECT TOP 1 [Name] FROM @Results ORDER BY Distance, Trimmed, Frequency)
SET @ResultDist = (SELECT TOP 1 Distance FROM @Results ORDER BY Distance, Trimmed, Frequency)
SET @ResultTrimmed = (SELECT TOP 1 Trimmed FROM @Results ORDER BY Distance, Trimmed, Frequency)
I believe what I need to do here is to..
I know there has to be a good implementation to this but I can't figure it out... this is as far as I got:
SELECT top 1 @ResultID = ID,
@Result = [Name],
(dbo.Levenshtein(@Source, [Name])) As distOrig,
(dbo.Levenshtein(@SourceTrimmed, [Name])) As distTrimmed,
Frequency
FROM
MyTable
WHERE /* ... yeah I'm lost */
ORDER BY distOrig, distTrimmed, Frequency
Any ideas?
I think your attempt differs from the code that you say works in that the working code orders by distance first, whether or not that is original or trimmed distance. Your attempt orders by original distance first, then trimmed.
I'm not sure I understand what you're trying to do entirely, but does the following do what you need?
SELECT TOP 1
@ResultId = ID,
@Result = [Name],
@ResultDist = distOrig,
@ResultTrimmed = distTrimmed
FROM (
SELECT
ID, [Name],
dbo.Levenshtein(@Source, [Name]) As distOrig,
dbo.Levenshtein(@SourceTrimmed, [Name])) As distTrimmed,
Frequency
FROM MyTable
) AS T
ORDER BY
CASE WHEN distOrig > distTrimmed THEN distOrig ELSE distTrimmed END, -- Distance
CASE WHEN distOrig > distTrimmed THEN 1 ELSE 0 END, -- Trimmed
Frequency -- Frequency