Search code examples
sqlsql-serversql-server-2008substringpatindex

sql query to return results which match a search key but also retrieves records which are alphabetically after the search key


The title is a mouthful but I found it hard to explain.

I had a previous web service which allowed a user to enter a search key e.g. 4, and that would return records which started with that search key e.g. 4, 4a, 4b.

Now I've been asked to implement a solution whereby if a user types 4, it would return results:

4
4a
4aa
4b
5
5a
5b
5c
5Z
6
52.
etc.. 

for 50 records.

I'm not sure where to start in altering the query...the col being searched is alphanumeric and that is the confusing bit.

The query itself is over 20 lines long but the part linking the results to the search key is just;

(col LIKE @searchKey + '[a-zA-Z.]%' OR col = @searchKey OR col >= @searchKey)

And as requested the ordering is:

ORDER BY   
        CASE WHEN col LIKE '[a-zA-Z]%' Then 1 ELSE 0 END,

                 CAST(SUBSTRING(col, 0, 

                  CASE WHEN patindex('%[a-zA-Z.]%', col) = 0

                  THEN len(col)+1

                  ELSE patindex('%[a-zA-Z.]%', col)

                  END) 

                 as INT),

    CASE WHEN col LIKE '[a-zA-Z]%' THEN SUBSTRING(col, 1,1) END,

    CASE WHEN col LIKE '[a-zA-Z]%' AND len(col) = 1 THEN 0 ELSE 1 END,

    CASE WHEN col LIKE '[a-zA-Z]%' AND len(col) = 2 THEN 0 ELSE 1 END,

    CASE WHEN col LIKE '[a-zA-Z]%' AND len(col) = 3 THEN 0 ELSE 1 END,

    col;

Right now if I pass in 82 for example, I am getting results:

8
8A
8B
8E
8H
9
9A
9C
9D
9E
82
82A
82B
82C
.
.
.
.
99R

However what I need is that if the user types 82, they do not receive results like 8, 8x, 9, 9x etc.


Solution

  • You need to separate out the numeric portion from the alpha portion and then recombine them into a new number value that can be sorted and filtered on. You will need to do the same operation for the user input.

    Note that this doesn't handle the period you mentioned in your post as you have not told us how it's supposed to be handled. If it's irrelevant, then use the replace function to get rid of it. Also, this does not handle situations where the alpha portion of the string is longer than one character.

    I made the assumption that the user can input a character after their numeric input. If this is not the case, this can be simplified by not having to convert the user input and not having to multiply the numeric portion by a large number.

    The 100000 should be larger than the largest number possible in the numeric portion of the string. Add additional zeroes to it as needed.

    SQL Fiddle

    declare @t table (col varchar(4))
    declare @userinput varchar(4)
    
    set @userinput = '9C'
    
    insert into @t values ('8')
    insert into @t values ('8A')
    insert into @t values ('8B')
    insert into @t values ('8E')
    insert into @t values ('8H')
    insert into @t values ('9')
    insert into @t values ('9A')
    insert into @t values ('9C')
    insert into @t values ('9D')
    insert into @t values ('9E')
    insert into @t values ('82')
    insert into @t values ('82A')
    insert into @t values ('82B')
    insert into @t values ('82C')
    insert into @t values ('99R')
    
    ;with cte as (
        select 
            col,
            CONVERT(int,
                case when PATINDEX('%[a-z]%', col) = 0 
                    then col
                    else LEFT(col, PATINDEX('%[a-z]%', col) - 1)
                end
            ) * 100000
            +
            ascii(case when PATINDEX('%[a-z]%', col) = 0
                then ' '
                else SUBSTRING(col, PATINDEX('%[a-z]%', col), len(col))
            end) newcol
    
        from @t
    )
    
    select *
    from cte
    where 
        newcol >= (
            CONVERT(int,
                case when PATINDEX('%[a-z]%', @userinput) = 0 
                    then @userinput
                    else LEFT(@userinput, PATINDEX('%[a-z]%', @userinput) - 1)
                end
            ) * 100000
            +
            ascii(case when PATINDEX('%[a-z]%', @userinput) = 0
                then ' '
                else SUBSTRING(@userinput, PATINDEX('%[a-z]%', @userinput), len(@userinput))
            end)
        )
    order by newcol