Search code examples
sql-serverdesign-patternsrangevarbinary

Varbinary search with min and max pattern type


So I'm trying to make a query which goes through varbinary data. The issue is that I can't really finish what I'm trying to achieve. What you should know about the column is varbinary(50) and the patterns that occur have no specific order in writing, meaning every prefix could be anywhere as long it has 3 bytes(0x000000) First one is the prefix second and third are value data that I'm looking to check if its within the range i like. All the data is written like this.

What I've tried:

DECLARE @t TABLE (
    val VARBINARY(MAX)
)

INSERT INTO @t SELECT 0x00000100000000000000000000000000000000000000000000000000
INSERT INTO @t SELECT 0x00001000000000000000000000000000000000000000000000000000
INSERT INTO @t SELECT 0x00010000000000000000000000000000000000000000000000000000
INSERT INTO @t SELECT 0x00100000000000000000000000000000000000000000000000000000
INSERT INTO @t SELECT 0x00000f00000000000000000000000000000000000000000000000000

declare @pattern varbinary(max)
declare @pattern2 varbinary(max)
set @pattern = 0x0001
set @pattern2 = @pattern+0xFF

select @pattern,@pattern2

SELECT
    *
FROM @t
WHERE val<@pattern
OR val>@pattern2

This was total bust the patterns were accurate up to 2 symbols if I were to use 4 symbols as pattern it would work only if the pattern is in predefined position. I've tried combination of this and everything below.

WHERE CONVERT(varbinary(2), val) = 0xdata

also this:

select * 
from table
where CONVERT(varchar(max),val,2) like '%data%'

Which works great for searching exact patterns, but not for ranges, I need some combination of both.

I'm aware I could technically add every possible outcome 1 by 1 and let it cycle through all the listed possibilities, but there has to be a smarter way. Goals:

  1. Locating the prefix(first binary data pair)
  2. Defining a max value after the prefix, everything above that threshold to be listed in the results. Let's say '26' is the prefix, the highest allowed number after is '9600' or '269600'. Basically any data that exceeds this pattern '269600' should be detected example '269700'. or query result would post this:

select * from table where CONVERT(varchar(max),attr,2) like '%269700%'

I need something that would detect this on its own while i just give it start and end to look in between like the highest number variation would be '26ffff', but limiting it to something like 'ff00' is acceptable for what I'm looking for.

My best guess is 2 defined numbers, 1 being the allowed max range and 2nd for a cap, so it doesn't go through every possible outcome. But I would be happy to whatever works. I'm aware this explanation is pretty dire, but bear with me, thanks.

*Update after the last suggestion

SELECT MIN(val), MAX(val) FROM @t where CONVERT(varchar(max),val,2) like '%26%'

This is pretty close, but its not sufficient i need to cycle through alot of data and use it after this would select only min or max even with the prefix filter. I believe i need min and max defined as a start and end range where the query should look for.

**Update2

I'm afraid you would end up disappointed, its nothing that interesting. The data origin is related to a game server which stores the data like this. There's the predefined prefixes which are the stat type and the rest of the data is the actual numeric value of the stat. The data is represented by 6 characters data intervals. Here is a sample of the data stream. Its always 6-6-6-6-6 as long there's space to record the data on since its capped at 50 characters.

0x0329000414000B14000C14000D0F00177800224600467800473C00550F00000000000000000000000000

**Update3

Yes the groups are always in 3byte fashion, yes my idea was exactly that use the first byte to narrow down the search and use then use the second 2 bytes to filter it. I just don't know how to pull it off in an effective way. I'm not sure if i understood what u meant by predictively aligned assuming you meant if stat/prefix/header would always end up at the same binary location, if that's correct the answer is no. If the 3byte pattern is violated the data becomes unreadable meaning even if you don't need the extra byte you have to count it otherwise the data breaks example of a working data.

0x032900'041400'

example of a broken data:

 0x0329'041400'

The only issue i could think is when the prefix and part of the value are both true example:

0x262600

Unless the query is specifically ordered to read the data in 3byte sequence meaning it knows that the first byte is always a prefix and the other 2 bytes are value.

Q:Can that be used as an alignment indicator so that the first non-zero byte after at least 3 zero bytes indicates the start of a group?

A:Yes, but that's unlikely I mean it although possible it would be written in order like:

0x260000'270000'

It wouldn't skip forward an entire 3byte group filled with no data. This type of entry would occur if someone were to manually insert it to the db, the server doesn't make records with gaps like those as far I'm aware:

0x260000'000000'270000'

To address your last comment that's something I don't know how to express it in a working query, except for the boneheaded version which would be me manually adding every possible number within my desired range with +1bit after that number. As you can imagine the query would look terrible. That's why I'm looking for a smarter solution that I cannot figure out how to do so by my self.

select * from @t 
where (CONVERT(varchar(max),val,2) like '%262100%' or 
CONVERT(varchar(max),attr,2) like '%262200%' or 
etc...)

Solution

  • DECLARE @YourTable table
    (
    Id INT PRIMARY KEY,
    Val VARBINARY(50)
    )
    
    INSERT @YourTable
    VALUES (1, 0x0329000414000B14000C14000D0F00177800224600467800473C00550F00000000000000000000000000),
           (2, 0x0329002637000B14000C14000D0F00177800224600467800473C00550F00000000000000000000000000);
    
    SELECT Id, Triplet
    FROM @YourTable T
    CROSS APPLY GENERATE_SERIES(1,DATALENGTH(T.Val),3) s
    CROSS APPLY (VALUES (SUBSTRING(T.Val, s.value, 3))) V(Triplet)
    WHERE Triplet BETWEEN 0x263700 AND 0x2637FF
    

    This works only with '22 sql server because of 'generate_series'

    DECLARE @YourTable table
    (
    Id INT PRIMARY KEY,
    Val VARBINARY(50)
    )
    
    INSERT @YourTable
    VALUES (1, 0x0329000414000B14000C14000D0F00177800224600467800473C00550F00000000000000000000000000),
           (2, 0x0329002637000B14000C14000D0F00177800224600467800473C00550F00000000000000000000000000);
    
    
    SELECT Id, Triplet
    FROM @YourTable T
    JOIN (VALUES (1),(4),(7),(10),(13),(16),(19),(22),(25),(28),(31),(34),(37),(40),(43),(46),(49)) Nums(Num) ON Num <= DATALENGTH(T.Val)
    CROSS APPLY (VALUES (SUBSTRING(T.Val, Num, 3))) V(Triplet)
    WHERE Triplet BETWEEN 0x263700 AND 0x2637FF
    

    This one works on older versions without "generate_series"

    The credit is to @Martin Smith from stackexchange https://dba.stackexchange.com/questions/323235/varbinary-pattern-search