sql-server database t-sql query-optimization

Searching for a specific ID in a large database?


I need to look up an ID in a very large database. The ID is:

0167a901-e343-4745-963c-404809b74dd9

The database has hundreds of tables, and millions of rows in the big tables.

I can narrow the date to within the last 2 or 3 months, but that's about it. I'm looking for any clues as to how to narrow down searches like this.

One thing I'm curious about is whether using LIKE searches helps.

i.e., does it help to do something like:

select top 10 * 
from BIG_TABLE
where DESIRED_ID like '016%'

Any tips or suggestions are greatly appreciated. The database is being accessed remotely, so that's part of the challenge.


Solution

  • I built this script several years ago for a similar purpose, albeit for text fields. It finds eligible columns, then searches each of them for the value. Since you don't know which tables or columns might hold the ID, you may not be able to do better than something like this.

    You may want to tweak it a bit to include uniqueidentifier columns, if that is actually the data type, or to use an equality comparison (=) instead of a LIKE search.
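
    As a rough sketch of that tweak, the following variant looks only at uniqueidentifier columns and compares with = rather than LIKE. It assumes SQL Server 2008 or later (for inline DECLARE initialisation) and that the ID really is stored as a uniqueidentifier. The cursor name and the variable names are made up for illustration and are not part of the script below.

    ```sql
    -- Sketch: probe every uniqueidentifier column in one schema for an exact GUID match.
    SET NOCOUNT ON;

    DECLARE @guid uniqueidentifier = '0167a901-e343-4745-963c-404809b74dd9';
    DECLARE @schema sysname = N'dbo';          -- assumption: same schema filter as the main script
    DECLARE @fullname nvarchar(600),           -- quoted [schema].[table]
            @col nvarchar(300),                -- quoted [column]
            @label nvarchar(1000),
            @sql nvarchar(max);

    DECLARE guid_cols CURSOR LOCAL FAST_FORWARD FOR
        SELECT QUOTENAME(c.TABLE_SCHEMA) + N'.' + QUOTENAME(c.TABLE_NAME),
               QUOTENAME(c.COLUMN_NAME)
        FROM INFORMATION_SCHEMA.COLUMNS c
            INNER JOIN INFORMATION_SCHEMA.TABLES t
                ON c.TABLE_SCHEMA = t.TABLE_SCHEMA
               AND c.TABLE_NAME = t.TABLE_NAME
        WHERE c.DATA_TYPE = 'uniqueidentifier'
          AND t.TABLE_TYPE = 'BASE TABLE'      -- tables only, not views
          AND t.TABLE_SCHEMA = @schema;

    OPEN guid_cols;
    FETCH NEXT FROM guid_cols INTO @fullname, @col;

    WHILE @@FETCH_STATUS = 0
    BEGIN
        SET @label = N'Found in ' + @fullname + N'.' + @col;

        -- The GUID and the message are passed as parameters, so no quote escaping is needed.
        SET @sql = N'IF EXISTS (SELECT 1 FROM ' + @fullname +
                   N' WHERE ' + @col + N' = @g) PRINT @msg;';

        EXEC sp_executesql @sql,
             N'@g uniqueidentifier, @msg nvarchar(1000)',
             @g = @guid, @msg = @label;

        FETCH NEXT FROM guid_cols INTO @fullname, @col;
    END

    CLOSE guid_cols;
    DEALLOCATE guid_cols;
    ```

    An equality test on a uniqueidentifier column can use an index seek when the column is indexed, whereas a LIKE with a leading wildcard generally forces a scan. As with the main script, matches are printed to the Messages window.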

    If this is something you are going to reuse periodically, you could feed it a list of common tables or columns to find these things in, so it doesn't take as long to find things.
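
    One way to feed in such a list, sketched under assumptions: keep the candidate table names in a table variable and join it to the metadata query, so only columns from those tables are scanned. The table names here (Orders, Customers) are hypothetical placeholders, and the multi-row VALUES syntax assumes SQL Server 2008 or later.

    ```sql
    -- Sketch: restrict the column search to a short list of likely tables.
    DECLARE @candidates TABLE (table_name sysname PRIMARY KEY);

    INSERT INTO @candidates (table_name)
    VALUES (N'Orders'), (N'Customers');        -- hypothetical names; replace with your own

    SELECT cols.TABLE_SCHEMA, cols.TABLE_NAME, cols.COLUMN_NAME, cols.DATA_TYPE
    FROM INFORMATION_SCHEMA.COLUMNS cols
        INNER JOIN INFORMATION_SCHEMA.TABLES tabs
            ON cols.TABLE_SCHEMA = tabs.TABLE_SCHEMA
           AND cols.TABLE_NAME = tabs.TABLE_NAME
        INNER JOIN @candidates cand
            ON cand.table_name = tabs.TABLE_NAME COLLATE DATABASE_DEFAULT
    WHERE tabs.TABLE_TYPE = 'BASE TABLE'
      AND cols.DATA_TYPE IN ('char', 'varchar', 'nchar', 'nvarchar',
                             'text', 'ntext', 'uniqueidentifier')
    ORDER BY cols.TABLE_SCHEMA, cols.TABLE_NAME, cols.ORDINAL_POSITION;
    ```

    The same join could be added to the INSERT INTO @potentialcolumns query in the script so the loop only visits those tables.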

    /*This script will find any text value in the database*/
    /*Output will be directed to the Messages window. Don't forget to look there!!!*/
    
    SET NOCOUNT ON
    DECLARE @valuetosearchfor varchar(128), @objectOwner varchar(64)
    SET @valuetosearchfor = '%putYourGuidHere%' --should be formatted as a like search 
    SET @objectOwner = 'dbo'
    
    DECLARE @potentialcolumns TABLE (id int IDENTITY, sql varchar(4000))
    
    INSERT INTO @potentialcolumns (sql)
    SELECT 
        ('if exists (select 1 from [' +
        [tabs].[table_schema] + '].[' +
        [tabs].[table_name] + 
        '] (NOLOCK) where [' + 
        [cols].[column_name] + 
        '] like ''' + @valuetosearchfor + ''' ) print ''SELECT * FROM [' +
        [tabs].[table_schema] + '].[' +
        [tabs].[table_name] + 
        '] (NOLOCK) WHERE [' + 
        [cols].[column_name] + 
        '] LIKE ''''' + @valuetosearchfor + '''''' +
        '''') as 'sql'
    FROM information_schema.columns cols
        INNER JOIN information_schema.tables tabs
            ON cols.TABLE_CATALOG = tabs.TABLE_CATALOG
                AND cols.TABLE_SCHEMA = tabs.TABLE_SCHEMA
                AND cols.TABLE_NAME = tabs.TABLE_NAME
    WHERE cols.data_type IN ('char', 'varchar', 'nchar', 'nvarchar','text','ntext')
        AND tabs.table_schema = @objectOwner
        AND tabs.TABLE_TYPE = 'BASE TABLE'
        AND (cols.CHARACTER_MAXIMUM_LENGTH >= (LEN(@valuetosearchfor) - 2) OR cols.CHARACTER_MAXIMUM_LENGTH = -1) --minus 2 for the two % wildcards; -1 means (max)
    ORDER BY tabs.table_catalog, tabs.table_name, cols.ordinal_position
    
    DECLARE @count int
    SET @count = (SELECT MAX(id) FROM @potentialcolumns)
    PRINT 'Found ' + CAST(@count as varchar) + ' potential columns.'
    PRINT 'Beginning scan...'
    PRINT ''
    PRINT 'These columns contain the values being searched for...'
    PRINT ''
    DECLARE @iterator int, @sql varchar(4000)
    SET @iterator = 1
    WHILE @iterator <= (SELECT Max(id) FROM @potentialcolumns)
    BEGIN
        SET @sql = (SELECT [sql] FROM @potentialcolumns where [id] = @iterator)
        IF (@sql IS NOT NULL) and (RTRIM(LTRIM(@sql)) <> '')
        BEGIN
            --SELECT @sql --use when checking sql output
            EXEC (@sql)
        END
        SET @iterator = @iterator + 1
    END
    
    PRINT ''
    PRINT 'Scan completed'
    

    If that looks wonky, the script is executing a statement like this for each column:

    if exists (select 1 from [schema].[table_name] (NOLOCK) 
                        where [column_name] LIKE '%yourValue%')
    begin
       print 'SELECT * FROM [schema].[table_name] (NOLOCK) WHERE [column_name] LIKE ''%yourValue%'''
    end
    

    ...and just replacing the [schema], [table_name], [column_name] and %yourValue% in a loop.

    It's filtering on...

    • tables in a specific schema (filter can be removed)
    • only tables, not views (can be adjusted)
    • only columns long enough to hold the search value
    • the (n)char/(n)varchar/(n)text data types (add or change, be cognizant of data type conversion)

    Lastly, output does not go to the results grid. Check the Messages window (where you see "N rows affected").