I have two tables in database Products and Filters.
The schema:
I have created a query that find all the records from the filters table, loop with each record and call a procedure that set the category id for Products table.
Filter table data will be as follow.
The filter selection query is as follow..
DECLARE @TotalRecords INT, @Start INT, @Limit INT, @CatId INT, @Merchants NVARCHAR(max), @NotMatch NVARCHAR(max), @WillMatch NVARCHAR(max);
SELECT @TotalRecords = COUNT(*) FROM filters;
SET @Limit = 1;
SET @Start = 0;
WHILE(@TotalRecords > 0)
BEGIN
SELECT @CatId = category_id, @Merchants = merchant_name, @NotMatch = not_match, @WillMatch = will_match FROM
(
SELECT TOP (@Start + @Limit) *, ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS rnum
FROM filters
) a
WHERE rnum > @Start;
-- call filter procedure.
exec procSetProductCategory @CatId = @CatId, @Merchants = @Merchants, @WillMatch = @WillMatch, @NotMatch = @NotMatch;
SET @Start += 1;
SET @TotalRecords -= 1;
END
And the procSetProductCategory as follow..
CREATE PROC [dbo].[procSetProductCategory]
(
@CatId INT = NULL,
@Merchants NVARCHAR(max),
@NotMatch NVARCHAR(max),
@WillMatch NVARCHAR(max)
)
AS
BEGIN
SET NOCOUNT ON
declare @query nvarchar(max), @orToken nvarchar(max), @andToken nvarchar(max);
set @query = 'UPDATE Products SET category_id = '+ convert(nvarchar(20), @CatId) + ' WHERE category_id IS NULL AND merchant_name IN(' + @Merchants + ')';
if(@WillMatch is not null AND LTRIM(RTRIM(@WillMatch)) != '')
BEGIN
set @andToken = '%'' AND product_name LIKE ''%';
set @WillMatch = REPLACE(@WillMatch, '+', @andToken);
set @orToken = '%'') OR (product_name LIKE ''%';
set @query = @query + ' AND ((product_name LIKE '''+ '%' + REPLACE(@WillMatch, ',', @orToken) + '%''))';
END
if(@NotMatch is not null AND LTRIM(RTRIM(@NotMatch)) != '')
BEGIN
set @andToken = '%'' AND product_name NOT LIKE ''%';
set @NotMatch = REPLACE(@NotMatch, '+', @andToken);
set @orToken = '%'') OR (product_name NOT LIKE ''%';
set @query = @query + ' AND ((product_name NOT LIKE '''+ '%' + REPLACE(@NotMatch, ',', @orToken) + '%''))';
END
EXECUTE sp_executesql @query;
END
It generates the sql query like following...
Query #1
-------------------------------------------------------------------------------------------------------
UPDATE Products SET category_id = 101 WHERE merchant_name IN('merchant 1','merchant 4','merchant 3') AND
(
(product_name LIKE '%abcd%' AND product_name LIKE '%efhg%')
) AND (
(product_name NOT LIKE '%3258%')
OR (product_name NOT LIKE '%yxzs%')
)
Query #2
-------------------------------------------------------------------------------------------------------
UPDATE Products SET category_id = 102 WHERE merchant_name IN('merchant 3', 'merchant 4') AND
(
(product_name LIKE '%1258%') OR (product_name LIKE '%abcd%')
)
Note there are some trick used here.
[,] is used to differentiate match phrases. [+] in match fields used for two match phrases with AND conditions.
These query doing the same what I needed..
Issue is that when I run this query with 500 000 products its using about 100% CPU.
How can we optimize the query that doesn't take impact on result but can reduce the CPU usage?
For starters, as pointed out already: there really is something wrong with the logic here. That said, assuming you are stuck with it there are some things you might want to try. My first question would be: how long does this thing run? You shouldn't worry too much that it takes 100% CPU; the question is how much time it takes to finish.
It seems that you are creating a loop over the the filters
table, fetching every single row, one by one.
CURSOR
and not the current approach.
SELECT 1
rnum
bigger than your counter=> This is wrong in so many ways, it actually hurts =(
SELECT 1
then it could return the records in the order ABCD the first time and BADC the second time; and both answers would be correct because you're sorting by a constant: the actual order of the records doesn't matter!rnum
values will fit the requirement of being greater than @start
; EVERY TIME!rnum > @start
, the returned record being used to fill up the records could be any one of them!To 'fix' this I'd suggest to use the following approach:
DECLARE @TotalRecords INT,
@Start INT,
@Limit INT,
@CatId INT,
@Merchants NVARCHAR(max),
@NotMatch NVARCHAR(max),
@WillMatch NVARCHAR(max);
DECLARE filter_loop CURSOR LOCAL FAST_FORWARD
FOR SELECT category_id,
merchant_name,
not_match,
will_match
FROM filters
ORDER BY id -- not required but makes debugging easier
OPEN filter_loop
FETCH NEXT FROM filter_loop INTO @CatId, @Merchants, @NotMatch, @WillMatch
WHILE @@FETCH_STATUS = 0
BEGIN
-- call filter procedure.
exec procSetProductCategory @CatId = @CatId, @Merchants = @Merchants, @WillMatch = @WillMatch, @NotMatch = @NotMatch;
-- get next filter
FETCH NEXT FROM filter_loop INTO @CatId, @Merchants, @NotMatch, @WillMatch
END
CLOSE filter_loop
DEALLOCATE filter_loop
At first sight there is very little I can do about the stored procedure itself. There is some dynamic sql string building that might be optimized a little bit but I very much doubt it will make much impact. As it is right now it's fairly readable, so I'd leave it as is. The generated query then indeed looks something like this:
UPDATE Products
SET category_id = 101
WHERE merchant_name IN ('merchant 1','merchant 4','merchant 3')
AND ((product_name LIKE '%abcd%' AND product_name LIKE '%efhg%') )
AND ((product_name NOT LIKE '%3258%') OR (product_name NOT LIKE '%yxzs%'))
for which I'd advice to create the following index:
CREATE INDEX idx_test ON Products (merchant_name) INCLUDE product_name)
Even with the changes above in place, this will still run for quite a while when working on 100k+ records. The only real solution around this would be to use a set-based approach, but would require either a gargantuan dynamic sql string; or some better knowledge about the data itself. E.g. You might try combining different Filters
records that have the same Merchants
value but different Match
/NoMatch
... Probably not too difficult, but I'd suggest to start with the suggestions above first and then see where you end up.