I have a script that checks for duplicate pairs in a database and selects all entries that need to be deleted except for one.
I have this script that selects the first 100 entries that need to be deleted and works fine:
SELECT*
FROM vl_posts_testing
INNER JOIN (
SELECT max(ID) AS lastId, `post_content`,`post_title`
FROM vl_posts_testing WHERE vl_posts_testing.post_type='post'
GROUP BY `post_content`,`post_title`
HAVING count(*) > 1) duplic
ON duplic.`post_content` = vl_posts_testing.`post_content`
AND duplic.`post_title` = vl_posts_testing.`post_title`
WHERE vl_posts_testing.id < duplic.lastId
AND vl_posts_testing.post_type='post'
LIMIT 0,100
However when I try to delete this set of data using:
DELETE vl_posts_testing
FROM vl_posts_testing
INNER JOIN (
SELECT max(ID) AS lastId, `post_content`,`post_title`
FROM vl_posts_testing WHERE vl_posts_testing.post_type='post'
GROUP BY `post_content`,`post_title`
HAVING count(*) > 1) duplic
ON duplic.`post_content` = vl_posts_testing.`post_content`
AND duplic.`post_title` = vl_posts_testing.`post_title`
WHERE vl_posts_testing.id < duplic.lastId
AND vl_posts_testing.post_type='post'
LIMIT 100
I receive the fallowing error:
You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'LIMIT 10' at line 8
The script has been constructed using this answer https://stackoverflow.com/a/6108860/1168944
Actually the script works just fine on a small amount of data without the limits set, however due to the fact that I run it on a big table (some 600k entries) I need to split this script in a routine that processes only a limited amount of data due to server limits like processor, memory etc.
Also took into consideration this example: MySQL LIMIT on DELETE statement but the result is different since no modification is executed no matter how small is the limit.
After several retries I have found a way to make it work:
DELETE vl_posts_testing
FROM vl_posts_testing
INNER JOIN (
SELECT max(ID) AS lastId, `post_content`,`post_title`
FROM vl_posts_testing WHERE vl_posts_testing.post_type='post'
GROUP BY `post_content`,`post_title`
HAVING count(*) > 1
LIMIT 0,100 ) duplic
ON duplic.`post_content` = vl_posts_testing.`post_content`
AND duplic.`post_title` = vl_posts_testing.`post_title`
WHERE vl_posts_testing.id < duplic.lastId
AND vl_posts_testing.post_type='post'
Actually what I did is set an inner limit to the first set of data and compare it to the rest of the database in order to make it work. It work but I am not sure this is the correct way ot do it.