Search code examples
mysqlsqlconcatenationgroup-concat

MYSQL - Combine rows with multiple duplicate values and delete duplicates afterwards


So I have my database set up as a single table. In that table I have collected source URL and description (I am scraping the product description from a number of pages). Unfortunately I have ended up with multiple rows in the database for a URL/source page if there is more than one paragraph.

What I would like to do is, if there are multiple rows with the same URL, combine the description from each row and then delete the duplicate rows for that URL.

My table is literally structured like so:

table             
+----+----------------------------+-------------+
| id | url                        | description |
+----+----------------------------+-------------+
|  1 | http://example.com/page-a  | paragraph 1 |
|  2 | http://example.com/page-a  | paragraph 2 |
|  3 | http://example.com/page-a  | paragraph 3 |
|  4 | http://example.com/page-b  | paragraph 1 |
|  5 | http://example.com/page-b  | paragraph 2 |
+----+----------------------------+-------------+

How I want it is like:

table             
+----+----------------------------+-------------------------------------+
| id | url                        | description                         |
+----+----------------------------+-------------------------------------+
|  1 | http://example.com/page-a  | paragraph 1 paragraph 2 paragraph 3 |
|  2 | http://example.com/page-b  | paragraph 1 paragraph 2             |
+----+----------------------------+-------------------------------------+

I'm not so bothered about the IDs being updated to be correct, I just want to be able to combine the rows where the paragraphs should be in the same field as they are the same URLs, and then delete the duplicates.

Any help would be greatly appreciated!


Solution

  • It's easy to filter the table, just insert the result in a new table:

    SELECT url, GROUP_CONCAT(description ORDER BY description SEPARATOR ' ') AS description
    FROM `table`
    GROUP BY url