MySQL GROUP_CONCAT escaping


(NOTE: This question is not about escaping queries, it's about escaping results)

I'm using GROUP_CONCAT to combine multiple rows into a comma-delimited list. For example, assume I have these two example tables:

CREATE TABLE IF NOT EXISTS `Comment` (
`id` int(11) unsigned NOT NULL auto_increment,
`post_id` int(11) unsigned NOT NULL,
`name` varchar(255) collate utf8_unicode_ci NOT NULL,
`comment` varchar(255) collate utf8_unicode_ci NOT NULL,
PRIMARY KEY  (`id`),
KEY `post_id` (`post_id`)
) ENGINE=MyISAM  DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=6 ;

INSERT INTO `Comment` (`id`, `post_id`, `name`, `comment`) VALUES
(1, 1, 'bill', 'some comment'),
(2, 1, 'john', 'another comment'),
(3, 2, 'bill', 'blah'),
(4, 3, 'john', 'asdf'),
(5, 4, 'x', 'asdf');


CREATE TABLE IF NOT EXISTS `Post` (
`id` int(11) NOT NULL auto_increment,
`title` varchar(255) collate utf8_unicode_ci NOT NULL,
PRIMARY KEY  (`id`)
) ENGINE=InnoDB  DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=7 ;

INSERT INTO `Post` (`id`, `title`) VALUES
(1, 'first post'),
(2, 'second post'),
(3, 'third post'),
(4, 'fourth post'),
(5, 'fifth post'),
(6, 'sixth post');

I want to list all posts along with the names of the users who commented on each post:

SELECT
    Post.id AS post_id, Post.title AS title, GROUP_CONCAT(name)
FROM Post
LEFT JOIN Comment ON Comment.post_id = Post.id
GROUP BY Post.id

gives me:

post_id  title        GROUP_CONCAT(name)
1        first post   bill,john
2        second post  bill
3        third post   john
4        fourth post  x
5        fifth post   NULL
6        sixth post   NULL

This works great, except that if a username contains a comma it will ruin the list of users. Does MySQL have a function that will let me escape these characters? (Please assume usernames can contain any characters, since this is only an example schema)


Solution

  • If there's some other character that's illegal in usernames, you can specify a different separator character using a little-known syntax:

    ...GROUP_CONCAT(name SEPARATOR '|')...
    

    But what if you want to allow pipes, or any character at all?

    Escape the separator character, for example with a backslash, but escape the backslashes themselves first:

    group_concat(replace(replace(name, '\\', '\\\\'), '|', '\\|') SEPARATOR '|')
    

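    Each \\ inside those string literals denotes a single backslash (MySQL's default behaviour, i.e. with the NO_BACKSLASH_ESCAPES SQL mode off). The expression will: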

    1. escape any backslashes with another backslash
    2. escape the separator character with a backslash
    3. concatenate the results with the separator character
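
To see what the escaped list looks like, here is a minimal Python sketch that mirrors the three steps above on the client side. The function names `escape_field` and `group_concat_escaped` are made up for illustration; in practice the SQL expression above does this work inside MySQL.

```python
def escape_field(name: str, sep: str = "|") -> str:
    # Step 1: double every backslash first, so later escapes stay unambiguous.
    escaped = name.replace("\\", "\\\\")
    # Step 2: put a backslash in front of every separator character.
    return escaped.replace(sep, "\\" + sep)


def group_concat_escaped(names, sep: str = "|") -> str:
    # Step 3: join the escaped fields with the bare separator.
    return sep.join(escape_field(n, sep) for n in names)


# Hypothetical usernames: one plain, one with a pipe, one with a backslash.
names = ["bill", "a|b", "back\\slash"]
joined = group_concat_escaped(names)
print(joined)  # bill|a\|b|back\\slash
```

The order matters: if the separator were escaped first, the backslashes added in that step would be doubled by the second replace and the result could no longer be decoded reliably.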

    To get the unescaped results, do the same thing in the reverse order:

    1. split the results by the separator character wherever it is not preceded by an odd number of backslashes. Checking only the single preceding character is not enough, because that backslash may itself be escaped. This regex matches such separators; note that the match also consumes the backslash pairs in front of the separator, so keep them with the left-hand field when splitting:
      (?<!\\)(?:\\\\)*\|
    2. replace all escaped separator characters with literals, i.e. replace \| with |
    3. replace all double backslashes with single backslashes, i.e. replace \\ with \
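
As an alternative to the regex split followed by two replaces, the decoding can be done in one pass over the string. The following is a sketch in Python, not part of the original answer; `split_escaped` is a hypothetical helper name, and it assumes the value was produced by the escaping expression shown earlier with `|` as the separator.

```python
def split_escaped(joined: str, sep: str = "|") -> list:
    fields, current = [], []
    chars = iter(joined)
    for ch in chars:
        if ch == "\\":
            # A backslash escapes the next character: keep that character
            # literally. A dangling backslash at the end is kept as-is.
            current.append(next(chars, "\\"))
        elif ch == sep:
            # An unescaped separator ends the current field.
            fields.append("".join(current))
            current = []
        else:
            current.append(ch)
    fields.append("".join(current))
    return fields


print(split_escaped("bill|a\\|b|back\\\\slash"))
# ['bill', 'a|b', 'back\\slash']
```

Because each backslash is consumed together with the character it escapes, the odd-versus-even backslash question never comes up. GROUP_CONCAT returns NULL for a post with no comments, so check for that before calling the function.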