Search code examples
mysqlgroup-byaggregatemysql-json

MySQL merging json arrays in group by


I'm trying to merge a scalar array json field within a group by to have all the distinct values in one list.

Consider the following table:

CREATE TABLE transaction
(
    id                 INT UNSIGNED AUTO_INCREMENT,
    source_account_id  VARCHAR(32)                                          NOT NULL,
    target_account_ids JSON                                                 NOT NULL,
    PRIMARY KEY (id)
) ENGINE = InnoDB CHARSET utf8mb4;

source_account_ids is a simple array of strings for example '["account1", "account2"]'. I'd like to gather all the target_account_ids of a single source to have a unified result. For example:

id source_account_id target_account_ids
1. account1 '["account1", "account2"]'
2. account1 '["account1", "account3"]'

And the desired result set would be:

source_account_id target_account_ids
account1 '["account1", "account2", "account3"]'

I tried to play around with JSON_ARRAYAGG but it just adds the arrays within another array and basically results in an "endless" array.


Solution

  • You have to explode the array with JSON_TABLE(), then reduce the values with DISTINCT, then you can recombine them with JSON_ARRAYAGG().

    select source_account_id, json_arrayagg(target_account_id) as target_account_ids
    from ( 
      select distinct source_account_id, j.account_id as target_account_id
      from transaction
      cross join json_table(target_account_ids, '$[*]' columns (account_id varchar(32) path '$')) as j
    ) as t
    group by source_account_id;
    

    GROUP_CONCAT() supports a DISTINCT keyword in its argument, but JSON_ARRAYAGG() doesn't (this feature has been requested: https://bugs.mysql.com/bug.php?id=91993).

    If this seems like a lot of needless work, or if you can't use JSON_TABLE() because you're still using MySQL 5.7, then you should store multi-valued attributes in normal rows and columns, instead of using JSON.