Search code examples
sqlitesubquerylimit

SQLite: how to LIMIT a subquery for GROUP_CONCAT


I have a subquery that feeds a GROUP_CONCAT. All works well except the LIMIT, which does not have the effect I want.

Here is the code:

CREATE TABLE IF NOT EXISTS Animal (
    id  TEXT    PRIMARY KEY,
    diet    TEXT
    );
INSERT INTO Animal VALUES ('monkey',    'herbivore');
INSERT INTO Animal VALUES ('goat',  'herbivore');
INSERT INTO Animal VALUES ('cat',   'carnivore');
INSERT INTO Animal VALUES ('dog',   'omnivore');
INSERT INTO Animal VALUES ('human', 'omnivore');

CREATE TABLE IF NOT EXISTS Domicile (
    id  TEXT    PRIMARY KEY REFERENCES Animal,
    home    TEXT
    );

INSERT INTO Domicile VALUES ('monkey',  'jungle');
INSERT INTO Domicile VALUES ('goat',    'farmyard');
INSERT INTO Domicile VALUES ('cat', 'house');
INSERT INTO Domicile VALUES ('dog', 'garden');
INSERT INTO Domicile VALUES ('human',   'house');

CREATE TABLE IF NOT EXISTS Food (
    name    TEXT    PRIMARY KEY
    );

INSERT INTO Food VALUES ('fruit');
INSERT INTO Food VALUES ('meat');
INSERT INTO Food VALUES ('fish');
INSERT INTO Food VALUES ('bread');
INSERT INTO Food VALUES ('sausages');

CREATE TABLE IF NOT EXISTS Eats (
    id  TEXT    REFERENCES Animal,
    name    TEXT    REFERENCES Food
    );

INSERT INTO Eats VALUES ('monkey',  'fruit');
INSERT INTO Eats VALUES ('monkey',  'bread');
INSERT INTO Eats VALUES ('goat',    'fruit');
INSERT INTO Eats VALUES ('cat',     'meat');
INSERT INTO Eats VALUES ('cat',     'sausages');
INSERT INTO Eats VALUES ('dog',     'meat');
INSERT INTO Eats VALUES ('dog',     'bread');
INSERT INTO Eats VALUES ('dog',     'sausages');
INSERT INTO Eats VALUES ('human',   'fruit');
INSERT INTO Eats VALUES ('human',   'meat');
INSERT INTO Eats VALUES ('human',   'fish');
INSERT INTO Eats VALUES ('human',   'bread');
INSERT INTO Eats VALUES ('human',   'sausages');

.mode column

SELECT 'TEST 4: JOIN Eats using a subquery to order the names.';
SELECT      Animal.id,
        Animal.diet,
        GROUP_CONCAT (Eats.name) as names,
        Domicile.home
FROM        Animal
        INNER JOIN Domicile USING (id)
        INNER JOIN (
            SELECT * FROM Eats
            ORDER BY name
            ) as Eats USING (id)
GROUP BY    Animal.id
ORDER BY    Animal.id
;
SELECT 'TEST 4 result: names in order, but not limited to 3.';

SELECT 'TEST 5: LIMIT Eats subquery to 3.';
SELECT      Animal.id,
        Animal.diet,
        GROUP_CONCAT (Eats.name) as names,
        Domicile.home
FROM        Animal
        INNER JOIN Domicile USING (id)
        INNER JOIN (
            SELECT * FROM Eats
            ORDER BY name
            LIMIT 3
            ) as Eats USING (id)
GROUP BY    Animal.id
ORDER BY    Animal.id
;
SELECT 'TEST 5 result: LIMIT has applied to wrong SELECT?.';

What I want is output like this:

id      diet       names                home
------  ---------  -------------------  --------
cat     carnivore  meat,sausages        house
dog     omnivore   bread,meat,sausages  garden
goat    herbivore  fruit                farmyard
human   omnivore   bread,fish,fruit     house
monkey  herbivore  bread,fruit          jungle

Notes:

  1. Rows ordered by 'id'.
  2. Not every possible combination is present in Eats. (eg goat only eats fruit).
  3. 'names' column in alphabetical order and limited to 3 elements (else 'human' would be bread,fish,fruit,meat,sausages).

What I get from the two tests is this:

'TEST 4: JOIN Eats using a subquery to order the names.'
--------------------------------------------------------
TEST 4: JOIN Eats using a subquery to order the names.  
id      diet       names                           home    
------  ---------  ------------------------------  --------
cat     carnivore  meat,sausages                   house   
dog     omnivore   bread,meat,sausages             garden  
goat    herbivore  fruit                           farmyard
human   omnivore   bread,fish,fruit,meat,sausages  house   
monkey  herbivore  bread,fruit                     jungle  
'TEST 4 result: names in order, but not limited to 3.'
------------------------------------------------------
TEST 4 result: names in order, but not limited to 3.  
'TEST 5: LIMIT Eats subquery to 3.'
-----------------------------------
TEST 5: LIMIT Eats subquery to 3.  
id      diet       names  home  
------  ---------  -----  ------
dog     omnivore   bread  garden
human   omnivore   bread  house 
monkey  herbivore  bread  jungle
'TEST 5 result: LIMIT has applied to wrong SELECT?.'
----------------------------------------------------
TEST 5 result: LIMIT has applied to wrong SELECT?.  

The only difference between the source of tests 4 and 5 is the LIMIT in the test 5 subquery.

Is there a way to limit the names to, say, 3 elements? (I would want the limit to be easily changeable.)


Solution

  • I believe the following is a solution (that is hopefully relatively easy to understand):-

    WITH
        cte1 AS (
            SELECT 
                animal.*,
                domicile.home,
                eats.name 
            FROM animal 
            JOIN eats ON eats.id = animal.id 
            JOIN domicile ON domicile.id = animal.id
            ORDER BY animal.id,eats.id
            ),
        cte2 AS (
            SELECT *, (
                SELECT count(*) FROM cte1 AS cte1_c 
                WHERE cte1_c.name <= cte1.name AND cte1_c.id = cte1.id ORDER BY id,name
            ) AS c 
            FROM cte1
        )
    SELECT id,diet,group_concat(name) AS names,home FROM cte2 WHERE c <= 3 GROUP BY id;
    
    • basically the core data (cte1) has the number of eats that are less or the same appended thus allowing the final WHERE clause to limit the number of eats per group

    Which results in:-

    enter image description here

    How many names is easily changed by altering the number in the final WHERE clause WHERE c <= 3

    e.g. amending the data where your first select produces:-

    enter image description here

    • i.e. human has 9 foods/names

    Using the solution but with WHERE c <= 5 then:-

    enter image description here

    Using 1 results in:-

    enter image description here

    The flaw is if using less than 1 in which case no data will be output.

    Additional

    This is very much the same as the answer but amended to:-

    1. hopefully make it easier to understand (comments and name of the subquery for the count changed), and
    2. to correctly order the food names for the group_concat

    :-

    WITH
        cte1 AS (
            SELECT 
                animal.*,
                domicile.home,
                eats.name 
            FROM animal 
            JOIN eats ON eats.id = animal.id 
            JOIN domicile ON domicile.id = animal.id
            ORDER BY animal.id,eats.id
            ),
        cte2 AS (
            SELECT 
                *, 
                /* Use a subquery to ascertain the number (count) of names in cte1 that 
                    a) have the the same name as the name being processed, OR 
                    c) a name that is less than the current name being processed, AND
                    b) have the same id as the currrent id being processed 
                */
                (
                    SELECT 
                        count(*) 
                        FROM cte1 AS query_of_all_of_cte1
                    WHERE 
                        query_of_all_of_cte1.name <= cte1.name /* cte1.name is the name from the current row being processed */ 
                    AND 
                        query_of_all_of_cte1.id = cte1.id /* cte1.id is the id from the current row being processed */
                
                ) 
                /* the column name rather than defaulting to count(*) is aliaseed as c */
                    AS c 
            FROM cte1 ORDER BY id,name
        )
    SELECT id,diet,group_concat(name) AS names,home FROM cte2 WHERE c <= 3 GROUP BY id;