Search code examples
postgresqlpostgresql-9.3

How to find best grouping match combination on postgresql query


I would like to see if it is possible to solve my problem with a more elegant query than the one I did. Maybe with window analytical functions or something else that I don't know of.

I have tree tables tablea, tableb and relation with the below structure:

tablea               tableb                 relation
------------------   -------------------    ----------------
ida integer PK,      idb integer PK,        id_b integer FK
namea varchar(10)    nameb varchar(10)      id_a integer FK

The idea is that the tableb is the one with the groups and the relation table is just the m-n relationship between tablea and tableb

So I have the following data (based on the n-m relation table)

group1 ( namea1, namea2, namea3 );
group2 ( namea1, namea5, namea6 );
group3 ( namea1, namea6 );
group4 ( namea6, namea7 );
group5 ( namea7, namea8, namea9 );
group6 ( namea1, namea2, namea5, namea6 );

And the user want to create a new group and add names to it (he can only add names that exists on tablea). My goal is to show to this user the closest related groups with the names that he want to add on this new group

So lets say that the user is creating the group7 and want to add 'namea1', 'namea6' and 'namea4' on it. Then I woud show him the groups that already have this combination of words or at least some of then on it. I make this query:

select b.nameb, al.names, array_agg(a.namea), count(a.namea) qty
from
   tablea a 
      INNER JOIN relation g on a.ida = g.id_a
      INNER JOIN tableb b on b.idb = g.id_b
      INNER JOIN 
       (select b.idb, b.nameb, array_agg(a.namea) as names
          from tablea a 
               INNER JOIN relation g on a.ida = g.id_a
               INNER JOIN tableb b on b.idb = g.id_b
         group by b.idb, b.nameb) al ON g.id_b = al.idb
where a.namea in ( 'namea1', 'namea6', 'namea4' )
group by b.nameb, al.names
order by b.nameb;

Which brings me:

Groups     All names on it                 which one he want to add     occur. count
-------------------------------------------------------------------------------------
group1    (namea1,namea2,namea3)           namea1                       1
group2    (namea1,namea5,namea6)           namea1,namea6                2
group3    (namea1,namea6)                  namea1,namea6                2
group4    (namea6,namea7)                  namea6                       1
group6    (namea1,namea2,namea5,namea6)    namea1,namea6                2

Showing him this info will help to decide whether he really needs to create this new group or just change the ones that already exists adding the name that isn't on it (on my example namea4 in group 2, 3 or 6).

I've created a fiddle example with all this data and with my query. It is here: http://sqlfiddle.com/#!15/8a63b/3

The query I did solve my problem, what I don't like on it is the fact that I had to repeat the same query to use it as subquery to show all names and which one the user provided is inside it.

Thanks in advance.


Solution

  • This should be simple, SQLFIDDLE

    SELECT
      tableb.nameb AS group_name,
      string_agg( tablea.namea, ', ') as users_in_group,
      string_agg( tablea2.namea, ', ') as want_to_add,
      count(tablea2.ida) as count
    FROM
      relation
      JOIN tablea on ( ida = id_a )
      JOIN tableb ON ( idb = id_b )
      LEFT JOIN tablea as tablea2 ON ( tablea2.namea in ('namea1', 'namea6', 'namea4') 
                                      AND tablea2.ida = tablea.ida )
    GROUP BY 
      tableb.idb,
      tableb.nameb
    HAVING 
       count(tablea2.ida) > 0
    ORDER BY 
       tableb.nameb
    

    Should get you,

    | GROUP_NAME |                 USERS_IN_GROUP |    WANT_TO_ADD | COUNT |
    |------------|--------------------------------|----------------|-------|
    |     group1 |         namea1, namea2, namea3 |         namea1 |     1 |
    |     group2 |         namea1, namea5, namea6 | namea1, namea6 |     2 |
    |     group3 |                 namea1, namea6 | namea1, namea6 |     2 |
    |     group4 |                 namea6, namea7 |         namea6 |     1 |
    |     group6 | namea1, namea2, namea5, namea6 | namea1, namea6 |     2 |