Dividing up a table of two-way relationships into distinct groups

I'm working on an application where users can tag "components" as part of the workflow. In many cases, they end up with several tags that are synonyms of each other. They would like these to be grouped together so that when one tag is added to a component, the rest of the tags in the group can be added as well.

I decided to break up tag groups into two-way relationships between each pair of tags in the group. So if a group has tags 1 and 2, there's a record that looks like this:

ID     TagID    RelatedTagID
1      1        2
2      2        1

Basically, a group is represented as a Cartesian product of each tag in it. Extend that to 3 tags:

ID    Name
1     MM
2     Managed Maintenance
3     MSP

Our relationships look like this:

ID    TagID    RelatedTagID
1     1        2
2     2        1
3     1        3
4     3        1
5     2        3
6     3        2

I have a couple methods to group them together, but they're less than stellar. First, I wrote a view that lists each tag along with the list of tags in its group:

SELECT
    TagKey AS ID,
    STUFF
        ((SELECT ',' + cast(RelatedTagKey AS nvarchar)
          FROM RelatedTags rt
          WHERE rt.TagKey = t.TagKey
          FOR XML PATH('')), 1, 1, '') AS RelatedTagKeys
FROM (
    SELECT DISTINCT TagKey
    FROM RelatedTags
) t

The problem with this is that each group appears in the results as many times as there are tags in it, which I wasn't able to think of a way to work around in a single query. So it gives me back:

ID    RelatedTagKeys
1     2,3
2     1,3
3     1,2

Then in my back-end, I discard all groups that contain a key that occurs in another group. Tags aren't being added to multiple groups, so that works, but I don't like how much extraneous data I'm pulling down.

The second solution I came up with was this LINQ query. The key used to group the tags is a listing of the group itself. This is probably much worse than I originally thought.

from t in Tags.ToList()
where t.RelatedTags.Any()
group t by 
    string.Join(",", (new List<int> { t.ID })
        .Concat(t.RelatedTags.Select(i => i.Tag.ID))
        .OrderBy(i => i))
into g
select g.ToList()

I really hate grouping by the result of calling string.Join, but when I tried just grouping by the list of keys, it didn't group properly, putting each tag in a group by itself. Also, the SQL it generated is monstrous. I'm not going to paste it here, but LINQPad shows that it generates about 12,000 lines of individual SELECT statements on my test database (we have 1562 tags and 67 records in RelatedTags).

These solutions work, but they're pretty naive and inefficient. I don't know where else to go with this, though. Any ideas?

Solution

I suppose working with your data gets easier if you have a groupId for each of your tags, such that tags that are related share the same value of groupId. To explain what I mean, I added a second set of related tags to your dataset:

INSERT INTO tags ([ID], [Name]) VALUES
    (1, 'MM'),
    (2, 'Managed Maintenance'),
    (3, 'MSP'),
    (4, 'UM'),
    (5, 'Unmanaged Maintenance');

and

INSERT INTO relatedTags ([ID], [TagID], [RelatedTagID]) VALUES
    (1, 1, 2),
    (2, 2, 1),
    (3, 1, 3),
    (4, 3, 1),
    (5, 2, 3),
    (6, 3, 2),
    (7, 4, 5),
    (8, 5, 4);

Then, a table holding the following information should make a lot of other things easier (I first explain the content of the table and then how to get it using a query):

tagId | groupId
------|-------- 
1     | 1
2     | 1
3     | 1
4     | 4
5     | 4

The data comprises two groups of related tags, i.e. {1,2,3} and {4,5}. Therefore, above table marks tags belonging to the same group with the same groupId, i.e. 1 for {1,2,3}, and 4 for {4,5}.

To achieve such a view/table, you could use the following query:

with rt as
( (select r2.tagId, r2.relatedTagId
   from relatedTags r1 join relatedTags r2 on r1.tagId = r2.relatedTagId)
 union 
  (select r3.tagId, r3.tagId as relatedTagId from relatedTags r3)
)
select rt.tagId, min(rt.relatedTagId) as groupId from rt
group by tagId

Of course, instead of introducing a new table / view, you could also extend your primary tags-table by a groupId attribute.

Hope this helps.