I'm working on an application where users can tag "components" as part of the workflow. In many cases, they end up with several tags that are synonyms of each other. They would like these to be grouped together so that when one tag is added to a component, the rest of the tags in the group can be added as well.
I decided to break up tag groups into two-way relationships between each pair of tags in the group. So if a group has tags 1 and 2, there's a record that looks like this:
ID TagID RelatedTagID
1 1 2
2 2 1
Basically, a group is represented as a Cartesian product of each tag in it. Extend that to 3 tags:
ID Name
1 MM
2 Managed Maintenance
3 MSP
Our relationships look like this:
ID TagID RelatedTagID
1 1 2
2 2 1
3 1 3
4 3 1
5 2 3
6 3 2
I have a couple methods to group them together, but they're less than stellar. First, I wrote a view that lists each tag along with the list of tags in its group:
SELECT
TagKey AS ID,
STUFF
((SELECT ',' + cast(RelatedTagKey AS nvarchar)
FROM RelatedTags rt
WHERE rt.TagKey = t.TagKey
FOR XML PATH('')), 1, 1, '') AS RelatedTagKeys
FROM (
SELECT DISTINCT TagKey
FROM RelatedTags
) t
The problem with this is that each group appears in the results as many times as there are tags in it, which I wasn't able to think of a way to work around in a single query. So it gives me back:
ID RelatedTagKeys
1 2,3
2 1,3
3 1,2
Then in my back-end, I discard all groups that contain a key that occurs in another group. Tags aren't being added to multiple groups, so that works, but I don't like how much extraneous data I'm pulling down.
The second solution I came up with was this LINQ query. The key used to group the tags is a listing of the group itself. This is probably much worse than I originally thought.
from t in Tags.ToList()
where t.RelatedTags.Any()
group t by
string.Join(",", (new List<int> { t.ID })
.Concat(t.RelatedTags.Select(i => i.Tag.ID))
.OrderBy(i => i))
into g
select g.ToList()
I really hate grouping by the result of calling string.Join
, but when I tried just grouping by the list of keys, it didn't group properly, putting each tag in a group by itself. Also, the SQL it generated is monstrous. I'm not going to paste it here, but LINQPad shows that it generates about 12,000 lines of individual SELECT statements on my test database (we have 1562 tags and 67 records in RelatedTags).
These solutions work, but they're pretty naive and inefficient. I don't know where else to go with this, though. Any ideas?
I suppose working with your data gets easier if you have a groupId
for each of your tags, such that tags that are related share the same value of groupId
.
To explain what I mean, I added a second set of related tags to your dataset:
INSERT INTO tags ([ID], [Name]) VALUES
(1, 'MM'),
(2, 'Managed Maintenance'),
(3, 'MSP'),
(4, 'UM'),
(5, 'Unmanaged Maintenance');
and
INSERT INTO relatedTags ([ID], [TagID], [RelatedTagID]) VALUES
(1, 1, 2),
(2, 2, 1),
(3, 1, 3),
(4, 3, 1),
(5, 2, 3),
(6, 3, 2),
(7, 4, 5),
(8, 5, 4);
Then, a table holding the following information should make a lot of other things easier (I first explain the content of the table and then how to get it using a query):
tagId | groupId
------|--------
1 | 1
2 | 1
3 | 1
4 | 4
5 | 4
The data comprises two groups of related tags, i.e. {1,2,3}
and {4,5}
. Therefore, above table marks tags belonging to the same group with the same groupId
, i.e. 1
for {1,2,3}
, and 4
for {4,5}
.
To achieve such a view/table, you could use the following query:
with rt as
( (select r2.tagId, r2.relatedTagId
from relatedTags r1 join relatedTags r2 on r1.tagId = r2.relatedTagId)
union
(select r3.tagId, r3.tagId as relatedTagId from relatedTags r3)
)
select rt.tagId, min(rt.relatedTagId) as groupId from rt
group by tagId
Of course, instead of introducing a new table / view, you could also extend your primary tags
-table by a groupId
attribute.
Hope this helps.