Search code examples
mysqlalgorithmsearchnested-sets

Searching a nested set


I've got a MySQL table that acts like a nested set in order to contain a hierarchy of categories. The table schema looks like:

CREATE TABLE IF NOT EXISTS `categories` (
  `id` int(11) NOT NULL auto_increment,
  `name` varchar(200) NOT NULL,
  `parent_id` int(11) default NULL,
  `lft` int(11) default NULL,
  `rgt` int(11) default NULL,
  PRIMARY KEY  (`id`),
  UNIQUE KEY `index_categories_on_parent_id_and_name` (`parent_id`,`name`)
)

lft and rgt define the left and right boundaries of a node (the way a nested set works is that each node's id falls within its parent's boundaries), and parent_id specifies the parent node. The unique index allows there to be multiple categories with the same name, as long as they don't have the same parent.

I am trying to figure out a proper way to find a specific node in the set, based on hierarchy. For instance, if I look for foo/bar/baz, I want to retrieve the node named baz, whose parent is named bar, who's parent is named foo. Obviously, I can't just search by name, because there could be multiple categories with the same name.

The way I can think of doing this is to find the topmost category, then find each subsequent category with the given name whose parent id is that of the previously found category, but this doesn't seem very efficient to me. Is there a better way to search a nested set?


Solution

  • I don't believe there's a perfectly clean and efficient way to do this with nested sets. Storing a list of the node's ancestors in a denormalized column would provide this efficiently, but I don't suggest implementing it.

    There is an ok'ish method though, which is 1 query and will conveniently hit the index you already have. You're looking at one join for each level of depth of the target node.

    For your example foo-bar-baz

    select c3.*
    from categories c1
    inner join categories c2 on c2.parent_id = c1.id AND c2.name = 'bar'
    inner join categories c3 on c3.parent_id = c2.id AND c2.name = 'baz'
    where c1.name = 'foo'

    It's not the greatest, but it's probably your best bet unless you want to go to the effort of storing a bunch of denormalized information. It's fairly straight forward to generate the SQL in code as well.