Search code examples
sqlpostgresqlmaterialized-path-pattern

Sorting tree with a materialized path?


I have a tree structure in a table and it uses materialized paths to allow me to find children quickly. However, I also need to sort the results depth-first, as one would expect with threaded forum replies.

 id | parent_id | matpath |          created           
----+-----------+---------+----------------------------
  2 |         1 | 1       | 2010-05-08 15:18:37.987544
  3 |         1 | 1       | 2010-05-08 17:38:14.125377
  4 |         1 | 1       | 2010-05-08 17:38:57.26743
  5 |         1 | 1       | 2010-05-08 17:43:28.211708
  7 |         1 | 1       | 2010-05-08 18:18:11.849735
  6 |         2 | 1.2     | 2010-05-08 17:50:43.288759
  9 |         5 | 1.5     | 2010-05-09 14:02:43.818646
  8 |         6 | 1.2.6   | 2010-05-09 14:01:17.632695

So the final results should actually be sorted like this:

 id | parent_id | matpath |          created
----+-----------+---------+----------------------------
  2 |         1 | 1       | 2010-05-08 15:18:37.987544
  6 |         2 | 1.2     | 2010-05-08 17:50:43.288759
  8 |         6 | 1.2.6   | 2010-05-09 14:01:17.632695
  3 |         1 | 1       | 2010-05-08 17:38:14.125377
  4 |         1 | 1       | 2010-05-08 17:38:57.26743
  5 |         1 | 1       | 2010-05-08 17:43:28.211708
  9 |         5 | 1.5     | 2010-05-09 14:02:43.818646
  7 |         1 | 1       | 2010-05-08 18:18:11.849735

How would I work that out? Can I do that in straight SQL (this is PostgreSQL 8.4) or should additional information be added to this table?

Update: trying to explain sort criteria better.

Imagine that id '1' is the root post to a forum and everything with a 'matpath' beginning with '1' is a child of that post. So ids 2 through 5 are direct replies to 1 and get matpaths of '1'. However, id 6 is a reply 2, not directly to 1, so it gets a matpath of 1.2. This means that for a threaded forum with proper nesting, with all ids shown in the tables, the structure of the forum would look like this, hence the ordering requirement:

* id 1 (root post)
    * id 2
        * id 6
            * id 8
    * id 3
    * id 4
    * id 5
        * id 9
    * id 7

Solution

  • I typically create an additional columnn for this, called something like SortPath. It would contain the data that you need to sort by, concatenated together. That column would be of type varchar, and get sorted as a string. Something like this:

    id | parent_id | matpath |          created            |                   sortpath
    ---+-----------+---------+-----------------------------+--------------------------------------------------------------------------------------
     2 |         1 | 1       | 2010-05-08 15:18:37.987544  | 2010-05-08 15:18:37.987544-2
     6 |         2 | 1.2     | 2010-05-08 17:50:43.288759  | 2010-05-08 15:18:37.987544-2.2010-05-08 17:50:43.288759-6
     8 |         6 | 1.2.6   | 2010-05-09 14:01:17.632695  | 2010-05-08 15:18:37.987544-2.2010-05-08 17:50:43.288759-6.2010-05-09 14:01:17.632695-8
     3 |         1 | 1       | 2010-05-08 17:38:14.125377  | 2010-05-08 17:38:14.125377-3
     4 |         1 | 1       | 2010-05-08 17:38:57.26743   | 2010-05-08 17:38:57.267430-4 
     5 |         1 | 1       | 2010-05-08 17:43:28.211708  | 2010-05-08 17:43:28.211708-5
     9 |         5 | 1.5     | 2010-05-09 14:02:43.818646  | 2010-05-08 17:43:28.211708-5.2010-05-09 14:02:43.818646-9
     7 |         1 | 1       | 2010-05-08 18:18:11.849735  | 2010-05-08 18:18:11.849735-7
    

    A couple of things to note here:

    • sortpath will be sorted as a string, so it is important all dates have the same length for it to correctly sort. E.g., observe how 2010-05-08 17:38:57.26743 has an extra zero added in the sortpath column.
    • I have appended the PK of each node to the end of its date. This is so that if you happen to have two rows with the exact same date, they will always get returned in the same order due to the additional data we are appending.
    • To me, the data looks asymmetrical the way I have written it, because we are showing the current node's date in sortpath, but it is not in matpath. I would prefer to see it in both.
    • You may want to put the date of node ID 1 at the beginning of each sortcolumn as well. This is so that if you ever want to query for more than one forum at a time (you probably won't), then it will still sort correctly.