Tags: ruby-on-rails, postgresql, activerecord, hierarchical-data

What is an efficient way to store and retrieve an arbitrary depth nested structure in postgres?


In my Ruby on Rails app, I have comments that can be nested to an arbitrary depth.

I tried different ways of storing this:

Using self joins:

belongs_to :parent, :class_name => 'Comment', :foreign_key => 'parent_id'
has_many :children, :class_name => 'Comment', :foreign_key => "parent_id"

Using ancestry gem

etc

The problem, though, is that no matter what I use, there will always be a linear number of SQL statements (one statement to grab all the root comments, then one for each root's children, then one for all the children of those, and so on).

Is there a more efficient way to accomplish this?

I'm on Postgres 9.1, but backwards-compatible solutions are preferred.


Solution

  • You could stick with your parent_id pointer column and run a WITH RECURSIVE query through find_by_sql, letting the database do all the work in one shot. Something like this:

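    # roots is interpolated straight into the SQL, so coerce each id to an
    # integer first (assumes integer primary keys and a non-empty array).
    comments = Comment.find_by_sql(%Q{
        with recursive tree(id) as (
            -- anchor: the root comments we start from
            select c.id, c.column1, ...
            from comments c
            where c.id in (#{roots.map(&:to_i).join(',')})
            union all
            -- recursive step: children of every row already in tree
            select c.id, c.column1, ...
            from comments c
            join tree on c.parent_id = tree.id
        )
        select id, column1, ...
        from tree
    })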
    

    Here roots is a Ruby array holding the ids of the root nodes that you're interested in. The query returns every node in those subtrees as Comment instances. When I've used queries like this in the past, WITH RECURSIVE was well over twice as fast as the iterative technique you describe, even with shallow trees; I'd guess that deeper trees would see even bigger speedups.

    The parent_id structure you're using is very convenient for most things and meshes quite well with how ActiveRecord wants to work. Also, sticking with your current structure means that you can leave the rest of your application alone.

    WITH RECURSIVE is available in PostgreSQL 8.4 and higher.
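
The recursive query hands back a flat list of rows, so the nesting still has to be rebuilt in Ruby if you want to render a threaded view. Here is a minimal sketch of one way to do that without any further queries, by grouping the rows on parent_id. The Row struct and the build_tree helper are hypothetical names introduced for illustration (Row stands in for Comment so the sketch runs without a database); neither comes from the original answer.

```ruby
# Hypothetical stand-in for the Comment model; a real app would pass in the
# Comment instances returned by find_by_sql instead.
Row = Struct.new(:id, :parent_id, :body)

# Turn a flat list of rows into nested hashes of the form
# { comment: row, children: [...] }, starting from the given root ids.
def build_tree(rows, root_ids)
  children_of = rows.group_by(&:parent_id)                 # parent_id => [child rows]
  by_id = rows.each_with_object({}) { |r, h| h[r.id] = r } # id => row

  # Recursively attach each row's children; rows without children get [].
  nest = lambda do |row|
    kids = children_of.fetch(row.id, [])
    { comment: row, children: kids.map { |k| nest.call(k) } }
  end

  # Root ids that are not present in rows are skipped.
  root_ids.map { |id| by_id[id] }.compact.map { |row| nest.call(row) }
end

rows = [
  Row.new(1, nil, "root"),
  Row.new(2, 1, "reply"),
  Row.new(3, 2, "reply to reply"),
  Row.new(4, 1, "second reply"),
]
tree = build_tree(rows, [1])
```

Children appear in whatever order the rows arrive in, and a recursive query without an ORDER BY makes no ordering promise, so sort each group (by created_at, say) if the display order matters. The helper recurses once per nesting level, which is fine for typical comment threads but could exhaust the stack on extremely deep trees.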