sql postgresql binary-search-tree breadth-first-search ecto

efficient breadth first search using sql joins

I'm dealing with a binary tree.

So I have a database table in my database where each node is a parent to up to 2 other nodes. I have a plan to efficiently find the top most node (under a given node) that is a parent to less than 2 other nodes. I'm looking for the top most open position to place a new node in other words. So I have this implemented as a breadth-first search. But the way I'm calling the database for each and every node is inefficient. I'm basically going down the tree, producing a running list of nodes on each level and checking each one if it is a parent to two other nodes.

Here's a diagram:

And here's the code if you'd like to see it:

  # breadth-first search
  def build_and_return_parent_id(breadth_list) do
        [ {node_id} | tail ] = breadth_list

        child_list = fetch_children_id(node_id)

        bc_list = tail ++ child_list

        case length(child_list) do
          x when x > 2 ->


            # recursion
            build_and_return_parent_id(bc_list)

          2 ->

            # recursion
            build_and_return_parent_id(bc_list)

          _ -> node_id
        end
  end

  def fetch_children_id(id) do
    Repo.all( from n in Node,
              where: n.parent_id == ^id,
              order_by: [asc: n.inserted_at],
              select: {n.id})
  end
end

So instead of doing that so inefficiently - one db call per node - I was thinking, how about I produce a list of all the nodes that have less than two parents, then travel down the tree, for each level use one db call to get a list of all the nodes on that level, then simply compare the two lists. if there are matching IDs in both the lists I've found a node that has an available spot under it.

Here's a diagram:

The problem is I know almost nothing about sql queries. my guess is that this can be done with some kind of self join on the table.

node_id   |  parent_id
----------------------
1         |  nil
2         |  1
3         |  1
4         |  2
5         |  2
6         |  3
7         |  4
8         |  5
9         |  6
10        |  3

So anyway I'm sure if this method works someone has done it before but I can't seem to find any information on the kinds of sql queries that would be used to generate the open list or the level list.

Now I suppose the 2nd query is pretty simple. since we have an open list we can just use a where-in-[list] clause. Byt the first one I think is the one I'm struggling with.

If you have anything you can point me to or help you can offer I'd really appreciate it.

Solution

You can add columns depth and child_count and create an index:

create index nodes_depth_1child_idx on nodes(depth) where child_count=1;

Then searching should be basically instant with:

select node_id from nodes where child_count=1 order by depth limit 1;

You should also create triggers that would maintain these values. This would slow down insert operations slightly, as the insert would have to read the parent node depth and update the parent node child_count.