Search code examples
postgresqltypeorm

How to do nested sorting in postgresql


I have columns id (string), parentId (string or null) and order (number). It's tree like structure where rows with the same parentId are children of the same node. order is used to sort children for each parent (so they might be the same for childs of different parents).

For a given id, I want to extract its previous and next row. I need to somehow sort data first by parentId looking at its order and then isnert in the middle of table their childrens, again sorted by order and again (recursion). Here is how my data looks like after running query:

SELECT "content"."id", "content"."parentId", "content"."order" FROM "course_content_entity" "content" WHERE ( "content"."deletedAt" IS NULL ) AND ("content"."courseId" = '05dd28d5-a244-4ca1-b7fb-fc3bc7b2e422') order by "content"."parentId", "content"."order" 

enter image description here

It's of course now what I want. Any ideas how to do that?

EDIT

Just to explain why I need it: for a given child in a tree (which represent a course), I need to add a navigation to go to the next and previous article which might have different parents.

Thanks to @Islingre query, I manage to edit it to fit my database structure and I've got the following query and results:

// query
WITH RECURSIVE node(id, "parentId", "order", "name", position_array) AS (
    SELECT
        root.id,
        root."parentId",
        root.order,
        root.name,
        ARRAY[ROW_NUMBER() OVER (ORDER BY root.order, root.id)] AS position_array
    FROM course_content_entity root
    WHERE root."parentId" IS NULL AND root."courseId" = '18f75c26-3608-49c2-8e1c-56618b35c780'
UNION ALL
    SELECT
        child.id,
        child."parentId",
        child.order,
        child.name,
        parent.position_array || ROW_NUMBER() OVER (PARTITION BY child."parentId" ORDER BY child.order) AS position_array
    FROM course_content_entity child
    INNER JOIN node parent  ON parent.id::uuid = child."parentId"::uuid
    WHERE child."courseId" = '18f75c26-3608-49c2-8e1c-56618b35c780'

  --parent.id IS NOT DISTINCT FROM child."parentId" 
)
SELECT
    n.id,
    n."parentId",
    n.order,
    n.name,
    n.position_array,
    ROW_NUMBER() OVER (ORDER BY n.position_array) AS dfs_position
FROM node n
-- WHERE n.id = '7fcebdc8-9dd4-43e3-afd4-641695d8f769'
ORDER BY dfs_position;

Result: Result of a query

I added to the query twice: root."courseId" = '18f75c26-3608-49c2-8e1c-56618b35c780' to get only contents from a given course. So it works and now I need to get row before and after a particular node. Because I'm working in typeorm, it would be even better to have this query written using that library and its queryBuilder.

This is my Entity:

@Entity()
export class CourseEntity {
  @PrimaryGeneratedColumn('uuid')
  id: string;

  @CreateDateColumn()
  createdAt: Date;

  @UpdateDateColumn()
  updatedAt: Date;

  @DeleteDateColumn()
  deletedAt: Date;

  @Column()
  name: string;

  @Column({ unique: true, default: null })
  slug: string;

  @Column({ default: null })
  description: string;

  @ManyToOne(() => UserEntity, (user) => user.courses)
  author!: UserEntity;

  @OneToMany(() => CourseContentEntity, (content) => content.course, { cascade: ['soft-remove'] })
  content: Array<CourseContentEntity>;


  constructor(partial: Partial<CourseEntity>) {
    Object.assign(this, partial);
  }
}



@Entity()
export class CourseContentEntity {
  @PrimaryGeneratedColumn('uuid')
  id: string;

  @CreateDateColumn()
  createdAt: Date;

  @UpdateDateColumn()
  updatedAt: Date;

  @DeleteDateColumn()
  deletedAt: Date;

  @Column()
  name: string;

  @Column({ default: '' })
  slug: string;

  @Column({ nullable: true })
  parentId: string;

  @Column({ default: 0 })
  order: number;

  @Column({ default: '' })
  content: string;

  constructor(partial: Partial<CourseContentEntity>) {
    Object.assign(this, partial);
  }
}

And this is how I handle searching for next and previous node:


  async getPagination(contentId: string) {
    const content = await this._courseContentRepository.findOne({
      where: { id: contentId },
      relations: { course: true },
    });
    const query = this._courseContentRepository
      .createQueryBuilder('content')
      .leftJoin('content.course', 'course')
      .andWhere('course.id = :courseId', { courseId: content.course.id });

    /* NEXT */
    const getNext = async (content: CourseContentEntity) => {
      const nextQuery = query.clone();
      if (content.parentId) {
        nextQuery.where('content.parentId = :parentId', { parentId: content.parentId });
      } else {
        nextQuery.where('content.parentId IS NULL');
      }
      return await nextQuery
        .andWhere('content.order > :order', { order: content.order })
        .orderBy('content.order', 'ASC')
        .getOne();
    };

    const firstChild = await this._courseContentRepository.findOne({
      where: { parentId: content.id },
    });
    let next = firstChild;

    // Try to get next sibling
    if (!next) next = await getNext(content);
    if (!next && content.parentId) {
      // It's a last child node -> take node after parent node
      const parent = await this._courseContentRepository.findOne({
        where: { id: content.parentId },
      });

      next = await getNext(parent);
    }

    /* PREV */
    const getPrev = async (parentId: string | null, order: number) => {
      const prevQuery = query.clone();
      if (parentId) {
        prevQuery.where('content.parentId = :parentId', { parentId: parentId });
      } else {
        prevQuery.where('content.parentId IS NULL');
      }
      return await prevQuery.andWhere('content.order < :order', { order }).orderBy('content.order', 'DESC').getOne();
    };

    let prev = await getPrev(content.parentId, content.order);
    if (prev) {
      const lastChild = await query
        .andWhere('content.parentId = :parentId', { parentId: prev.id })
        .orderBy('content.order', 'DESC')
        .getOne();
      if (lastChild) prev = lastChild;
    } else {
      if (content.parentId) {
        const parent = await this._courseContentRepository.findOne({
          where: { id: content.parentId },
        });
        prev = parent;
      } else {
        prev = await getPrev(null, content.order);
        if (prev) {
          const lastChild = await query
            .andWhere('content.parentId = :parentId', { parentId: prev.id })
            .orderBy('content.order', 'DESC')
            .getOne();
          if (lastChild) prev = lastChild;
        }
      }
    }

    // const nextQuery = query
    //   .select('LEAD(content.id) over (order by content.order asc) as next_id')
    //   .addSelect('LEAD(content.name) over (order by content.order asc) as next_name')
    //   .where('content.parentId = :parentId', { parentId: content.parentId })
    //   .andWhere('content.id = :contentId', { contentId: contentId });

    return {
      next: next
        ? {
            id: next.id,
            name: next.name,
          }
        : undefined,
      prev: prev
        ? {
            id: prev.id,
            name: prev.name,
          }
        : undefined,
    };
  }

I'm wondering which soluton will be better in the end (mine current one or sorting the tree).


Solution

  • To get the rows sorted in the tree's DFS order, you can try the following query:

    WITH RECURSIVE node(id, parent_id, position, position_array) AS (
        SELECT
            root.id,
            root.parent_id,
            root.position,
            ARRAY[ROW_NUMBER() OVER (ORDER BY root.position, root.id)] AS position_array
        FROM forest_node root
        WHERE root.parent_id IS NULL
    UNION ALL
        SELECT
            child.id,
            child.parent_id,
            child.position,
            parent.position_array || ROW_NUMBER() OVER (PARTITION BY child.parent_id ORDER BY child.position) AS position_array
        FROM forest_node child
        INNER JOIN node parent ON parent.id = child.parent_id
    )
    SELECT
        n.id,
        n.parent_id,
        n.position,
        n.position_array,
        ROW_NUMBER() OVER (ORDER BY n.position_array) AS dfs_position
    FROM node n
    ORDER BY dfs_position;
    

    This query starts with the roots (parent_id IS NULL) and recursively (WITH RECURSIVE) adds their children, building the position_array via the ROW_NUMBER() OVER () window function (Window Function Tutorial, List of Window Functions). The sorting is finally done via ordering by the position_array, relying on array comparison (Array Functions and Operators).

    Renaming forest_node to "course_content_entity" and id, parent_id and position to "id", "parentId" and "order" should make this applicable to your case. I chose different names for multiple reasons, including a better general example and the urge to avoid quoted identifiers.

    Assumptions:

    • A root node may or may not have the position set. Non-root nodes always have the position set.
    • There might be multiple root nodes in the table (this is why I used forest_node instead of tree_node as tablename). We order them by their position and break ties by the id. All nodes of the same tree appear next to each other in the final list.

    Note: if there should be some cycle in the table, the query will ignore all entries of this cycle, as there is no path ending in a root node.

    Final remark: if you are working on a big table, it might be much more efficient to write a function that just traverses the tree node-by-node to find the "DFS-neighbors". The provided query will always work with all rows of the table which might become a bottleneck.

    Used for testing purposes:

    CREATE TABLE forest_node (
        id text NOT NULL,
        parent_id text,
        position integer CHECK(position IS NOT NULL OR parent_id IS NULL),
        PRIMARY KEY(id),
        UNIQUE(parent_id, position)
    );
    
    INSERT INTO forest_node (id, parent_id, position)
    VALUES
        ('root1', NULL, NULL),
        ('root2', NULL, NULL),
        ('root3', NULL, NULL),
        ('root1.1', 'root1', 1),
        ('root1.2', 'root1', 2),
        ('root3.2', 'root3', 2),
        ('root3.1', 'root3', 1),
        ('root2.1', 'root2', 1),
        ('root2.2', 'root2', 2),
        ('root2.1.1', 'root2.1', 1),
        ('root2.1.2', 'root2.1', 2),
        ('root2.4', 'root2', 4),
        ('root2.3', 'root2', 3);