Search code examples
javadata-structurestreeredditthreaded-comments

Most efficient data structure to represent threaded comments in Java?


I want to represent threaded comments in Java. This would look similar to the way comments are threaded on reddit.com

hello
   hello
      hello
      hello
   hello
   hello
      hello

As in the example above, responses are nested in the HTML with appropriate indentation to reflect their relationship to prior comments.

What would be an efficient way to represent this in Java?

I'm thinking some kind of tree data structure would be appropriate.

But is there one in particular which would be most efficient to minimize tree traversals?

This would be important if I have voting on each comment. Because then the tree would need to be reordered after each vote - a potentially expensive operation computationally.

By the way, if anyone knows of an open source existing implementation of this in Java, that would help too.


Solution

  • I would use levels of linked lists.

    message1
        message2
            message3
            message4
        message5
        message6
            message7
    

    Each node would have a pointer to its:

    - forward sibling  (2->5, 3->4, 5->6,                   1/4/6/7->NULL).
    - backward sibling (4->3, 5->2, 6->5,                   1/2/3/7->NULL).
    - first child      (1->2, 2->3, 6->7,                   3/4/5/7->NULL).
    - parent           (2->1, 3->2, 4->2, 5->1, 6->1, 7->6,       1->NULL).
    

    Within each level, messages would be sorted in the list by vote count (or whatever other score you wanted to use).

    That would give you maximum flexibility for moving things around and you could move whole sub-trees (e.g., message2) just by changing the links at the parent and that level.

    For example, say message6 gets a influx of votes that makes it more popular than message5. The changes are (adjusting both the next and previous sibling pointers):

    • message2 -> message6
    • message6 -> message5
    • message5 -> NULL.

    to get:

    message1
        message2
            message3
            message4
        message6
            message7
        message5
    

    If it continues until it garners more votes than message2, the following occurs:

    • message6 -> message2
    • message2 -> message5

    AND the first-child pointer of message1 is set to message6 (it was message2), still relatively easy, to get:

    message1
        message6
            message7
        message2
            message3
            message4
        message5
    

    Re-ordering only needs to occur when a score change results in a message becoming more than its upper sibling or less than its lower sibling. You don't need to re-order after every score change.