Search code examples
javaperformancesortingdata-structuresmultiway-tree

Data structures: Which should I use for these conditions?


This shouldn't be a difficult question, but I'd just like someone to bounce it off of before I continue. I simply need to decide what data structure to use based on these expected activities:

  1. Will need to frequently iterate through in sorted order (starting at the head).
  2. Will need to remove/restore arbitrary elements from the/a sorted view.
  3. Later I'll be frequently resorting the data and working with multiple sorted views.
  4. Also later I'll be frequently changing the position of elements within their sorted views.

This is in Java, by the way.

My best guess is that I'll either be rolling some custom Linked Hash Set (to arrange the links in sorted order) or possibly just using a Tree Set. But I'm still not completely sure yet. Recommendations?

Edit: I guess because of the arbitrary remove/restore, I should probably stick with a Tree Set, right?

Actually, not necessarily. Hmmm...


Solution

  • In theory, I'd say the right data structure is a multiway tree - preferably something like a B+ tree. Traditionally this is a disk-based data structure, but modern main memory has a lot of similar characteristics due to layers of cache and virtual memory.

    In-order iteration of a B+ tree is very efficient because (1) you only iterate through the linked-list of leaf nodes - branch nodes aren't needed, and (2) you get extremely good locality.

    Finding, removing and inserting arbitrary elements is log(n) as with any balanced tree, though with different constant factors.

    Resorting within the tree is mostly a matter of choosing an algorithm that gives good performance when operating on a linked list of blocks (the leaf nodes), minimising the need to use leaf nodes - variants of quicksort or mergesort seem like likely candidates. Once the items are sorted in the branch nodes, just propogate the summary information back through the leaf nodes.

    BUT - pragmatically, this is only something you'd do if you're very sure that you need it. Odds are good that you're better off using some standard container. Algorithm/data structure optimisation is the best kind of optimisation, but it can still be premature.