Search code examples
databasedata-structuresb-treerecordsavl-tree

Which data structure should i use to represent a large amount of records each presents range of items?


I'm looking for software represenation for a very large amount of records ( more than 400K records)

Each record has two keys . one for lower bound and one for upper bound. These number represent a range. Also each record has some piece of information lets call it I . In other words , each record aggregates common item indexs , and has some common description about them.

My software is given an item number , and I have to retrive that info about it.

I thought about AVL , B- Tress or fibonaci . But i'm sure which will be the best for that large amount of record . I would deffinately go for AVL / balanced AVL for a small database.


Solution

  • From a data structure point of view, you search for an interval tree.

    The wikipedia article is quite good. What you can do, is to augment a (balanced) binary search tree like AVL or Red-Black-Trees like. Interval trees based on binary search tree have an own section in the classical DS book by Cormen et al..

    A good data structure scales well to large amounts of data. The complexity for the major directory operations are O(k + log n) where n is the number of intervals in the tree and k is the number of overlapping intervals in the range. This is usually pretty good. It grows slowly with the number of of interval items, except for cases where a lot or most intervals overlap all others.

    If you cannot hold your data in main memory, a B-Tree would be a good choice.