I am rather interested in using a skip list as an open list for A*. However, what bugs me is its probabilistic nature. Open lists can vary from very small sets up to huge numbers of nodes, and the structure needs to maintain performance across that whole range.
As I understand it, skip lists have a higher chance of randomly producing a bad (poorly balanced) level structure on small data sets. I think this might become problematic with large numbers of short paths being generated.
To fix this, I was thinking: why not watchdog the random numbers to a degree? Keep a running total of the number of nodes at each level and, in order to maintain the ideal distribution of nodes across levels, sometimes intervene and force a node to be a specific level.
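For what it's worth, here is a minimal Python sketch of that watchdog idea. Everything in it (the class name, the intervention rule, p = 1/2) is a hypothetical illustration of the proposal, not something taken from the skip list literature:

```python
import random

class WatchdogLevelGenerator:
    """Sketch of the 'watchdog' idea: track how many nodes were assigned
    each top level and force a level whenever its running count falls
    behind the ideal geometric share (p = 1/2 by default)."""

    def __init__(self, max_level=16, p=0.5):
        self.max_level = max_level
        self.p = p
        self.counts = [0] * (max_level + 1)  # nodes whose top level == i
        self.total = 0

    def next_level(self):
        # The ideal fraction of nodes whose top level is exactly i is
        # p^i * (1 - p). If some level is clearly underrepresented,
        # intervene and force it (the "watchdog" part of the idea).
        for i in range(self.max_level + 1):
            ideal = (self.p ** i) * (1 - self.p) * self.total
            if self.counts[i] + 1 < ideal:
                self.counts[i] += 1
                self.total += 1
                return i
        # Otherwise fall back to the usual coin-flipping.
        level = 0
        while level < self.max_level and random.random() < self.p:
            level += 1
        self.counts[level] += 1
        self.total += 1
        return level
```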
I am unsure just how well this would work in my application, and whether I should instead focus on another data structure for my open lists.
All the articles I have read on skip lists make no mention of such an optimization. Since I am fairly new to performance profiling, I am hesitant to try to improve on a well-documented data structure.
Yes, you are correct: a skip list has a higher chance of degenerating when it is small.
In general, according to this paper, there is a constant alpha such that the probability of a skip list exceeding alpha * n space is less than 1/2^Ω(sqrt(n)), so as the skip list gets bigger, the probability of exceeding this limit gets smaller and smaller.
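You can see this decay empirically with a quick Monte Carlo sketch (p = 1/2; the threshold alpha = 2.5 is an arbitrary value I picked for illustration, about 25% above the expected space of 2n pointers):

```python
import random

def skiplist_space(n, p=0.5, max_level=32):
    """Total forward pointers a randomized skip list of n nodes uses
    (a node whose top level is k contributes k + 1 pointers)."""
    total = 0
    for _ in range(n):
        level = 0
        while level < max_level and random.random() < p:
            level += 1
        total += level + 1
    return total

# Estimate how often the total space exceeds alpha * n. For small n the
# exceedance probability is noticeable; it vanishes quickly as n grows.
alpha = 2.5
trials = 2000
for n in (10, 100, 1000):
    bad = sum(skiplist_space(n) > alpha * n for _ in range(trials))
    print(f"n={n:5d}: P(space > {alpha}*n) ~ {bad / trials:.4f}")
```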
To avoid the skip list's worst cases, one can use a variation of the original skip list: the deterministic skip list. This subject is discussed in this thesis work.
Other alternatives are balanced BSTs, such as AVL trees and red-black trees, or even B+ trees (which are usually preferred for file systems).
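As a concrete illustration: Python has no built-in balanced BST, so this sketch uses `sortedcontainers.SortedList` (a third-party package) as a stand-in with the same O(log n) ordered-structure guarantees an AVL or red-black tree would give; the open-list shape here is my own assumption, not a prescribed design:

```python
from sortedcontainers import SortedList

# Open list ordered by f-score; each entry is a (f, node) pair.
open_list = SortedList(key=lambda item: item[0])

open_list.add((7.2, "nodeA"))
open_list.add((3.1, "nodeB"))
open_list.add((5.0, "nodeC"))

f, best = open_list.pop(0)  # extract-min in O(log n)
print(best, f)              # nodeB 3.1
```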
If your "open set" is indeed so small - odds are your branch factor is very small as well (close to 1), or to be exact the B^d
(d=depth of solution; B=branch factor) will be small as well. This will result in a fast look up, regardless of the skip list implementation, because few nodes are expected to be needed.