
When should one prefer Mahout's SequentialAccessSparseVector over RandomAccessSparseVector?


I'm a newbie with regard to Mahout. I would like to build my own algorithms with Mahout's tools. I'm quite puzzled by the usage of Mahout's SequentialAccessSparseVector and RandomAccessSparseVector. Could someone suggest when one should prefer one over the other?

Thanks


Solution

  • The random-access version is backed by a hash table, which gives the fastest sets and gets, but its iteration order is undefined. Sometimes iterating over a vector in order of dimension makes other operations efficient. Computing a dot product, for example, only needs to look at the dimensions where both vectors have entries, and two index-ordered vectors can be walked side by side to find exactly those dimensions. In such cases the sequential-access version is the better choice. It has slightly slower sets and gets and may use a little more memory. Both are sparse representations, though.
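To make the trade-off concrete, here is a toy sketch in plain Java with no Mahout dependency. The classes `HashSparseVector` and `SortedSparseVector` are hypothetical stand-ins, not Mahout's classes or API. They only mimic the two storage strategies discussed above: a hash map (fast get/set, unordered), and indices kept in ascending order in parallel arrays (an assumed layout for an ordered sparse vector). The merge-based `dot` shows why ordered iteration helps: each vector is walked once, and only indices stored in both contribute to the sum.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Toy illustration only: these hypothetical classes are NOT Mahout's
// RandomAccessSparseVector / SequentialAccessSparseVector implementations.
public class SparseVectorSketch {

    /** Hash-backed sparse vector: expected O(1) get/set, undefined iteration order. */
    static final class HashSparseVector {
        final Map<Integer, Double> entries = new HashMap<>();

        void set(int index, double value) {
            if (value == 0.0) {
                entries.remove(index); // keep the representation sparse
            } else {
                entries.put(index, value);
            }
        }

        double get(int index) {
            return entries.getOrDefault(index, 0.0);
        }
    }

    /** Sorted-array sparse vector: indices kept in ascending order. */
    static final class SortedSparseVector {
        int[] indices = new int[0];
        double[] values = new double[0];

        double get(int index) {
            int pos = Arrays.binarySearch(indices, index); // O(log n) lookup
            return pos >= 0 ? values[pos] : 0.0;
        }

        void set(int index, double value) {
            int pos = Arrays.binarySearch(indices, index);
            if (pos >= 0) {
                values[pos] = value; // overwrite in place (stored zeros kept for simplicity)
                return;
            }
            if (value == 0.0) {
                return; // nothing to store
            }
            int insertAt = -pos - 1;
            // Inserting a new index shifts every later entry: O(n) in the worst case.
            int[] newIdx = new int[indices.length + 1];
            double[] newVal = new double[values.length + 1];
            System.arraycopy(indices, 0, newIdx, 0, insertAt);
            System.arraycopy(values, 0, newVal, 0, insertAt);
            newIdx[insertAt] = index;
            newVal[insertAt] = value;
            System.arraycopy(indices, insertAt, newIdx, insertAt + 1, indices.length - insertAt);
            System.arraycopy(values, insertAt, newVal, insertAt + 1, values.length - insertAt);
            indices = newIdx;
            values = newVal;
        }

        /** Dot product by merging two ascending index lists; no hashing needed. */
        double dot(SortedSparseVector other) {
            double sum = 0.0;
            int i = 0;
            int j = 0;
            while (i < indices.length && j < other.indices.length) {
                if (indices[i] == other.indices[j]) {
                    sum += values[i] * other.values[j]; // index present in both vectors
                    i++;
                    j++;
                } else if (indices[i] < other.indices[j]) {
                    i++;
                } else {
                    j++;
                }
            }
            return sum;
        }
    }

    public static void main(String[] args) {
        SortedSparseVector a = new SortedSparseVector();
        a.set(7, 2.0);
        a.set(1, 3.0);
        a.set(100, 1.0);
        SortedSparseVector b = new SortedSparseVector();
        b.set(1, 4.0);
        b.set(50, 5.0);
        b.set(100, 2.0);
        // Only indices 1 and 100 are stored in both: 3*4 + 1*2 = 14
        System.out.println(a.dot(b)); // prints 14.0
        System.out.println(Arrays.toString(a.indices)); // prints [1, 7, 100]
    }
}
```

The sketch only shows the cost shape: binary-search gets and shifting inserts for the ordered layout versus expected constant-time hash access, with the ordered layout paying off when operations iterate over both vectors in index order.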