Search code examples
databasehadoopdatabase-schemahbasesparse-matrix

What is meant by sparse data/ datastore/ database?


Have been reading up on Hadoop and HBase lately, and came across this term-

HBase is an open-source, distributed, sparse, column-oriented store...

What do they mean by sparse? Does it have something to do with a sparse matrix? I am guessing it is a property of the type of data it can store efficiently, and hence, would like to know more about it.


Solution

  • In a regular database, rows are sparse but columns are not. When a row is created, storage is allocated for every column, irrespective of whether a value exists for that field (a field being storage allocated for the intersection of a row and and a column).

    This allows fixed length rows greatly improving read and write times. Variable length data types are handled with an analogue of pointers.

    Sparse columns will incur a performance penalty and are unlikely to save you much disk space because the space required to indicate NULL is smaller than the 64-bit pointer required for the linked-list style of chained pointer architecture typically used to implement very large non-contiguous storage.

    Storage is cheap. Performance isn't.