Tags: nosql, bigtable, non-relational-database

iBigTable: Number of rows cached per request


I'm reading the "Read Performance" subsubsection of subsection 6.3 of "iBigTable: Practical Data Integrity for BigTable in Public Cloud". One of the phrases it uses is "...number of rows cached per request for a scan."

I am new to databases. Does this refer to a tablet T keeping rows that were recently returned in a query response, so that a later query q might be answered partly from those cached rows? That is, instead of forwarding q to other tablets, does T first check its cache to see whether part of the response to q is already there?

References for further reading will be much appreciated.


Solution

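  • I can't answer from the standpoint of this particular BigTable variant, but I can answer from the standpoint of Google Cloud Bigtable, whose data model is similar to Apache HBase's. Caching is useful for rows that are accessed frequently, since serving those hot rows from memory reduces the load they place on the server. In the original Bigtable design [2], tablet servers use two levels of caching: a Scan Cache, which holds the key-value pairs returned by the SSTable interface, and a Block Cache, which holds SSTable blocks read from GFS. Separately, the client library caches tablet locations; if a location is missing or stale, the client moves up the tablet location hierarchy to find the tablet that serves the row. The sources below cover how Bigtable handles caching, tablets, and related topics [1][2][3].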

    [1] https://www.cs.rochester.edu/courses/261/spring2017/termpaper/16/paper.pdf

    [2] https://static.googleusercontent.com/media/research.google.com/en//archive/bigtable-osdi06.pdf

    [3] https://cloud.google.com/bigtable/docs/schema-design
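In HBase, whose data model the answer compares to Bigtable's, the phrase "number of rows cached per request for a scan" matches the client-side scanner caching setting (`Scan.setCaching(int)` or the `hbase.client.scanner.caching` property). With this setting, each request to the server returns up to that many rows, and the client buffers them locally until the scan consumes them. If the iBigTable experiments use this setting (an assumption here; check the paper's experimental setup), the term describes batching scan results per round trip, not a tablet caching rows for later queries. The Python sketch below is a toy simulation under that assumption, not a real HBase or Bigtable API: `ToyServer`, `fetch`, and `scan` are hypothetical names, and the sketch only counts simulated round trips.

```python
from collections import deque


class ToyServer:
    """Hypothetical server holding rows sorted by key (not a real HBase/Bigtable API)."""

    def __init__(self, rows):
        self.rows = sorted(rows)  # list of (row_key, value) pairs, sorted by key
        self.rpc_count = 0        # number of simulated client->server round trips

    def fetch(self, start, caching):
        """One simulated request: return up to `caching` rows from index `start`."""
        self.rpc_count += 1
        batch = self.rows[start:start + caching]
        has_more = start + len(batch) < len(self.rows)
        return batch, has_more


def scan(server, caching):
    """Yield rows one at a time; the client buffers `caching` rows per request."""
    if caching < 1:
        raise ValueError("caching must be at least 1")
    buffer = deque()  # client-side cache holding the rows from the last request
    position = 0      # index of the next row to request from the server
    has_more = True
    while buffer or has_more:
        if not buffer:
            # Buffer exhausted: issue one request for the next batch of rows.
            batch, has_more = server.fetch(position, caching)
            position += len(batch)
            buffer.extend(batch)
            if not buffer:
                break
        yield buffer.popleft()


if __name__ == "__main__":
    rows = [(f"row{i:02d}", i) for i in range(10)]
    for caching in (1, 3, 100):
        server = ToyServer(rows)
        keys = [key for key, _ in scan(server, caching)]
        print(f"caching={caching} rpcs={server.rpc_count} rows={len(keys)}")
```

With 10 rows, a caching value of 1 needs 10 round trips, 3 needs 4, and 100 needs 1, while the scan returns the same rows each time. Larger values reduce round trips at the cost of more client memory and larger responses, which is the kind of trade-off a read-performance experiment might vary.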