apache-drill, dremel

How does Dremel, or an implementation of it such as Drill, handle a large columnar data layout in memory?


I am going through Google's Dremel white paper. I learned that it converts complex, nested data into a columnar data layout.

Where is this columnar data stored?

Since Drill has no central metadata repository, I assume the data must be held in memory.

If so, how does Drill handle this data when I have billions of rows?


Solution

  • To get complete, consistent query results from billions of rows, use a distributed file system connected to multiple Drillbits, simulate a distributed file system by copying the same files to every node, or use an NFS volume such as Amazon Elastic File System. Drill then queries big data performantly using a number of techniques, including these (the columnar, vectorized in-memory model is illustrated in the sketch after this list):

    • Relies on the cluster's nodes to handle failures (doesn't spend query time on failure-related bookkeeping).
    • Uses an in-memory data model that's hierarchical and columnar (doesn't access the disk for columns that are not involved in an analytic query, processing the columnar data without row materialization).
    • Uses columnar storage optimizations and execution (keeps memory footprint low).
    • Uses vectorization to work on arrays of values from different records rather than single values from one record at a time.

    For more information, see http://drill.apache.org/docs/performance/.
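
The last three points are the heart of the answer to "how does it handle billions of rows in memory": rows stream through the engine as fixed-size columnar batches rather than being materialized all at once. Below is a minimal Java sketch of that idea. It is not Drill's actual ValueVector code; the class and field names are hypothetical and only illustrate the layout and the vectorized loop.

```java
// A minimal sketch, NOT Drill's actual ValueVector/RecordBatch API: the class
// and field names here are hypothetical and only illustrate the layout.
import java.util.concurrent.ThreadLocalRandom;

public class ColumnarBatchSketch {

    // Each column of a batch is a contiguous array, so a query that only
    // aggregates "amount" never has to touch "userId" or "countryCode".
    static final class RecordBatch {
        final long[] userId;
        final double[] amount;
        final int[] countryCode;

        RecordBatch(int rowCount) {
            userId = new long[rowCount];
            amount = new double[rowCount];
            countryCode = new int[rowCount];
        }
    }

    // Vectorized aggregate: one tight loop over an array of values drawn from
    // many records, instead of pulling values out of one row object at a time.
    static double sumAmount(RecordBatch batch) {
        double sum = 0.0;
        for (double v : batch.amount) {
            sum += v;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Billions of rows are never materialized at once; they stream through
        // as fixed-size batches, keeping the per-node memory footprint bounded
        // regardless of the total data size.
        double total = 0.0;
        for (int b = 0; b < 1_000; b++) {                 // pretend each batch arrived from storage
            RecordBatch batch = new RecordBatch(65_536);
            for (int i = 0; i < batch.amount.length; i++) {
                batch.amount[i] = ThreadLocalRandom.current().nextDouble();
            }
            total += sumAmount(batch);                    // only the "amount" column is read
        }
        System.out.printf("total = %.2f%n", total);
    }
}
```

In Drill itself the columnar buffers are its ValueVectors and the batches flow through a pipeline of operators; the sketch just shows why that layout avoids reading unused columns and keeps the in-memory footprint small even when the underlying data set is huge.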