postgresql, timescaledb

What are space partitioning and dimensions in TimescaleDB?


I am new to TimescaleDB. I have been learning about chunks and how to create them based on time.
But there is also time/space chunking, which confuses me a lot.

  1. What is a "dimension" in TimescaleDB?
  2. What is "space" chunking and how does it work?

Solution

  • A dimension in TimescaleDB is associated with a column. Every hypertable must define at least a time dimension, which is the time column of the time series. The hypertable is then divided into chunks, where each chunk contains the data for one interval of the time dimension. As a result, new data usually arrives in the latest chunk, while the other chunks contain older data.

    It is then possible to define space dimensions on other columns, for example a device column and/or a location column. No interval is defined for a space dimension; instead, a number of partitions is defined. So for the same time interval several chunks are created, as many as there are partitions. Rows are assigned to partitions by a hash function applied to the value of the space dimension. For example, if 3 partitions are defined for a space dimension on the device column and the data contains 12 different device values, each space chunk will contain about 4 different values, provided the hash function distributes the values evenly.

    Space dimensions are particularly useful for parallel I/O when data is stored on several disks. Another scenario is multi-node, i.e., the distributed version of hypertables (a beta feature that is coming to release in 2.0).

    There are also some more complex use cases where space partitioning is helpful.

    You can read more in the add_dimension docs and in the cloud KB article about space partitioning.

    A note in the doc:

    Supporting more than one additional dimension is currently experimental.
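
To make the time/space chunking described above more concrete, here is a small toy model in Python. It is only a sketch of the idea, not how TimescaleDB is implemented: the 7-day chunk interval, the 3 partitions, the `device-N` names, and the use of CRC32 as the hash are all assumptions chosen for illustration (TimescaleDB uses its own internal partitioning hash). Each row is mapped to a chunk identified by its time slice plus its space partition.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone
from typing import Tuple
import zlib

# Illustrative assumptions, not TimescaleDB defaults or internals.
CHUNK_INTERVAL = timedelta(days=7)   # interval of the time dimension
NUM_PARTITIONS = 3                   # partitions of the space dimension
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def time_slice(ts: datetime) -> datetime:
    """Return the start of the time interval that contains ts."""
    index = (ts - EPOCH) // CHUNK_INTERVAL
    return EPOCH + index * CHUNK_INTERVAL


def space_partition(device: str) -> int:
    """Map a device value to a partition using a stand-in hash (CRC32)."""
    return zlib.crc32(device.encode("utf-8")) % NUM_PARTITIONS


def chunk_key(ts: datetime, device: str) -> Tuple[datetime, int]:
    """A row lands in the chunk identified by (time slice, partition)."""
    return time_slice(ts), space_partition(device)


# Hypothetical data: 12 devices reporting at the same moment.
devices = [f"device-{i}" for i in range(12)]
now = datetime(2021, 1, 15, 12, 0, tzinfo=timezone.utc)

# All rows share one time slice but are spread over at most 3 chunks.
keys = [chunk_key(now, d) for d in devices]
per_partition = Counter(partition for _, partition in keys)
print(sorted(per_partition.items()))
```

The printed counts show how many distinct devices fall into each partition for that time interval. With a well-behaved hash they come out close to 4 each, but an exactly even split is not guaranteed, which is why the "4 values per chunk" figure should be read as an expectation rather than a rule.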