I'm new to TimescaleDB and started by exploring the documentation. It's pretty clear, but it looks like I've missed something important.
I've created a table:
CREATE TABLE session_created
(
event_timestamp timestamp without time zone NOT NULL,
client_id integer,
client_version text,
identifier text,
platform text,
remote_addr text,
country text,
type smallint,
session text
);
CREATE INDEX session_created_client_id_idx ON session_created USING btree (client_id ASC NULLS LAST);
CREATE INDEX session_created_event_timestamp_idx ON session_created USING btree (event_timestamp ASC NULLS LAST);
Then I made it a hypertable with the following compression settings:
SELECT create_hypertable('session_created','event_timestamp');
ALTER TABLE session_created SET (
timescaledb.compress,
timescaledb.compress_segmentby = 'event_timestamp'
);
SELECT add_compression_policy('session_created', INTERVAL '1 days');
And filled the table with one million rows per day from 2021-11-09 to 2021-11-23, like this:
INSERT INTO session_created
(
event_timestamp,
client_id,
client_version,
identifier,
platform,
remote_addr,
country,
type,
session
)
SELECT '2021-11-23 00:00:00'::timestamp + s.id * interval '85 milliseconds',
s.id % 500000,
'1.0.1234',
'deviceid-' || s.id % 500000,
'Android',
'127.0.0.' || s.id % 256,
'RU',
0,
md5(random()::text || clock_timestamp()::text)::uuid
FROM generate_series(1, 1000000) AS s(id);
The target table contained three chunks, and I compressed two of them to play with the core TimescaleDB feature:
SELECT compress_chunk(chunk_name)
FROM show_chunks('session_created', older_than => INTERVAL '1 day') chunk_name;
The problem is that the compressed data took roughly two and a half times as much space as the data before compression:
| before compression | after compression |
|---|---|
| 1702 MB | 4178 MB |
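For reference, one way to obtain such before/after numbers (a sketch, assuming TimescaleDB 2.x, where hypertable_compression_stats() is available; the original post does not say how the sizes were measured):
SELECT pg_size_pretty(before_compression_total_bytes) AS before,
       pg_size_pretty(after_compression_total_bytes) AS after
FROM hypertable_compression_stats('session_created');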
It also takes more time to query the compressed data in an analytical manner:
select event_timestamp::date as date, count(distinct client_id) as clients
from session_created
where event_timestamp between (current_date - 7)::date and (current_date - 6)::date - interval '1 second'
group by 1
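One way to see where the extra time goes (a hedged suggestion, not part of the original question) is to run the same query under EXPLAIN against a compressed and an uncompressed day and compare the plans:
EXPLAIN (ANALYZE, BUFFERS)
select event_timestamp::date as date, count(distinct client_id) as clients
from session_created
where event_timestamp between (current_date - 7)::date and (current_date - 6)::date - interval '1 second'
group by 1;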
The question is: what have I missed in the documentation? What are the sources of these problems?
Typically the "time" column used to create the hypertable is not used as a segment by column while setting up compression. segment_by column is a column that has some commonality across the data set. e.g. If we have a table with device readings (device_id, event_timestamp, event_id, reading) the segment by column could be device_id (say you have a few 1000 devices and the device_readings table has data in the order of millions/billions). Note that the data in the segment by column is never stored in compressed form. Only the non segment by columns get compressed.