We're using MariaDb in production and we've added a MariaDb slave so that our data team can perform some ETL tasks from this slave to our datawarehouse. However, they lack a proper Change Data Capture feature (i.e. they want to know which rows from the production table changed since yesterday in order to query rows that actually changed).
I saw that MariaDb's 10.3 had an interesting feature that allowed to perform a SELECT on an older version of a table. However, I haven't found resources that supported the idea that it could be used for CDC, any feedback on this feature?
If not, we'll probably resort to streaming the slave's binlogs to our datawarehouse but that looks challenging..
Thanks for your help!
MariaDB supports System-Versioned Tables since version 10.3.4. System version tables are specified in the SQL:2011 standard. They can be used for automatically capturing previous versions of rows. Those versions can then be queried to retrieve their values as they have been set at a specific point in time.
The following text and code example is from the official MariaDB documentation
With system-versioned tables, MariaDB Server tracks the points in time when rows change. When you update a row on these tables, it creates a new row to display as current without removing the old data. This tracking remains transparent to the application. When querying a system-versioned table, you can retrieve either the most current values for every row or the historic values available at a given point in time.
You may find this feature useful in efficiently tracking the time of changes to continuously-monitored values that do not change frequently, such as changes in temperature over the course of a year. System versioning is often useful for auditing.
With adding SYSTEM VERSIONING
to a newly created or an already existing table (using ALTER
), the table will be expanded by row_start
and row_end
time stamp columns which allow retrieving the record valid within the time between the start and the end timestamps.
CREATE TABLE accounts (
id INT PRIMARY KEY AUTO_INCREMENT,
name VARCHAR(255),
amount INT
) WITH SYSTEM VERSIONING;
It is then possible to retrieve data as it was at a specific time (with SELECT * FROM accounts FOR SYSTEM_TIME AS OF '2019-06-18 11:00';
), all versions within a specific time range
SELECT * FROM accounts
FOR SYSTEM_TIME
BETWEEN (NOW() - INTERVAL 1 YEAR)
AND NOW();
or all versions at once:
SELECT * FROM accounts
FOR SYSTEM_TIME ALL;