
MariaDB / MySQL: index structure for table with one value column


I have a MariaDB MyISAM table with three columns:

`time` int not null, 
`sensor` tinyint not null, 
`value` decimal (6,4) not null

I also have a PRIMARY KEY (BTREE) on the columns time and sensor.

The table has 250 million rows, and 20 new records are added every 10 seconds. All SELECTs on the table have a WHERE clause either on just time or on time and sensor.

This works well performance-wise, but the index uses even more disk space than the table itself (2.2GB for the data and 2.7GB for the index). This seems a bit silly to me, as the index is basically the entire table, which means that MariaDB has essentially doubled all the data.

Is there a better structure for this table?


Solution

  • The entire point of database indexes is to trade space for time. Yes, it's normal for indexes to take roughly as much space as the table, especially when the table has short rows like yours does.
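
    If you want to watch the data-to-index ratio while you experiment, a query along these lines should report it. This is a minimal sketch: the table name `series` is a placeholder (the question does not name the table), and the figures from information_schema are approximate for InnoDB.

    ```sql
    -- Compare data and index size for one table in the current database.
    -- 'series' is a placeholder table name; substitute your own.
    SELECT table_name,
           engine,
           ROUND(data_length  / 1024 / 1024) AS data_mb,
           ROUND(index_length / 1024 / 1024) AS index_mb
    FROM information_schema.TABLES
    WHERE table_schema = DATABASE()
      AND table_name = 'series';
    ```

    For an InnoDB table, data_length covers the clustered primary key (which is the table itself), and index_length covers only the secondary indexes.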

    If you switch to the InnoDB storage engine your primary key will become a so-called clustered index. That is, the entire table will be contained within the primary key's index. That can save a lot of disk space.

    You should switch to InnoDB: MyISAM is a legacy storage engine that's not getting much attention from MariaDB's developers. If for some reason InnoDB doesn't work for you, switch to the more modern Aria storage engine. It is similar to MyISAM. Like MyISAM, it doesn't use a clustered PK index.
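
    The engine switch itself is a single statement. This is a sketch assuming the table is called `series` (a placeholder); try it on a copy first, since it rebuilds the whole table.

    ```sql
    -- Rebuilds the whole table in the new engine. On 250 million rows this
    -- can take a while and needs enough free disk space for the new copy.
    ALTER TABLE series ENGINE=InnoDB;

    -- Alternative if InnoDB is not an option:
    -- ALTER TABLE series ENGINE=Aria;
    ```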

    Notice this: your primary key is on (time, sensor). That means it's best suited to WHERE clauses that look like this:

     WHERE time BETWEEN start AND finish
    

    If you remade your primary key so it were on (sensor, time) it would be suitable for

     WHERE sensor=somesensor AND time BETWEEN start AND finish
    

    Why? MySQL random-accesses the BTREE index to the first eligible row, then scans it sequentially to the last eligible row. You can read about this multi-column index stuff here and here.

    So, your choice of primary key column order should be based on which of those two WHERE patterns is more performance-critical: time-range only or sensor and time-range.
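
    One way to make that choice is to run EXPLAIN on your typical queries, and only then reorder the key. The sketch below assumes the original table layout (integer `time`), a placeholder table name `series`, and made-up literal values for the time range.

    ```sql
    -- Check which index a typical query uses, and how many rows
    -- the optimizer expects to examine, before changing anything.
    EXPLAIN
    SELECT time, value
    FROM series
    WHERE sensor = 3
      AND time BETWEEN 1700000000 AND 1700086400;

    -- Swap the column order of the primary key in one table rebuild.
    ALTER TABLE series
      DROP PRIMARY KEY,
      ADD PRIMARY KEY (sensor, time);
    ```

    Comparing the `rows` estimate in the EXPLAIN output before and after the change gives a rough idea of how much of the index each query has to scan.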

    If this were my table I'd define it like this:

    CREATE TABLE series (
        time TIMESTAMP NOT NULL DEFAULT current_timestamp(),
        sensor SMALLINT(6) NOT NULL DEFAULT '0',
        value DECIMAL(6,4) NOT NULL DEFAULT '0.0000',
        PRIMARY KEY (sensor, time) USING BTREE,
        INDEX time_covering (time, sensor, value) USING BTREE
    ) COLLATE='utf8mb4_general_ci' ENGINE=InnoDB;
    

    This table definition offers a clustered primary key optimized for filtering by sensor then time range. And I have added a covering index (basically a copy of the table) optimized for filtering by time range only. For information about covering indexes see the last section of this.
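
    To check whether a time-range query is actually served from the covering index, you can run EXPLAIN on it. This is a sketch against the `series` table defined above; the date literals are arbitrary examples.

    ```sql
    -- Time-range-only query that touches just the three indexed columns.
    EXPLAIN
    SELECT time, sensor, value
    FROM series
    WHERE time >= '2024-01-01 00:00:00'
      AND time <  '2024-01-02 00:00:00';
    ```

    If the optimizer picks time_covering, the `key` column names that index and the `Extra` column normally includes "Using index", meaning the rows were read from the index alone.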

    It uses a TIMESTAMP data type. A TIMESTAMP without fractional seconds takes 4 bytes, the same as an INT, so it is just as efficient at representing time, and you get the benefit of time arithmetic. For example, this gets you yesterday's readings from sensor 3:

    WHERE time >= CURDATE() - INTERVAL 1 DAY
      AND time < CURDATE()
      AND sensor = 3
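
    If you adopt the TIMESTAMP column, the existing rows have to be converted. The sketch below assumes the integer `time` column holds Unix epoch seconds (the question does not say so) and uses `old_series` as a hypothetical name for the existing MyISAM table.

    ```sql
    -- Use UTC for the session so the conversion is not affected by
    -- daylight-saving transitions in the server's local time zone.
    SET time_zone = '+00:00';

    -- old_series is a hypothetical name for the existing MyISAM table.
    -- Assumes old_series.time holds Unix epoch seconds.
    INSERT INTO series (time, sensor, value)
    SELECT FROM_UNIXTIME(old_series.time),
           old_series.sensor,
           old_series.value
    FROM old_series;
    ```

    With 250 million rows it may be worth copying in batches by time range rather than in one statement.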
    

    It uses a SMALLINT rather than a TINYINT for sensor number. You're less likely to run out of sensor numbers, and TINYINT data only helps save space when there are several of them in each row.