
MySQL Faster Select Query | C# Faster Load DataTable


I am using MariaDB. I have a table that I create for every IoT device at the time of the first insertion, via a stored procedure. In case anyone wonders why I create a new table for every device: each device publishes data every 5 seconds, and it is impossible for me to store all of it in a single table.

So, my table structure is like below:

CREATE TABLE IF NOT EXISTS `mqttpacket_',device_serial_number,'`(
`data_type_id` int(11) DEFAULT NULL, 
`data_value` int(11) DEFAULT NULL,
`inserted_date` DATE DEFAULT NULL, 
`inserted_time` TIME DEFAULT NULL, 
FOREIGN KEY(data_type_id) REFERENCES datatypes(id), 
INDEX `index_mqttpacket`(`data_type_id`,`inserted_date`)) ENGINE = INNODB;

I have a very long SELECT query like below to fetch the data between selected type, date, and time.

SELECT  mqttpacket_123.data_value, datatypes.data_name, datatypes.value_mult,
        CONCAT(mqttpacket_123.inserted_date, ' ',
               mqttpacket_123.inserted_time) AS 'inserted_date_time'
    FROM  mqttpacket_123
    JOIN  datatypes  ON mqttpacket_123.data_type_id = datatypes.id
    WHERE  mqttpacket_123.data_type_id IN(1,2,3,4,5,6)
      AND  CASE WHEN mqttpacket_123.inserted_date = '2021-11-08'
                    THEN mqttpacket_123.inserted_time > '12:25:00'
                WHEN mqttpacket_123.inserted_date = '2021-11-15'
                    THEN mqttpacket_123.inserted_time < '12:25:00'
                ELSE (mqttpacket_123.inserted_date BETWEEN '2021-11-08'
                                                       AND '2021-11-15')
                END;

and this returns around 500k records like the sample below:

|   data_value  |       data_name       |   value_mult  |   inserted_date_time  |
--------------------------------------------------------------------------------
|       271     |       name_1          |       0.1     |   2021-11-08 12:25:04 |
|       106     |       name_2          |       0.1     |   2021-11-08 12:25:04 |
|       66      |       name_3          |       0.1     |   2021-11-08 12:25:04 |
|       285     |       name_4          |       0.1     |   2021-11-08 12:25:04 |
|       61      |       name_5          |       0.1     |   2021-11-08 12:25:04 |
|       454     |       name_6          |       0.1     |   2021-11-08 12:25:04 |
|       299     |       name_7          |       0.1     |   2021-11-08 12:25:04 |

Affected rows: 0 Found rows: 395,332 Warnings: 0 Duration for 1 query: 0.734 sec. (+ 7.547 sec. network)

I keep only the last 2 weeks' data in my tables and clean up the previous data as I have a backup system.

However, loading the query result into a DataTable also takes ~30 sec., which is 4 times slower than MySQL.

Do you have any suggestions to improve this performance?

PS. I call this query from C# with the following method, which passes the query string to a stored procedure, SP_RunQuery, that executes it as-is.

public DataTable CallStoredProcedureRunQuery(string QueryString)
{
    DataTable dt = new DataTable();

    try
    {
        using (var conn = new MySqlConnection(_connectionString))
        {
            conn.Open();
            using (var cmd = new MySqlCommand("SP_RunQuery", conn))
            {
                cmd.CommandType = CommandType.StoredProcedure;

                cmd.Parameters.Add("@query_string", MySqlDbType.VarChar).Value = QueryString;

                using (MySqlDataAdapter sda = new MySqlDataAdapter(cmd))
                {
                    sda.Fill(dt);
                }
            }
        }
    }
    catch (Exception ex)
    {
        IoTemplariLogger.tLogger.EXC("Call Stored Procedure for RunQuery failed.", ex);
    }
    return dt;
}

EDIT: My sensors push a single MQTT packet which contains ~50 different data points. There are 12 five-second intervals in a minute, so I receive ~600 rows per minute per device.

Data insertion is done asynchronously in a stored procedure. I push the JSON content along with the device_id, and I iterate over the JSON to parse the values and insert them into the table.

PS. The following code is just for clarification. It works fine.

/*Dynamic SQL -- if they are registered to the system but have no table, create it.*/
    SET create_table_query = CONCAT('CREATE TABLE IF NOT EXISTS `mqttpacket_',device_serial_number,'`(`data_type_id` int(11) DEFAULT NULL, `data_value` int(11) DEFAULT NULL,`inserted_date` DATE DEFAULT NULL, `inserted_time` TIME DEFAULT NULL, FOREIGN KEY(data_type_id) REFERENCES datatypes(id), INDEX `index_mqttpacket`(`data_type_id`,`inserted_date`)) ENGINE = InnoDB;');
    PREPARE stmt FROM create_table_query;
    EXECUTE stmt;
    DEALLOCATE PREPARE stmt;
    
    /*Loop over the incoming value array. It looks like: $.type_1,$.type_2,$.type_3, so we can iterate over the JSON and reach each value like $.type_1*/
    WHILE (LOCATE(',', value_array) > 0)
    DO
        SET arr_data_type_name = SUBSTRING_INDEX(value_array,',',1); /*pick first item of value array*/
        SET value_array = SUBSTRING(value_array, LOCATE(',',value_array) + 1); /*remove picked first item from the value_array*/
        
        SELECT JSON_EXTRACT(incoming_data, arr_data_type_name) INTO value_iteration; /*extract value of first item. $.type_1*/
        
        SET arr_data_type_name := SUBSTRING_INDEX(arr_data_type_name, ".", -1); /*Remove the $ and the . to get pure data type name*/
        
        /*Check the data type name exists or not in the table, if not insert and assign it's id to lcl_data_type_id*/
            IF (SELECT COUNT(id) FROM datatypes WHERE datatypes.data_name = arr_data_type_name) > 0 THEN 
                SELECT id INTO lcl_data_type_id FROM datatypes WHERE datatypes.data_name = arr_data_type_name LIMIT 1;
            ELSE
                SELECT devices.device_type_id INTO lcl_device_type FROM devices WHERE devices.id = lcl_device_id LIMIT 1;
                INSERT INTO datatypes (datatypes.data_name,datatypes.description,datatypes.device_type_id,datatypes.value_mult ,datatypes.inserted_time) VALUES(arr_data_type_name,arr_data_type_name,lcl_device_type,0.1,NOW());
                SELECT id INTO lcl_data_type_id FROM datatypes WHERE datatypes.data_name = arr_data_type_name LIMIT 1;
            END IF;
            
            /*Track which device has which datatypes, so we do not retrieve datatypes unnecessarily for the selected device*/
            IF (SELECT COUNT(device_id) FROM devicedatatypes WHERE devicedatatypes.device_id = lcl_device_id AND devicedatatypes.datatype_id = lcl_data_type_id) < 1 THEN
                INSERT INTO devicedatatypes (devicedatatypes.device_id, devicedatatypes.datatype_id) VALUES(lcl_device_id,lcl_data_type_id);
            END IF;
            
            SET lcl_insert_mqtt_query = CONCAT('INSERT INTO mqttpacket_',device_serial_number,'(data_type_id,data_value,inserted_date,inserted_time) VALUES(',lcl_data_type_id,',',value_iteration,',''',data_date,''',''',data_time,''');');
            PREPARE stmt FROM lcl_insert_mqtt_query;
            EXECUTE stmt;
            SET affected_data_row_count = affected_data_row_count + 1;
                        
    END WHILE;

Here and here is additional information about the server and database, in response to the comments. I have an SSD on the server. Nothing else of importance runs there besides my dotnet application and the database.


Solution

    • It is usually better to have a DATETIME column instead of splitting it into two (DATE and TIME) columns. That might simplify the WHERE clause.

    • Having one table per device is usually a bad idea. Instead, add a column for the device_id.

    • Not having a PRIMARY KEY is a bad idea. Do you ever get two readings in the same second for a specific device? Probably not.

    Rolling those together plus some other likely changes, start by changing the table to

    CREATE TABLE IF NOT EXISTS `mqttpacket`(
    `device_serial_number` SMALLINT UNSIGNED NOT NULL,
    `data_type_id` TINYINT UNSIGNED NOT NULL,
    `data_value` SMALLINT NOT NULL,
    `inserted_at` DATETIME NOT NULL, 
    FOREIGN KEY(data_type_id) REFERENCES datatypes(id), 
    PRIMARY KEY(device_serial_number, `data_type_id`,`inserted_at`)
    ) ENGINE = INNODB;
    

    That PK will make the query faster.

    This may be what you are looking for after the change to DATETIME:

      AND inserted_at >= '2021-11-08 12:25:00'
      AND inserted_at  < '2021-11-08 12:25:00' + INTERVAL 7 DAY
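
    Putting that range together with the rest of the statement, the whole query might look like this (a sketch assuming the single mqttpacket table above; the device number and type ids are illustrative):

      SELECT  p.data_value, d.data_name, d.value_mult, p.inserted_at
          FROM  mqttpacket AS p
          JOIN  datatypes AS d  ON p.data_type_id = d.id
          WHERE  p.device_serial_number = 123
            AND  p.data_type_id IN (1,2,3,4,5,6)
            AND  p.inserted_at >= '2021-11-08 12:25:00'
            AND  p.inserted_at  < '2021-11-08 12:25:00' + INTERVAL 7 DAY;

    With the PK starting with (device_serial_number, data_type_id, inserted_at), the optimizer can range-scan each data_type_id instead of evaluating the CASE for every row.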
    

    To keep 2 weeks' worth of data, DROP PARTITION is an efficient way to do the delete. I would use PARTITION BY RANGE(TO_DAYS(inserted_at)) and have 16 partitions, as discussed in http://mysql.rjweb.org/doc.php/partitionmaint
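
    A minimal sketch of the partition layout (partition names and dates are illustrative):

      ALTER TABLE mqttpacket
          PARTITION BY RANGE (TO_DAYS(inserted_at)) (
              PARTITION p20211101 VALUES LESS THAN (TO_DAYS('2021-11-02')),
              PARTITION p20211102 VALUES LESS THAN (TO_DAYS('2021-11-03')),
              -- ... one partition per day, 16 in total ...
              PARTITION pfuture VALUES LESS THAN MAXVALUE
          );

      -- Daily maintenance: drop the oldest day, then split pfuture for the next one.
      ALTER TABLE mqttpacket DROP PARTITION p20211101;
      ALTER TABLE mqttpacket REORGANIZE PARTITION pfuture INTO (
          PARTITION p20211117 VALUES LESS THAN (TO_DAYS('2021-11-18')),
          PARTITION pfuture   VALUES LESS THAN MAXVALUE
      );

    One caveat: InnoDB does not allow foreign keys on partitioned tables, so if you partition you would drop the FOREIGN KEY and rely on the application (or the stored procedure) for referential integrity.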

    If you are inserting a thousand rows every 5 seconds -- with table-per-device, you would need a thousand threads, each doing one insert; that would be a nightmare for the architecture. With a single table (as I suggest), if you can gather the 1000 rows together in one process at the same time, do one multi-row INSERT every 5 seconds. I discuss other high-speed ingestion techniques elsewhere.
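
    The batched ingestion could look like this (values are illustrative; one row per data type from a single packet, all sharing one timestamp):

      INSERT INTO mqttpacket
              (device_serial_number, data_type_id, data_value, inserted_at)
      VALUES  (123, 1, 271, '2021-11-08 12:25:04'),
              (123, 2, 106, '2021-11-08 12:25:04'),
              (123, 3,  66, '2021-11-08 12:25:04');

    A single multi-row INSERT runs much faster than the same rows inserted one statement at a time, because the per-statement overhead (parse, network round trip, commit) is paid once.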