Search code examples
mysqlhashtableinnodbb-treein-memory-tables

Choose Mysql engine for processing a big "type-value" table


My task is to delete all entities that were not affected during an operation from the database. I created a separate table which have two columns, first represets the name of the table and second is the id of the record in that table.

CREATE TABLE edited_entities (
        table VARCHAR(50) not null, 
        id BIGINT(20) not null)

For example if I have table

CREATE TABLE puppy(
        id BIGINT(20) not null, 
        name VARCHAR(20) not null)

and a record in it

id | name
1  | Rex

If I edit this record i will put the following data into edited_entities:

table | id
puppy | 1

Then I need to delete all non affected entities (which ids are not in edited_entities table) and I do following:

delete from puppy where id not in 
    (select ee.id from edited_entities ee where ee.table= 'puppy');

I wonder what is the best engine for such kind of operation (MySql)? The default db engine is InnoDB. I thought about Memory (Heap) but I am not sure if it can faster the delete operation.

If you have suggestion how can I optimise the required operation I will be glad to here it.

I don't whant to add additional columns into puppy table.


Solution

  • Memory will be faster, since it doesn't have to go to disk at the end of the transaction. In this case I'd try first BTREE and not HASH index, because it lets you use partial composite indexes, just like on-disk tables.

    Also try with prepared statements for insert and delete operations: prepare one of each before processing and then invoke with the relevant parameters. It might be faster since it doesn't have to parse the SQL; but there are some cases where the whole system gets a little slower because they take non-trivial amounts of memory.

    Another option is the experimental 'HandlerSocket' feature, available in several forks of MySQL like Percona server, it lets you access a MySQL table as a NoSQL store, with huge performance benefits, but full ACID compliance. (The Percona people are performance fanatics; even if you can't use HandlerSocket, be sure to test their fork)

    Finally, a viable (but more work for you) option is a separate in-memory database. I'd tend to go with Redis, which is a very high-speed in-memory key-value store with the added twist that the 'values' are useful data structures. In your case, you could store a set of IDs for each table, something like

    tokeep:puppy => 1,4,6,76.....
    

    it's a simple (and atomic) operation to add an element to the set (SADD tokeep:puppy 76), and at the end you fetch them all to create the SQL DELETE WITH id NOT IN (...) operation

    And, the very last option I think of (and still more work for you), would be to put everything in the same store: there's a fork of Redis, previously called redisql, but now it's now Alchemy Database; it adds SQL tables to Redis, keeping most of the performance benefits of NoSQL. So, you could have your 'regular' tables as SQL tables in Alchemy, store your 'tokeep' sets on NoSQL on the same server, and at the end do a:

    DELETE FROM puppy WHERE id NOT IN ($SMEMBERS tokeep:puppy );
    

    Bam!