Search code examples
mysqlinnodb

When sending a DELETE query, does innoDB delete data from database one-by-one, or first create a list of data to be deleted?


I have a database table with ~140,000 rows, and right now I am doing a query that goes like

DELETE FROM database WHERE type="delete"

But it is taking a long time. I am tempted to halt the query and see if the database has actually shrunken in size, but I am not sure if it's in the process of creating a list of data to be deleted before the data actually gets removed. How does innoDB actually clear out data from the database upon receiving a DELETE query?

Here is the CREATE query (for the table images):

CREATE TABLE IF NOT EXISTS `images` (
  `imageID` int(11) NOT NULL AUTO_INCREMENT,
  `runID_fk` int(11) DEFAULT NULL,
  `sequenceID_fk` int(11) DEFAULT NULL,
  `cameraID_fk` int(11) DEFAULT NULL,
  `data` longblob,
  `timestamp` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `name` text,
  `type` text,
  `pcadata` longblob,
  PRIMARY KEY (`imageID`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

Solution

  • Based on your CREATE statement, you do not have an index on the type column. So, your problem is due to a slow WHERE clause. Basically, your query is forcing the database server to look at each and every row to see if value in the type column is "delete". That is slow, no matter how many rows you have. An index would make this much faster because the server could essentially just ask for a list of all matching rows, then only operate on those rows.

    As a very broad rule of thumb, if you intend to use a column in a WHERE clause, it should be indexed. (There are exceptions, for sure, but this is not one of them based on what you've posted.)

    You can see what the database server is doing by running an EXPLAIN query, as in

    EXPLAIN DELETE FROM database WHERE type="delete"
    

    Based on your comment, your EXPLAIN returns something like this:

    id   select_type   table    partitions   type   possible_keys   key     key_len   ref    rows     filtered  Extra
    1    SIMPLE        images                ALL    null            null    null      null   145669             Using where
    

    That 145,699 that you see is not (as you wrote in the comment) "the size of the rows." It's the number of rows that the server has to look at to execute your query. As you can see, it doesn't have any indexes it can use. So, instead of looking at only the rows you want to delete, it has to look at 145,669 rows.

    Also, since your type column is a TEXT column, this will be even slower. You should seriously consider adding a TINYINT, CHAR (not TEXT), or other column to store the status of the row if you intend to flag rows for deletion this way.

    On a more general note, you really should reconsider some of those column types. I seriously doubt that you need 2^16 - 1 bytes for a column named name, for example.