Search code examples
mysqllarge-data

MysqL big table query optimization


I have a chatting application. I have an api which returns list of users who the user talked. But it takes a long to mysql return a list messages when it reachs 100000 rows of data. This is my messages table

CREATE TABLE IF NOT EXISTS `messages` (
  `_id` int(11) NOT NULL AUTO_INCREMENT,
  `fromid` int(11) NOT NULL,
  `toid` int(11) NOT NULL,
  `message` text NOT NULL,
  `attachments` text NOT NULL,
  `status` tinyint(1) NOT NULL DEFAULT '0',
  `date` datetime NOT NULL,
  `delete` varchar(50) NOT NULL,
  `uuid_read` varchar(250) NOT NULL,
  PRIMARY KEY (`_id`),
  KEY `fromid` (`fromid`,`toid`,`status`,`delete`,`uuid_read`)
) ENGINE=InnoDB  DEFAULT CHARSET=utf8 AUTO_INCREMENT=118561 ;

and this is my users table (simplified)

CREATE TABLE IF NOT EXISTS `users` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `login` varchar(50) DEFAULT '',
  `sex` tinyint(1) DEFAULT '0',
  `status` varchar(255) DEFAULT '',
  `avatar` varchar(30) DEFAULT '0',
  `last_active` datetime DEFAULT NULL,
  `active` tinyint(1) DEFAULT '1',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB  DEFAULT CHARSET=utf8 AUTO_INCREMENT=15523 ;

And here is my query (for user with id 1930)

select SQL_CALC_FOUND_ROWS `u_id`, `id`, `login`, `sex`, `birthdate`, `avatar`, `online_status`, SUM(`count`) as `count`, SUM(`nr_count`) as `nr_count`, `date`, `last_mesg` from
(
(select `m`.`fromid` as `u_id`, `u`.`id`, `u`.`login`, `u`.`sex`, `u`.`birthdate`, `u`.`avatar`, `u`.`last_active` as online_status, COUNT(`m`.`_id`) as `count`, (COUNT(`m`.`_id`)-SUM(`m`.`status`)) as `nr_count`, `tm`.`date` as `date`, `tm`.`message` as `last_mesg` from `messages` as m inner join `messages` as tm on `tm`.`_id`=(select MAX(`_id`) from `messages` as `tmz` where `tmz`.`fromid`=`m`.`fromid`) left join `users` as u on `u`.`id`=`m`.`fromid` where `m`.`toid`=1930 and `m`.`delete` not like '%1930;%' group by `u`.`id`)
UNION
(select `m`.toid as `u_id`, `u`.`id`, `u`.`login`, `u`.`sex`, `u`.`birthdate`, `u`.`avatar`, `u`.`last_active` as online_status, COUNT(`m`.`_id`) as `count`, 0 as `nr_count`, `tm`.`date` as `date`, `tm`.`message` as `last_mesg` from `messages` as m inner join `messages` as tm on `tm`.`_id`=(select MAX(`_id`) from `messages` as `tmz` where `tmz`.`toid`=`m`.`toid`) left join `users` as u on `u`.`id`=`m`.`toid` where `m`.`fromid`=1930 and `m`.`delete` not like '%1930;%' group by `u`.`id`)
order by `date` desc ) as `f` group by `u_id` order by `date` desc limit 0,10

Please help to optimize this query

What I need, Who user talked to (name, sex, and etc) What was the last message (from me or to me) Count of messages (all) Count of unread messages (only to me)

The query works well, but takes too long.

The output must be like this

enter image description here


Solution

  • You have some design problems on your query and database.

    • You should avoid keywords as column names, as that delete column or the count column;
    • You should avoid selecting columns not declared in the group by without an aggregation function... although MySQL allows this, it's not a standard and you don't have any control on what data will be selected;
    • Your not like construction may cause a bad behavior on your query because '%1930;%' may match 11930; and 11930 is not equal to 1930;
    • You should avoid like constructions starting and ending with % wildcard, which will cause the text processing to take longer;
    • You should design a better way to represent a message deletion, probably a better flag and/or another table to save any important data related with the action;
    • Try to limit your result before the join conditions (with a derived table) to perform less processing;

    I tried to rewrite your query the best way I understood it. I've executed my query in a messages table with ~200.000 rows and no indexes and it performed in 0,15 seconds. But, for sure you should create the right indexes to help it perform better when the amount of data increase.

    SELECT SQL_CALC_FOUND_ROWS 
      u.id, 
      u.login, 
      u.sex, 
      u.birthdate, 
      u.avatar, 
      u.last_active AS online_status, 
      g._count, 
      CASE WHEN m.toid = 1930 
        THEN g.nr_count 
        ELSE 0 
      END AS nr_count, 
      m.`date`, 
      m.message AS last_mesg 
    FROM
    (
    
      SELECT 
        MAX(_id) AS _id, 
        COUNT(*) AS _count, 
        COUNT(*) - SUM(m.status) AS nr_count
      FROM messages m
      WHERE 1=1
        AND m.`delete` NOT LIKE '%1930;%' 
        AND
        (0=1
          OR m.fromid = 1930 
          OR m.toid   = 1930
        )
      GROUP BY 
        CASE WHEN m.fromid = 1930 
          THEN m.toid 
          ELSE m.fromid 
        END
      ORDER BY MAX(`date`) DESC
      LIMIT 0, 10
    ) g
    INNER JOIN messages AS m ON 1=1 
      AND m._id = g._id
    LEFT JOIN users AS u ON 0=1 
      OR (m.fromid <> 1930 AND u.id = m.fromid)
      OR (m.toid   <> 1930 AND u.id = m.toid)
    ORDER BY m.`date` DESC
    ;