
Optimize queries for tables with large text columns (15k - 5000k)


My table:

CREATE TABLE `html` (
  `html_id` int(10) unsigned NOT NULL,
  `links_id` int(10) unsigned NOT NULL,
  `url` varchar(2000) DEFAULT NULL,
  `tld` varchar(10) DEFAULT NULL,
  -- ... other columns ...
  `html` mediumtext,
  PRIMARY KEY (`html_id`,`links_id`),
  UNIQUE KEY `links_id_url_index` (`links_id`,`url`(255))
) DEFAULT CHARSET=utf8;
  • it has > 20 GB of data total (~1 mil rows)
  • queries are mostly for statistics and don't require the html column
  • e.g. SELECT tld, MAX(url), COUNT(*) FROM html GROUP BY tld

The table also has to sustain a high insert rate (>500 inserts/s) from a multithreaded C++ web crawler, with the (html_id, links_id) primary key and (links_id, url(255)) unique key checks on every insert. In this configuration it works fine, but I'm afraid that adding further keys could slow the inserts down.

From all my tests it seems that the large html column is the cause of the slow queries. What is the best way to deal with this?
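
One way to confirm that suspicion (a sketch; the html_slim name is only illustrative) is to build a slim copy of the table without the html column and run the same aggregate against both:

-- Slim copy containing only the short columns
CREATE TABLE html_slim AS
SELECT html_id, links_id, url, tld FROM html;

-- Compare timings of the same aggregate on both tables
SELECT tld, COUNT(*) FROM html GROUP BY tld;       -- wide table with the mediumtext column
SELECT tld, COUNT(*) FROM html_slim GROUP BY tld;  -- if this is much faster, the html column is the culprit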


Solution

  • Move the html column to a separate table keyed by the same (html_id, links_id) pair, so the statistics queries only ever scan the short metadata rows (see the sketch below). Also, why would you run MAX() on a varchar column? It returns the lexicographically greatest string, which is rarely a meaningful statistic for URLs.
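
A minimal sketch of that split, assuming the table names html_meta and html_body (both illustrative) and keeping the existing keys on the metadata side:

-- Slim metadata table: statistics queries scan only these short rows
CREATE TABLE `html_meta` (
  `html_id` int(10) unsigned NOT NULL,
  `links_id` int(10) unsigned NOT NULL,
  `url` varchar(2000) DEFAULT NULL,
  `tld` varchar(10) DEFAULT NULL,
  -- ... other columns ...
  PRIMARY KEY (`html_id`,`links_id`),
  UNIQUE KEY `links_id_url_index` (`links_id`,`url`(255))
) DEFAULT CHARSET=utf8;

-- Wide table holding only the page source, keyed by the same pair
CREATE TABLE `html_body` (
  `html_id` int(10) unsigned NOT NULL,
  `links_id` int(10) unsigned NOT NULL,
  `html` mediumtext,
  PRIMARY KEY (`html_id`,`links_id`)
) DEFAULT CHARSET=utf8;

-- Statistics queries never touch the mediumtext column
SELECT tld, COUNT(*) FROM html_meta GROUP BY tld;

-- Fetch a single page body through the shared key (42 and 7 are example ids)
SELECT m.url, b.html
FROM html_meta m
JOIN html_body b USING (html_id, links_id)
WHERE m.html_id = 42 AND m.links_id = 7;

The crawler would then issue two inserts per page, ideally inside one transaction so the two tables stay consistent; the second insert only has to maintain the primary key of html_body, so it should cost less than adding another secondary index to the original wide table.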