Tags: mysql, query-optimization, mariadb

Optimize query on MySQL


I have a query that runs really slowly (15-20 seconds) when the data is not in memory, and quite fast (0.6-2 seconds) when it is:

select count(distinct(concat(conexiones.tMacAdres, date_format(conexiones.fFecha,'%Y%m%d')))) as Conexiones,
       sum(if(conexiones.tEvento='megusta',1,0)) as MeGusta,
       sum(if(conexiones.tEvento='megusta',conexiones.nAmigos,0)) as ImpactosMeGusta,
       sum(if(conexiones.tEvento='checkin',1,0)) as CheckIn,
       sum(if(conexiones.tEvento='checkin',conexiones.nAmigos,0)) as ImpactosCheckIn,
       min(conexiones.fFecha) Fecha_Inicio,
       now() Fecha_fin,
       datediff(now(),min(conexiones.fFecha)) as dias
from conexiones, instalaciones
where conexiones.idInstalacion=instalaciones.idInstalacion
  and conexiones.idInstalacion=190
  and (fFecha between '2014-01-01 00:00:00' and '2016-06-18 23:59:59')
group by instalaciones.tNombre
order by instalaciones.idCliente

These are the table schemas. instalaciones has 1332 rows:

CREATE TABLE `instalaciones` (
  `idInstalacion` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `idCliente` int(10) unsigned DEFAULT NULL,
  `tRouterSerial` varchar(50) DEFAULT NULL,
  `tFacebookPage` varchar(256) DEFAULT NULL,
  `tidFacebook` varchar(64) DEFAULT NULL,
  `tNombre` varchar(128) DEFAULT NULL,
  `tMensaje` varchar(128) DEFAULT NULL,
  `tWebPage` varchar(128) DEFAULT NULL,
  `tDireccion` varchar(128) DEFAULT NULL,
  `tPoblacion` varchar(128) DEFAULT NULL,
  `tProvincia` varchar(64) DEFAULT NULL,
  `tCodigoPosta` varchar(8) DEFAULT NULL,
  `tLatitud` decimal(15,12) DEFAULT NULL,
  `tLongitud` decimal(15,12) DEFAULT NULL,
  `tSSID1` varchar(40) DEFAULT NULL,
  `tSSID2` varchar(40) DEFAULT NULL,
  `tSSID2_Pass` varchar(40) DEFAULT NULL,
  `fSincro` datetime DEFAULT NULL,
  `tEstado` varchar(10) DEFAULT NULL,
  `tHotspot` varchar(10) DEFAULT NULL,
  `fAlta` datetime DEFAULT NULL,
  PRIMARY KEY (`idInstalacion`),
  UNIQUE KEY `tRouterSerial` (`tRouterSerial`),
  KEY `idInstalacion` (`idInstalacion`)
) ENGINE=InnoDB AUTO_INCREMENT=1332 DEFAULT CHARSET=utf8;

conexiones has 2370365 rows:

CREATE TABLE `conexiones` (
  `idConexion` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `idInstalacion` int(10) unsigned DEFAULT NULL,
  `idUsuario` int(11) DEFAULT NULL,
  `tMacAdres` varchar(64) DEFAULT NULL,
  `tUsuario` varchar(128) DEFAULT NULL,
  `tNombre` varchar(64) DEFAULT NULL,
  `tApellido` varchar(64) DEFAULT NULL,
  `tEmail` varchar(64) DEFAULT NULL,
  `tSexo` varchar(20) DEFAULT NULL,
  `fNacimiento` date DEFAULT NULL,
  `nAmigos` int(11) DEFAULT NULL,
  `tPoblacion` varchar(64) DEFAULT NULL,
  `fFecha` datetime DEFAULT NULL,
  `tEvento` varchar(20) DEFAULT NULL,
  PRIMARY KEY (`idConexion`),
  KEY `idInstalacion` (`idInstalacion`),
  KEY `tMacAdress` (`tMacAdres`) USING BTREE,
  KEY `fFecha` (`fFecha`),
  KEY `idUsuario` (`idUsuario`),
  KEY `insta_fecha` (`idInstalacion`,`fFecha`)
) ENGINE=InnoDB AUTO_INCREMENT=2370365 DEFAULT CHARSET=utf8;

This is the EXPLAIN output:

id | select_type | table         | type  | possible_keys                    | key           | key_len | ref   | rows   | Extra
 1 | SIMPLE      | instalaciones | const | PRIMARY,idInstalacion            | PRIMARY       | 4       | const | 1      |
 1 | SIMPLE      | conexiones    | ref   | idInstalacion,fFecha,insta_fecha | idInstalacion | 5       | const | 110234 | Using where

Thanks!

(Edited)

SHOW TABLE STATUS LIKE 'conexiones'

Name: conexiones
Engine: InnoDB
Version: 10
Row_format: Compact
Rows: 2305296
Avg_row_length: 151
Data_length: 350060544
Max_data_length: 0
Index_length: 331661312
Data_free: 75497472
Auto_increment: 2433305
Create_time: 28/06/2016 22:26
Update_time: NULL
Check_time: NULL
Collation: utf8_general_ci
Checksum: NULL
Create_options:
Comment:

Solution

  • Here's why it is so slow. And I will end with a possible speedup.

    First, please do

    SELECT COUNT(*) FROM conexiones
        WHERE idInstalacion=190
          and fFecha >= '2014-01-01'
          and fFecha  < '2016-06-19'
    

    in order to see how many rows we are dealing with. The EXPLAIN suggests 110234, but that is only a crude estimate.

    Assuming there are 110K rows of conexiones involved in the query, and assuming the rows were (approximately) inserted in chronological order by fFecha, then...

    • There are a lot of rows to work with, and
    • They are scattered around the table on disk, hence
    • The query takes a lot of I/O, unless it is cached.

    Let's further check on my last claim... How much RAM do you have? What is the value of innodb_buffer_pool_size? It should be about 70% of available RAM. Use a lower percentage if you have less than 4GB of RAM.
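
    For example, a quick way to check the current setting (the value is reported in bytes):

    SHOW VARIABLES LIKE 'innodb_buffer_pool_size';
    -- or, expressed in GB:
    SELECT @@innodb_buffer_pool_size / 1024 / 1024 / 1024 AS buffer_pool_gb;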

    Assuming that conexiones is too big to be 'cached' in the 'buffer_pool', we need to find a way to decrease the I/O.

    There are 1332 different values for idInstalacion. Perhaps you insert 1332 rows every few minutes/hours into conexiones? Since the PRIMARY KEY is merely an AUTO_INCREMENT, those rows will be 'appended' to the end of the table.

    Now let's look at where the idInstalacion=190 rows are. A new one of them occurs roughly every 1332 rows. That means they are spread out, and (probably) no two of them share a block (16KB in InnoDB). So the 110234 rows will sit in 110234 different blocks. That's about 2GB. If the buffer_pool is smaller than that, there will be I/O. Even if it is bigger, that's a lot of data to touch.
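
    A back-of-the-envelope check of that figure (assuming each matching row lands in its own 16KB block):

    SELECT 110234 * 16 * 1024 / POW(1024, 3) AS approx_gb;   -- roughly 1.7 GB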

    But what to do about it? If we could arrange the =190 rows to be consecutive in the table, then the 2GB might drop to, say, 20MB -- a much more manageable and cacheable size. But how can that be done? By changing the PRIMARY KEY.

    PRIMARY KEY(idInstalacion, fFecha, idConexion),
    INDEX(idConexion)
    

    and DROP any other indexes starting with idInstalacion or idConexion. To explain:

    • Since the PK is "clustered" with the data, all idInstalacion=190 rows over any consecutive fFecha range will be consecutive in the data. So, fetching one block will get about 100 rows -- much less I/O.
    • A PK must be unique. Assuming (idInstalacion, fFecha) is not unique, I tacked on idConexion to make it unique.
    • I added INDEX(idConexion) to keep AUTO_INCREMENT happy; InnoDB requires an AUTO_INCREMENT column to be the first column of some index.

    Potential drawback... Since this change rearranges the order of the data, other queries, including the INSERTs, may be slowed down. The INSERTs will be scattered, but not really slowed down: 1332 "hot spots" would be accepting the new rows, and that many blocks can easily be cached.

    Arithmetic... If you have spinning drives, I would expect the existing structure to take about 1102 seconds (perhaps under 110 seconds for SSD) for 110234 rows. Since it is taking under 20 seconds, I suspect there is some caching (or you have SSDs) or the 110234 is grossly overestimated. My suggested change should decrease the "worst" time significantly, and slightly improve the "in memory" time. This "slight improvement" comes from being able to use the PK instead of a secondary key.
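
    That estimate assumes roughly 10 ms per random disk read on a spinning drive (about 1 ms on SSD); for example:

    SELECT 110234 * 0.010 AS hdd_seconds,   -- about 1102 s
           110234 * 0.001 AS ssd_seconds;   -- about 110 s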

    Caveat: Since 110234 * 1332 is nowhere near 2370365, much of my numerical analysis is probably nowhere near correct. For example, 2370365 rows with that schema is possibly less than 1GB. Please provide SHOW TABLE STATUS LIKE 'conexiones'.

    Addenda

    "server has 2GB Ram and innodb_buffer_pool_size is 5368709120" -- Either that is a typo or it is terrible. Since the buffer_pool needs to reside in RAM, do not set the buffer_pool to 5GB. 500MB might be OK for your tiny 2GB of RAM.

    The SHOW TABLE STATUS confirms that it (data + indexes) won't quite fit in 500M, so you may periodically experience I/O bound queries with 500M.

    Increasing your RAM and buffer_pool would temporarily (until the data gets bigger) help performance.

    Before putting this into production, test the ALTER and time the various queries you use:

    ALTER TABLE conexiones
        DROP PRIMARY KEY,
        DROP INDEX insta_fecha,
        DROP INDEX idInstalacion,
        ADD PRIMARY KEY(idInstalacion, fFecha, idConexion),
        ADD INDEX(idConexion)
    

    Caution: The ALTER will need about 1GB of free disk space.
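
    A rough way to estimate the table's current footprint (the rebuilt table may differ somewhat):

    SELECT (data_length + index_length) / POW(1024, 3) AS current_gb
        FROM information_schema.tables
        WHERE table_schema = DATABASE() AND table_name = 'conexiones';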

    When timing, run with the Query Cache off, and run each query twice -- the first run may involve I/O; the second is the 'in memory' case you mentioned.
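
    One way to keep the query cache out of the timing, for example:

    -- Bypass the query cache for a single statement:
    SELECT SQL_NO_CACHE COUNT(*) FROM conexiones WHERE idInstalacion=190;

    -- Or turn it off for this session (possible if the query cache was enabled at startup):
    SET SESSION query_cache_type = OFF;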

    Revised analysis: Since the bigger table has 300MB of data and some amount of indexes in use, and assuming 500MB buffer pool, I suspect that blocks are bumped out of the buffer pool some of the time. This fits well with your initial comment on the query's speed. My suggested index changes should help avoid the speed variance, but may hurt the performance of other queries.