Search code examples
google-cloud-sqlgoogle-compute-engine

Debugging IO limitations


I am trying to understand IO wait with google cloud SQL under load. I use the D1 plan which is supposed to allow 100 concurrent queries. I don't see any stats about the concurrent queries in my dashboard, but locally I have a 64 thread pool so I can never exceed 64 concurrent queries (well 2 more from time to time). Additionally my IO ops per second peak at 20.

Moving to the a bigger plan helps but only slightly.

And the queries are quite simple - select 1, select small order by random dominantly.

All my 64 threads are stuck on google SQL IO at the socketRead() like this:

at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:129)
at com.mysql.jdbc.util.ReadAheadInputStream.fill(ReadAheadInputStream.java:114)
at com.mysql.jdbc.util.ReadAheadInputStream.readFromUnderlyingStreamIfNecessary(ReadAheadInputStream.java:161)
at com.mysql.jdbc.util.ReadAheadInputStream.read(ReadAheadInputStream.java:189)
- locked <5f74f841> (a com.mysql.jdbc.util.ReadAheadInputStream)
at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:3049)
at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3503)
at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3492)
at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:4043)
at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2503)
at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2664)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2809)
- locked <7856a45c> (a com.mysql.jdbc.JDBC4Connection)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2758)
at com.mysql.jdbc.StatementImpl.execute(StatementImpl.java:894)
- locked <7856a45c> (a com.mysql.jdbc.JDBC4Connection)
at com.mysql.jdbc.StatementImpl.execute(StatementImpl.java:732)
at org.jboss.jca.adapters.jdbc.CheckValidConnectionSQL.isValidConnection(CheckValidConnectionSQL.java:74)
at org.jboss.jca.adapters.jdbc.BaseWrapperManagedConnectionFactory.isValidConnection(BaseWrapperManagedConnectionFactory.java:1173)
at org.jboss.jca.adapters.jdbc.BaseWrapperManagedConnection.checkValid(BaseWrapperManagedConnection.java:461)
at org.jboss.jca.adapters.jdbc.BaseWrapperManagedConnectionFactory.getInvalidConnections(BaseWrapperManagedConnectionFactory.java:999)
at org.jboss.jca.core.connectionmanager.pool.mcp.SemaphoreArrayListManagedConnectionPool.validateConnections(SemaphoreArrayListManagedConnectionPool.java:870)
at org.jboss.jca.core.connectionmanager.pool.validator.ConnectionValidator$ConnectionValidatorRunner.run(ConnectionValidator.java:277)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
at java.lang.Thread.run(Thread.java:662)

Can you advise how to understand the bottleneck?

The IO graph

EDIT: Also, this is the timeout exception which comes eventually after being stuck:

The last packet successfully received from the server was 13,969 milliseconds ago.  The last packet sent successfully to the server was 13,969 milliseconds ago.
        at sun.reflect.GeneratedConstructorAccessor107.newInstance(Unknown Source) [:1.6.0_45]
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) [rt.jar:1.6.0_45]
        at java.lang.reflect.Constructor.newInstance(Constructor.java:513) [rt.jar:1.6.0_45]
        at com.mysql.jdbc.Util.handleNewInstance(Util.java:411)
        at com.mysql.jdbc.SQLError.createCommunicationsException(SQLError.java:1121)
        at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3603)
        at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3492)
        at com.mysql.jdbc.MysqlIO.checkErrorPacket(MysqlIO.java:4043)
        at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2503)
        at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2664)
        at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2815)
        at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:2155)
        at com.mysql.jdbc.PreparedStatement.executeQuery(PreparedStatement.java:2322)
        at org.jboss.jca.adapters.jdbc.CachedPreparedStatement.executeQuery(CachedPreparedStatement.java:107)
        at org.jboss.jca.adapters.jdbc.WrappedPreparedStatement.executeQuery(WrappedPreparedStatement.java:462)
        at sun.reflect.GeneratedMethodAccessor77.invoke(Unknown Source) [:1.6.0_45]
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) [rt.jar:1.6.0_45]
        at java.lang.reflect.Method.invoke(Method.java:597) [rt.jar:1.6.0_45]
        at org.hibernate.engine.jdbc.internal.proxy.AbstractStatementProxyHandler.continueInvocation(AbstractStatementProxyHandler.java:122)
        ... 49 more
Caused by: java.io.EOFException: Can not read response from server. Expected to read 4 bytes, read 0 bytes before connection was unexpectedly lost.
        at com.mysql.jdbc.MysqlIO.readFully(MysqlIO.java:3052)
        at com.mysql.jdbc.MysqlIO.reuseAndReadPacket(MysqlIO.java:3503)
        ... 62 more

EDIT2: The consumer is a google cloud computing instance, collocated with the DB.


Solution

  • The Using filesort indicates that a sorting needs to take place. If the amount of data is bigger than sort_buffer_size then a temporary file will be created. The temporary files are slow in Cloud SQL so avoiding them is paramount for getting good performance/

    The default value for sort_buffer_size is 2M. It's a both a global and session variable so it can be change in the queries that requires more memory.

    mysql> SHOW VARIABLES LIKE 'sort_buffer_size';
    +------------------+---------+
    | Variable_name    | Value   |
    +------------------+---------+
    | sort_buffer_size | 2097152 |
    +------------------+---------+
    1 row in set (0.00 sec)
    
    mysql> 
    

    Each time the sort_buffer_size is too small the Created_tmp_files status variable will be incremented.

    mysql> SHOW GLOBAL STATUS LIKE 'Created_tmp_files';
    +-------------------+-------+
    | Variable_name     | Value |
    +-------------------+-------+
    | Created_tmp_files | 663   |
    +-------------------+-------+
    1 row in set (0.01 sec)
    
    mysql> 
    

    More information on Using filesort can be found in Baron Schwartz's post on MySQL Performance Blog. A deep discussion of 'ORDER BY' can be found in Sergey Petrunia’s blog post.

    I wrote a small program (mysql-sorttest) to specifically test the sort performance of MySQL.

    In the default config (1000 rows) the data fits in the memory (the Created_tmp_files is not incremented after the test) so queries are fast:

    $ ./sort -dsn="root:root@tcp(x.x.x.x:3306)/sorttest" prepare
    2014/08/24 12:22:03 Start number of rows: 0
    2014/08/24 12:22:06 End number of rows: 1000
    $ ./sort -dsn="root:root@tcp(x.x.x.x:3306)/sorttest" run
    2014/08/24 12:22:10 Sleep 10s
    2014/08/24 12:22:10 Start oneRun
    2014/08/24 12:22:20 Done 3000 requests
    2014/08/24 12:22:20 Total requests: 300.00/s
    $
    

    For 5000 rows the data doesn't fit in memory and the speed drops substantially:

    $ ./sort -dsn="root:root@tcp(x.x.x.x:3306)/sorttest" -table_size=5000 prepare
    2014/08/24 12:22:53 Start number of rows: 1000
    2014/08/24 12:23:08 End number of rows: 5000
    $ ./sort -dsn="root:root@tcp(x.x.x.x:3306)/sorttest" run
    2014/08/24 12:23:15 Sleep 10s
    2014/08/24 12:23:15 Start oneRun
    2014/08/24 12:23:25 Done 91 requests
    2014/08/24 12:23:25 Total requests: 9.10/s
    $
    

    A comment on ORDER BY RAND(): as indicated in numerous places it's good to avoid that type of query because it forces MySQL to generate random for every row and then sort that. That is a CPU intensive operation and doing that with enough threads will eventually consume all the CPU (or perhaps hit some other limits in mysqld).