php mysql google-app-engine google-cloud-sql

Random latency on App Engine Standard connections to Cloud SQL


I have a fairly basic app that originally ran on a local Plesk server. We migrated it to Google App Engine (GAE), Cloud SQL (MySQL), and Cloud Storage.

Here's some background info:

The app is PHP-based and runs great on the local server. After migrating to the cloud, we noticed random yet extreme latency. It's so bad that the app times out and gives a SPDY timeout error. We use Cloudflare for SPDY, so we started there, and they said the problem is on the server. Then we went to Google. We've been going back and forth, and I am looking for other avenues of help.

I am running the app on an F2 GAE Standard instance and a g1-small Cloud SQL instance (second generation). Both are in the same region/zone. There is also a failover SQL instance.

There is no real pattern to it, but users notice bad timeouts very frequently, and the request dies after 60 seconds (which points to a PHP timeout, right?). We checked the code, and it runs fine on the local server.

I don't have much traffic on this app yet (maybe a few users a day), so I don't know whether it's traffic load. Here are some basic stats:

https://i.sstatic.net/nQRQI.jpg

Some Google engineers said our app has trouble scaling (QPS never gets above 1):

https://i.sstatic.net/GiZZS.jpg

They also asked whether we are threading. We are not. We do not use memcache yet either.

I also see a ton of these:

https://i.sstatic.net/6UucI.jpg

Which looks like this bug: https://github.com/GoogleCloudPlatform/cloudsql-proxy/issues/126

But I am unsure if this is all related.

We've tried going through Google's tech support. They said we have "manual locks", but our dev team doesn't agree and doesn't know what this really means. Again, the same framework code (session handling, etc.) is used in many apps with a ton of users (not on GAE; they run on compute instances on AWS), so this is our first venture into GAE.

We connect using standard MySQL connection parameters, and the same framework runs fine in many other applications. We use the required proxy to connect to Cloud SQL.

The slowness and constant lag shouldn't be there, and we don't know what the issue could be. My questions are:

1) Do you see any issues here? All database logs and summaries are above.

2) Can you help me understand what may be wrong here?

Thank you!


Solution

  • We found a long-running query that was causing the huge database lag.
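
For anyone chasing a similar problem, one way to narrow down which query is slow is to time each database call from the PHP side and log the ones that exceed a threshold. The following is only a minimal sketch, not the code we actually used: the helper name `timedQuery`, the labels, and the threshold are hypothetical, and the callable stands in for whatever your framework uses to run a query.

```php
<?php
declare(strict_types=1);

/**
 * Runs $query (any callable that executes a database call), measures how long
 * it took, and records it in $slowLog when it takes $thresholdMs or longer.
 * The helper and its parameters are illustrative assumptions.
 */
function timedQuery(string $label, callable $query, float $thresholdMs, array &$slowLog)
{
    $start = microtime(true);
    try {
        // Return whatever the wrapped call returns (rows, a statement, etc.).
        return $query();
    } finally {
        // Runs whether the call returned or threw, so failures are timed too.
        $elapsedMs = (microtime(true) - $start) * 1000.0;
        if ($elapsedMs >= $thresholdMs) {
            $slowLog[] = ['label' => $label, 'ms' => $elapsedMs];
            error_log(sprintf('slow query "%s": %.1f ms', $label, $elapsedMs));
        }
    }
}

// Usage: the closure below only sleeps to simulate a slow query. In a real
// app it would wrap the actual call, for example a PDO query.
$slowLog = [];
$rows = timedQuery('demo-report', function () {
    usleep(20000); // stand-in for a slow database call
    return [];
}, 10.0, $slowLog);
```

On the database side, running `SHOW FULL PROCESSLIST` while the app is hanging shows which statements are currently executing and for how long, which is another way to spot a query like this.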