Tags: google-cloud-sql, google-compute-engine

Slow importing into Google Cloud SQL


Google Cloud SQL is my first real evaluation of MySQL as a service. I created a D32 instance, set replication to asynchronous, and disabled binary logging. Importing 5.5 GB of dump files from a GCE n1-standard-1 instance in the same zone took 97 minutes.
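
For reference, a sketch of that setup with the gcloud CLI; the instance name and the exact flag spellings for first-generation instances are assumptions, not verified commands:

    # Hypothetical sketch of the instance setup described above; flag names are
    # assumed for first-generation Cloud SQL instances.
    gcloud sql instances create myinstance \
        --tier=D32 \
        --replication=ASYNCHRONOUS \
        --no-enable-bin-log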

Following the documentation, I connected using the public IP address, although both instances are in the same region and zone. I'm fully open to the possibility that I did something incorrectly. Is there anything immediately obvious that I should be doing differently?
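
For context, the dump was loaded with the mysql client over that public IP, presumably along these lines (host, user, database, and file names are placeholders):

    # Placeholder illustration of the import path described above; none of
    # these values are the real ones.
    mysql --host=203.0.113.10 --user=root --password mydb < dump.sql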


Solution

  • We have been importing ~30 GB via Cloud Storage from zip files containing SQL statements, and this takes over 24 hours. A big factor is the number of indexes on the given table.

    To keep it manageable, we split the file into chunks of 200K SQL statements each, which are inserted in one transaction. This lets us retry individual chunks in case of errors (a splitting sketch follows the script below).

    We also tried doing it via Compute Engine (the mysql command line), and in our experience this was even slower.

    Here is how to import one chunk and wait for it to complete. You cannot do this in parallel, as Cloud SQL only allows one import operation at a time; a driver loop that runs chunks back to back follows the script.

    #!/bin/bash
    
    # Fetch a fresh OAuth access token from the GCE metadata server.
    # The v1 metadata endpoint expects the "Metadata-Flavor: Google" header.
    function refreshAccessToken() {
        echo "getting access token..."
        ACCESSTOKEN=$(curl -s "http://metadata/computeMetadata/v1/instance/service-accounts/default/token" -H "Metadata-Flavor: Google" | jq -r ".access_token")
        echo "retrieved access token $ACCESSTOKEN"
    }
    
    START=$(date +%s%N)
    
    # Arguments: Cloud SQL instance name, gs:// URI of the chunk file,
    # and the polling interval in seconds.
    DB_INSTANCE=$1
    GCS_FILE=$2
    SLEEP_SECONDS=$3
    
    refreshAccessToken
    
    # Kick off the import through the Cloud SQL REST API.
    CURL_URL="https://www.googleapis.com/sql/v1beta1/projects/myproject/instances/$DB_INSTANCE/import"
    CURL_OPTIONS="-s --header 'Content-Type: application/json' --header 'Authorization: OAuth $ACCESSTOKEN' --header 'x-goog-project-id:myprojectId' --header 'x-goog-api-version:1'"
    CURL_PAYLOAD="--data '{ \"importContext\": {  \"database\": \"mydbname\", \"kind\": \"sql#importContext\", \"uri\": [ \"$GCS_FILE\" ]}}'"
    CURL_COMMAND="curl --request POST $CURL_URL $CURL_OPTIONS $CURL_PAYLOAD"
    
    echo "executing $CURL_COMMAND"
    
    CURL_RESPONSE=$(eval $CURL_COMMAND)
    echo "$CURL_RESPONSE"
    OPERATION=$(echo $CURL_RESPONSE | jq -r ".operation")
    
    echo "Import operation $OPERATION started..."
    CURL_URL="https://www.googleapis.com/sql/v1beta1/projects/myproject/instances/$DB_INSTANCE/operations/$OPERATION"
    STATE="RUNNING"
    while [[ $STATE == "RUNNING" ]]
    do
            echo "waiting for $SLEEP_SECONDS seconds for the import to finish..."
            sleep $SLEEP_SECONDS
            refreshAccessToken
            CURL_OPTIONS="-s --header 'Content-Type: application/json' --header 'Authorization: OAuth $ACCESSTOKEN' --header 'x-goog-project-id:myprojectId' --header 'x-goog-api-version:1'"
            CURL_COMMAND="curl --request GET $CURL_URL $CURL_OPTIONS"
            CURL_RESPONSE=`eval $CURL_COMMAND`
            STATE=`echo $CURL_RESPONSE | jq ".state" | sed 's/"//g'`
            END=`date +%s%N`
            ELAPSED=`echo "scale=8; ($END - $START) / 1000000000" | bc`
            echo "Import process $OPERATION for $GCS_FILE : $STATE, elapsed time $ELAPSED"
    done
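
    A minimal sketch of the splitting and the sequential driver mentioned above, assuming the script is saved as import_chunk.sh and the dump file holds one SQL statement per line; the file names, bucket, and the transaction wrapping are illustrative assumptions, not part of the original setup:

    #!/bin/bash
    # Hypothetical driver: split a dump into 200K-statement chunks, wrap each in
    # a transaction, upload to Cloud Storage, and import them one at a time.
    # dump.sql, gs://mybucket/chunks, and import_chunk.sh are placeholder names.
    
    DUMP_FILE=dump.sql
    BUCKET=gs://mybucket/chunks
    
    # Split into 200K-line pieces (assumes one complete statement per line).
    split -l 200000 "$DUMP_FILE" chunk_
    
    for CHUNK in chunk_*
    do
        # Wrap each chunk in a single transaction so a failed chunk rolls back
        # and can be retried as a unit.
        { echo "START TRANSACTION;"; cat "$CHUNK"; echo "COMMIT;"; } > "$CHUNK.sql"
        gsutil cp "$CHUNK.sql" "$BUCKET/$CHUNK.sql"
    
        # Run imports back to back: Cloud SQL allows a single import operation
        # at a time, and import_chunk.sh blocks until the operation finishes.
        ./import_chunk.sh myinstance "$BUCKET/$CHUNK.sql" 30
    done

    Note that split -l only cuts cleanly when every statement sits on its own line; dumps with multi-line statements would need a smarter splitter.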