Search code examples
mysqlpython-3.xgoogle-cloud-sqlpymysql

Inserting large number of data rows into cloudsql table via PyMySql from .csv


I am new to CloudSQL & I am trying to insert records of 40+ different columns and over 1.5 million rows. However I am unable to do this in Google CloudSQL. I have taken quite a number of measures listed below to resolve this issue but the main error that I get is:

ERROR:
textPayload:  "2019-04-12T06:10:47.348295Z 8554 [Note] Aborted connection 8554 to db: 'xxxxx_xxx' user: 'root' host: 'x.x.x.x' (Got an error reading communication packets)"  

Summary:

  1. I am using Python, PyMySql to insert 1.5 million rows of data into a table of 35 columns
  2. Instance, DB, Table have already been created in CloudSQL.
  3. System configuration: vCPUs - 4, Memory- 15 GB, SSD storage - 10 GB
  4. I can load this data completely fine in my local system.
  5. In Google CloudSQl, deployment time is extremely long and deployment is Successful.
  6. But when I check my table, it is empty.
  7. The MySql error logs in the instance shows the above.

I have made tried the following actions:

  • Using API URL/ .txt / .json file for upload instead now using .csv.

  • Thinking it is a system issue, I upgraded the system from 8GB Memory to 15 GB Memory.

  • Thinking that SQL default configs are causing limitations, I have added the following:

    • sql_mode : MAXDB,NO_AUTO_CREATE_USER

    • max_allowed_packet: 1073741824

    • net_read_timeout: 4294967295

    • wait_timeout: 31536000

  • Inserted lesser number of rows, max rows able to insert = 100

def adddata():
    try:
        conn = pymysql.connect(unix_socket='/cloudsql/' + 'karto-235001:asia-east1:karto', user='xxx', password='xxx', db='xxx')
        cur = conn.cursor()
        insert_ = "INSERT INTO data_table(a, b, c) VALUES (%s, %s, %s)"
        with open('info.csv', newline='') as myFile:
            reader = csv.reader(myFile)
            for item in reader:
                cur.execute(insert_, (item[3], item[4], item[5]))
            conn.commit()
            cur.close()
    finally:
        conn.close()

I have checked online and have implemented the recommended solutions by CloudSQL and other stack-overflow users. If anyone could identify what I am doing wrong or if there are issues with my code or configuration? Thank you very much.


Solution

  • I see that you want to upload information contained in a CSV file using Python. Have you tried importing directly to the database? You can follow the steps in the link[1].

    In the meantime I’ll try to replicate your case. You also may want to check if your installations and configuration are correct.

    Verify if Your Cloud SQL instance and connection [2] and you Python installation[3].

    [1]https://cloud.google.com/sql/docs/mysql/import-export/importing#csv
    [2]https://cloud.google.com/sql/docs/mysql/connect-compute-engine
    [3]https://cloud.google.com/python/setup