Search code examples
google-cloud-platformgoogle-cloud-storagegoogle-cloud-sqlparquet

Export GCP Cloud SQL PostgreSQL to GCS in Parquet Format


I have data in GCP Cloud SQL PostgreSQL, I want to export this data into GCS in Parquet format, I see that it's not possible directly, only I can do in SQL and CSV format, anyways for exporting the data in Parquet format.


Solution

  • I propose you to achieve this through BigQuery

    1. Connect BigQuery to Cloud SQL postgres instance with Federated queries feature
    2. Create a new table in BigQuery with the Cloud SQL postgres data
    CREATE TABLE my_dataset.temp_table AS
    SELECT * FROM EXTERNAL_QUERY("project_id.region.connection_name", "SELECT * FROM .....;");
    
    1. Use the BigQuery export table feature with parquet option (which is in preview stage)
    bq --location=region extract \
    --destination_format parquet \
    project_id:my_dataset.temp_table \
    gs://bucket/filename.parquet
    
    1. Delete the temporary table in BigQuery

    This way guaranty you only one file (or a small number of sharded file.

    However you can speed up the process, but this solution can generate an higher number of file

    1. Connect BigQuery to Cloud SQL postgres instance with Federated table feature
    2. Export the data to GCS in only one query
    EXPORT DATA OPTIONS(
      uri='gs://bucket/filename.parquet*',
      format='PARQUET') AS
    SELECT * FROM EXTERNAL_QUERY("project_id.region.connection_name", "SELECT * FROM .....;");
    

    https://cloud.google.com/bigquery/docs/reference/standard-sql/other-statements#export_data_statement