Search code examples
uploadgeolocationsnowflake-cloud-data-platformgeopandas

How to upload Polygons from GeoPandas to Snowflake?


I have a geometry column of a geodataframe populated with polygons and I need to upload these to Snowflake.

I have been exporting the geometry column of the geodataframe to file and have tried both CSV and GeoJSON formats, but so far I either always get an error the staging table always winds up empty.

Here's my code:

design_gdf['geometry'].to_csv('polygons.csv', index=False, header=False, sep='|', compression=None)

import sqlalchemy
from sqlalchemy import create_engine
from snowflake.sqlalchemy import URL

engine = create_engine(
    URL(<Snowflake Credentials Here>)
)

with engine.connect() as con:
    con.execute("PUT file://<path to polygons.csv> @~ AUTO_COMPRESS=FALSE")

Then on Snowflake I run

create or replace table DB.SCHEMA.DESIGN_POLYGONS_STAGING (geometry GEOGRAPHY);

copy into DB.SCHEMA."DESIGN_POLYGONS_STAGING"
from @~/polygons.csv
FILE_FORMAT = (TYPE = CSV FIELD_DELIMITER = '|' SKIP_HEADER = 1 compression = None encoding = 'iso-8859-1');

Generates the following error:

"Number of columns in file (6) does not match that of the corresponding table (1), use file format option error_on_column_count_mismatch=false to ignore this error File '@~/polygons.csv.gz', line 3, character 1 Row 1 starts at line 2, column "DESIGN_POLYGONS_STAGING"[6] If you would like to continue loading when an error is encountered, use other values such as 'SKIP_FILE' or 'CONTINUE' for the ON_ERROR option. For more information on loading options, please run 'info loading_data' in a SQL client."

Can anyone identify what I'm doing wrong?


Solution

  • Inspired by @Simeon_Pilgrim's comment I went back to Snowflake's documentation. There I found an example of converting a string literal to a GEOGRAPHY.

    https://docs.snowflake.com/en/sql-reference/functions/to_geography.html#examples

    select to_geography('POINT(-122.35 37.55)');
    

    My polygons looked like strings describing Polygons more than actual GEOGRAPHYs so I decided I needed to be treating them as strings and then calling TO_GEOGRAPHY() on them.

    I quickly discovered that they needed to be explicitly enclosed in single quotes and copied into a VARCHAR column in the staging table. This was accomplished by modifying the CSV export code:

    import csv
    design_gdf['geometry'].to_csv(<path to polygons.csv>, 
                              index=False, header=False, sep='|', compression=None, quoting=csv.QUOTE_ALL, quotechar="'")
    

    The staging table now looks like:

    create or replace table DB.SCHEMA."DESIGN_POLYGONS_STAGING" (geometry VARCHAR);
    

    I ran into further problems copying into the staging table related to the presence of a polygons.csv.gz file I must have uploaded in a previous experiment. I deleted this file using:

    remove @~/polygons.csv.gz
    

    Finally, converting the staging table to GEOGRAPHY

    create or replace table DB.SCHEMA."DESIGN_GEOGRAPHY_STAGING" (geometry GEOGRAPHY);
    
    insert into DB.SCHEMA."DESIGN_GEOGRAPHY"
    select to_geography(geometry)
    from DB.SCHEMA."DESIGN_POLYGONS_STAGING"
    

    and I wound up with a DESIGN_GEOGRAPHY table with a single column of GEOGRAPHYs in it. Success!!!