Tags: python, arrays, gis, geopandas, geopackage

Can I save a GeoDataFrame that contains an array to a GeoPackage file?


I have a geopandas GeoDataFrame with some attribute columns and a geometry column (just a regular GDF). Usually I save GDFs as GeoPackage files (.gpkg) using:

gdf.to_file('path_to_file.gpkg', driver='GPKG')

This works fine, unless my GDF has a column where the entries are arrays. So say I have two columns next to the geometry column and one of them contains a numpy array for each entry. If I then try to save as a gpkg it gives me the error:

ValueError: Invalid field type <class 'numpy.ndarray'>

So it appears that a gpkg cannot handle arrays in the table. The arrays I want to include are simple flags (so values of 0 and 1). I found two workarounds which work alright but are a bit messy:

  1. Make a string of the array values. This works but I would very much prefer leaving it as an array...
  2. Create a separate column for every array value. This would also work but then I get a GDF with a lot of columns and I feel there should be a better way to do this.
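For reference, the second workaround can be sketched roughly as below. This is a minimal sketch, assuming every array has the same length; the column names (`flags`, `flag_0`, ...) and the plain pandas DataFrame standing in for the GDF are made up for illustration. A GeoDataFrame should behave the same way for the attribute columns.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the attribute table (assumption: equal-length arrays).
df = pd.DataFrame({
    "id": [1, 2],
    "flags": [np.array([0, 1, 1]), np.array([1, 0, 0])],
})

# Stack the per-row arrays into a 2-D array: one row per feature, one column per flag.
stacked = np.vstack(df["flags"].tolist())
flag_cols = pd.DataFrame(
    stacked,
    columns=[f"flag_{i}" for i in range(stacked.shape[1])],
    index=df.index,
)

# Swap the array column for the new scalar columns.
expanded = df.drop(columns="flags").join(flag_cols)
```

The result has only scalar columns, at the cost of one column per array element.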

Does anybody know of a better workaround to this issue?


Solution

  • I believe this is just a limitation of the .gpkg format. I think the best workaround is to store the arrays as strings, as you suggested. If you need to, you can easily convert them back into arrays in the new GDF with ast.literal_eval().

    import numpy as np
    import geopandas as gpd
    from shapely.geometry import LineString
    from ast import literal_eval
    
    # Example GDF with one array-valued column
    gdf = gpd.GeoDataFrame({'id': [1, 2, 3], 'array_col': [np.array([0,1,2]), np.array([0,1,2]), np.array([0,1,2])]},
                           geometry=[LineString([(1, 1), (4, 4)]),
                                     LineString([(1, 4), (4, 1)]),
                                     LineString([(6, 1), (6, 6)])])
    
    # Convert each array to its string form, e.g. '[0 1 2]'
    gdf['array_col'] = gdf['array_col'].apply(str)
    
    gdf.to_file('path_to_file.gpkg', driver='GPKG')
    
    gpkg = gpd.read_file('path_to_file.gpkg')
    
    # Turn '[0 1 2]' into '[0,1,2]', parse it as a list, and rebuild the array
    gpkg['array_col'] = gpkg['array_col'].apply(lambda x: np.array(literal_eval(x.replace(' ', ','))))
    
    

    After this, we can access our np arrays again.

    print(gpkg['array_col'][0])
    
    [0 1 2]
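
One caveat with the `str()` round trip: it relies on how NumPy prints arrays. For single-digit flags it works, but an array such as `np.array([0, 10])` prints as `'[ 0 10]'`, and replacing spaces with commas then no longer yields a valid list literal. A possible alternative, sketched here as an assumption rather than as the established approach, is to serialize through JSON. The helper names `array_to_json` and `json_to_array` are made up for this example.

```python
import json

import numpy as np


def array_to_json(arr):
    """Serialize a NumPy array to a JSON string (hypothetical helper)."""
    return json.dumps(np.asarray(arr).tolist())


def json_to_array(text):
    """Rebuild a NumPy array from a string written by array_to_json."""
    return np.array(json.loads(text))


flags = np.array([0, 1, 1, 0])
encoded = array_to_json(flags)    # plain string, safe to store in a text column
decoded = json_to_array(encoded)  # NumPy array again
```

With the example above, this would mean calling `gdf['array_col'].apply(array_to_json)` before `to_file` and `gpkg['array_col'].apply(json_to_array)` after `read_file`.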