Tags: arrays, python-3.x, numpy, data-conversion, zarr

How to convert a numpy array to a Zarr array


Suppose I have converted a simple two-column dataframe to a numpy array:

gdf.head()
>>>

     rid    rast
0      1    01000001000761C3ECF420013F0761C3ECF42001BF7172...
1      2    01000001000761C3ECF420013F0761C3ECF42001BF64BF...
2      3    01000001000761C3ECF420013F0761C3ECF42001BF560C...
3      4    01000001000761C3ECF420013F0761C3ECF42001BF7F25...
4      5    01000001000761C3ECF420013F0761C3ECF42001BF7172...

raster_np = gdf.to_numpy()
raster_np[0]
>>> array([1, '01000001000761C3E.........'], dtype=object)

I've been tasked with converting the numpy array to the Zarr format. Because of the size of the rast values and of the dataframe, chunking and compression might be necessary, and I assume the resulting .zarr files would be better suited to an S3/cloud storage environment. I created a simple Zarr array like so:

 import zarr
 z_test = zarr.zeros(shape=(10000, 2), chunks=(10000, 2))
 z_test
 >>> <zarr.core.Array (10000, 2) float64>

Now, how do I get the data in raster_np into z_test and retain the Zarr attributes? Simply using z_test = raster_np obviously doesn't work. Perhaps there is something I am misunderstanding about Zarr. Any suggestions?


Solution

  • z_test = zarr.array(raster_np)
    

    See https://zarr.readthedocs.io/en/stable/api/creation.html#zarr.creation.array
    and https://zarr.readthedocs.io/en/stable/api/hierarchy.html#zarr.hierarchy.Group.array