Search code examples
pythonnumpygdal

Write data to GeoTiff using GDAL without creating data array?


Is it possible to write data line by line using gdal's WriteArray, rather than creating and feeding it an entire array?

I've run into a MemoryError when creating an numpy array of size (50539,98357). I suppose I could get around this by using PyTables, but I'd rather not complicate things

>>> import numpy
>>> cols = 50539
>>> rows = 98357
>>> a1 = np.zeros((cols,rows))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'np' is not defined
>>> import numpy as np
>>> a1 = np.zeros((cols,rows))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
MemoryError

Update: I ended up using Abudis's solution combined with a sparse matrix in which I saved the points, pulling out each row as a "dense" or standard matrix. Comments welcome.

dataset = driver.Create(filename, cols, rows, number_of_bands, band_type)
band = dataset.GetRasterBand(1)
offset = 1 # i.e. the number of rows to write with each iteration

# values, pixel_x, and pixel_y defined earlier in my script
data = scipy.sparse.csr_matrix((values,(pixel_y,pixel_x)),dtype=float)

# iterate data writing for each row in data sparse array
for i in range(data.shape[0]):
    data_row = data[i,:].todense() # output row of sparse array as standard array
    band.WriteArray(data_row,0,offset*i)
band.SetNoDataValue(NULL_VALUE)
band.FlushCache()

Solution

  • Not really a gdal expert here, but looks like this worked. So, the trick is to use yoff parameter of WriteArray method. This way it is possible to write data to file in chunks. We basically just set the offset while writing next chunk of data.

    import numpy as np
    import gdal
    
    cols = 50539
    rows = 10000
    offset = 1000
    
    dst_filename = 'test.tif'
    format = 'GTiff'
    driver = gdal.GetDriverByName(format)
    
    dst_ds = driver.Create(dst_filename, cols, rows, 1, gdal.GDT_Byte)
    
    for i in range(10):
        # generate random integers from 1 to 10
        a = np.random.random_integers(1, 10, size=(offset, cols))
        # write data to band 1
        dst_ds.GetRasterBand(1).WriteArray(a, 0, offset * i)
    
    dst_ds = None
    

    This yields a 50539 by 10000 pixels tiff-file. If you really need 98357 rows, I guess you can just set offset value to 1 and rows to 98357. Theoretically it should work, because it worked on smaller array (10 by 20 pixels).

    EDIT: Don't forget to change range(10) to something like range(98357) or xrange(98357)