Search code examples
pythonnumpyrastergdal

Raster creation from numpy array, taking values from a csv file


I have a geotiff. I want to replace the values in the raster with corresponding values from a csv table.

The raster has class values 0 to n, and the csv has a calculated value (eg point density) for each class n of the raster. I want to create a new raster from the corresponding values in the csv

I am using using GDAL and numpy. I tried with pandas, but got stuck up with the issue of extracting values from the csv to the raster pandas dataframe. I will be performing this on a list of rasters for their corresponding csv tables.

Below the example of my data (one raster)

#Example raster array
[5 2 2 3
 0 3 1 4
 2 0 1 3]

#Corresponding csv table
  Class   Count  Density
    0       2       6
    1       2       9
    2       2       4
    3       3       9
    4       1       7
    5       1       2


#Output Raster (to take the corresponding density values, 
#i.e. if class = 0, then output raster = 6, the corresponding density value)
    [2 4 4 9
     6 9 9 7
     4 6 9 9]

I have the code for creation of array from raster and writing the raster back from array. I discovered it from various stackexchange sites. I do not know how to frame the loop to get the values from the csv in the new raster. My code of the 'for loop' below is incomplete. Could anyone please help

import numpy, sys
from osgeo import gdal
from osgeo.gdalconst import *

inRst = gdal.Open(r"c:/Raster1.tif")
band = inRst.GetRasterBand(1)
rows = inRst.RasterYSize
cols = inRst.RasterXSize
rstr_arry = band.ReadAsArray(0,0,cols,rows)

# create the output image
driver = inRst.GetDriver()
#print driver
outRst = driver.Create(r"c:/NewRstr.tif", cols, rows, 1, GDT_Int32)
outBand = outRst.GetRasterBand(1)
outData = numpy.zeros((rows,cols), numpy.int32)

for i in range(0, rows):
    for j in range(0, cols):
        if rstr_arry[i,j] =  :
            outData[i,j] = 
        elif rstr_arry[i,j] = :
            outData[i,j] = 
        else:
            outData[i,j] = 


# write the data
outRst= outBand.WriteArray(outData, 0, 0)
# flush data to disk, set the NoData value and calculate stats
outBand.FlushCache()
outBand.SetNoDataValue(-99)
# georeference the image and set the projection
outDs.SetGeoTransform(inDs.GetGeoTransform())
outDs.SetProjection(inDs.GetProjection())

Solution

  • If I'm not mistaken on what you want to achieve, you first have to read your csv file and create a mapping of the Class values to the Density values. This can be done like this :

    import csv
    
    mapping = {}
    
    with open('test.csv') as csv_file:
        csv_reader = csv.DictReader(csv_file)
        for row in csv_reader:
            mapping[int(row['Class'])] = int(row['Density'])
    

    You will obtain a dict like this :

    {0: 6, 1: 9, 2: 4, 3: 9, 4: 7, 5: 2}
    

    Then you can use np.in1d to create a mask matrix of what need to be replaced, and np.searchsorted to replace the elements. You will need to flatten your raster array before doing so, and to restore it's shape before writing back the result. (alternatives to replace elements in a numpy array can be found in answers to this question : Fast replacement of values in a numpy array)

    # Save the shape of the raster array
    s = rstr_arry.shape
    # Flatten the raster array
    rstr_arry = rstr_arry.reshape(-1)
    # Create 2D replacement matrix:
    replace = numpy.array([list(mapping.keys()), list(mapping.values())])
    # Find elements that need replacement:
    mask = numpy.in1d(rstr_arry, replace[0, :])
    # Replace them:
    rstr_arry[mask] = replace[1, numpy.searchsorted(replace[0, :], rstr_arry[mask])]
    # Restore the shape of the raster array
    rstr_arry = rstr_arry.reshape(s)
    

    You can then write your data almost as you planned to do :

    outBand.WriteArray(rstr_arry, 0, 0)
    outBand.SetNoDataValue(-99)
    
    outDs.SetGeoTransform(inRst.GetGeoTransform())
    outDs.SetProjection(inRst.GetProjection())
    
    outBand.FlushCache()
    

    Testing it on your example data :

    rstr_arry = np.asarray([
        [5, 2, 2, 3],
        [0, 3, 1, 4],
        [2, 0, 1, 3]])
    
    mapping = {0: 6, 1: 9, 2: 4, 3: 9, 4: 7, 5: 2}
    
    s = rstr_arry.shape
    rstr_arry = rstr_arry.reshape(-1)
    replace = numpy.array([list(mapping.keys()), list(mapping.values())])
    mask = numpy.in1d(rstr_arry, replace[0, :])
    rstr_arry[mask] = replace[1, numpy.searchsorted(replace[0, :], rstr_arry[mask])]
    rstr_arry = rstr_arry.reshape(s)
    
    print(rstr_arry)
    # [[2 4 4 9]
    #  [6 9 9 7]
    #  [4 6 9 9]]