Search code examples
pythonpython-3.xgisrastergdal

Pick random pixels' centroids within raster area - Python+gdal


I have a raster file in WGS84 projection and I am trying to get the coordinates of random pixels within the raster GeoTIFF area down left in picture. At first, I calculate the coordinates of each pixel's centroid (in WGS84 again), then I pick 100 random of them and export them to a csv.

Problem: I expect points to be within the raster area (down left in picture) but they are way off of it. Is it a projection error or coordinates miscalculation? What is wrong in my code?

enter image description here

Here is the code

# Get coordinates for each pixel centroid
geotiff = gdal.Open(path)
gt = geotiff.GetGeoTransform()
column_numbers, row_numbers, band_numbers = geotiff.RasterXSize, geotiff.RasterYSize, geotiff.RasterCount
minx = gt[0]
miny = gt[3] + column_numbers*gt[4] + row_numbers*gt[5]
maxx = gt[0] + column_numbers*gt[1] + row_numbers*gt[2]
maxy = gt[3]
pixelWidth = gt[1]
pixelHeight = -gt[5]
lonPxSz = (maxy - miny) / row_numbers
latPxSz = (maxx - minx) / column_numbers
total = np.array(geotiff.ReadAsArray())
res = []
for i in range(row_numbers):
    for j in range(column_numbers):
        res.append([[i,j]] + [data[i][j] for data in total])
coords = pd.DataFrame(res, columns=['Pair', 'Col1', 'Col2', 'Col3', 'Col4', 'Col5', 'Col6'])
coords[['Lat', 'Lon']] = pd.DataFrame(coords['Pair'].tolist(), index=coords.index)
coords["Lat"] = (coords["Lat"] + 0.5) * 10 * latPxSz + miny
coords["Lon"] = (coords["Lon"] + 0.5) * 10 * lonPxSz + minx
coords = coords.sample(n = 100)
coords[['Lat', 'Lon']].to_csv("coords.csv", sep=";")

Solution

  • If you only want to pick 100 random points on the image:

    from osgeo import gdal
    import numpy as np
    import pandas as pd
    import random
    
    
    path = "image.tif"
    geotiff = gdal.Open(path)
    gt = geotiff.GetGeoTransform()
    column_numbers, row_numbers, band_numbers = geotiff.RasterXSize, geotiff.RasterYSize, geotiff.RasterCount
    
    minx = gt[0]
    miny = gt[3] + column_numbers * gt[4] + row_numbers * gt[5]
    maxx = gt[0] + column_numbers * gt[1] + row_numbers * gt[2]
    maxy = gt[3]
    
    pixelWidth = gt[1]
    pixelHeight = -gt[5]
    halfPixelWidth = pixelWidth / 2
    halfPixelHeight = pixelHeight / 2
    
    rand_point_x = random.sample([i for i in range(column_numbers)], 100)
    rand_point_y = random.sample([i for i in range(row_numbers)], 100)
    rand_points = np.vstack((rand_point_y, rand_point_x)).T
    
    coords = pd.DataFrame(rand_points, columns=['Lat', 'Lon'])
    coords["Lat"] = miny + (coords["Lat"] * pixelHeight) + halfPixelHeight
    coords["Lon"] = minx + (coords["Lon"] * pixelWidth) + halfPixelWidth
    
    coords.to_csv("coords.csv", sep=',')
    

    You may use the coordinates of these random points to retrieve pixel values afterward.