Search code examples
pythonarcgisraster

Extract points within a shape from a raster


I have a raster file (basically 2D array) with close to a million points. I am trying to extract a circle from the raster (and all the points that lie within the circle). Using ArcGIS is exceedingly slow for this. Can anyone suggest any image processing library that is both easy to learn and powerful and quick enough for something like this?

Thanks!


Solution

  • Extracting a subset of points efficiently depends on the exact format you are using. Assuming you store your raster as a numpy array of integers, you can extract points like this:

    from numpy import *
    
    def points_in_circle(circle, arr):
        "A generator to return all points whose indices are within given circle."
        i0,j0,r = circle
        def intceil(x):
            return int(ceil(x))
        for i in xrange(intceil(i0-r),intceil(i0+r)):
            ri = sqrt(r**2-(i-i0)**2)
            for j in xrange(intceil(j0-ri),intceil(j0+ri)):
                yield arr[i][j]
    

    points_in_circle will create a generator returning all points. Please note that I used yield instead of return. This function does not actually return point values, but describes how to find all of them. It creates a sequential iterator over values of points within circle. See Python documentation for more details how yield works.

    I used the fact that for circle we can explicitly loop only over inner points. For more complex shapes you may loop over the points of the extent of a shape, and then check if a point belongs to it. The trick is not to check every point, only a narrow subset of them.

    Now an example of how to use points_in_circle:

    # raster dimensions, 10 million points
    N, M = 3200, 3200
    # circle center and its radius in index space
    i0, j0, r = 70, 20, 12.3
    
    raster = fromfunction(lambda i,j: 100+10*i+j, (N, M), dtype=int)
    print "raster is ready"
    print raster
    
    pts_iterator = points_in_circle((i0,j0,r), raster) # very quick, do not extract points yet
    pts = array(list(pts_iterator)) # actually extract all points
    print pts.size, "points extracted, sum = ", sum(pts)
    

    On a raster of 10 million integers it is pretty quick.

    Please describe file format or put a sample somewhere if you need more specific answer.