Create numPy array from raster image

I'm trying to turn a 4band (RGB & nr Infrared) raster image into a numPy array in ArcMap. Once successfully converted to an numpy array I want to count the number of pixels that have no data on the image. When inspected in ArcMap these pixel(s) colour is marked as "None", they appear black, but they are missing either red, green and/or blue channel data from band 1,2 or 3. I need to find them.

Here's what I have so far:

import numpy
import os

myDir = "C:\\Temp\\temp"
# myFile = "4_pixel_test.tif"
myFile = "4band.tif"

# import 4band (R,G,B & nr Infrared) image
fName = os.path.join(myDir, myFile)
head, tail = os.path.split(fName)


# Convert Raster to Array, Default using LowerLeft as Origin
rasArray = arcpy.RasterToNumPyArray(fName)

# find out the number of bands in the image
nbands = rasArray.shape[0] # int
# print nbands (int)

blackCount = 0 # count the black pixels per image
th = 0 # Threhold value

# print rasArray

r, g, b, a = rasArray # not working

rCheck = numpy.any(r <= th)
gCheck = numpy.any(g <= th)
bCheck = numpy.any(b <= th)
aCheck = numpy.any(a == 0)

print rCheck
print gCheck
print bCheck
print aCheck


# show the results
if rCheck:
  print ("Black pixel (red): %s" % (tail))

elif gCheck:
  print ("Black pixel (green): %s" % (tail))

elif bCheck:
  print ("Black pixel (blue): %s" % (tail))

else:
  print ("%s okay" % (tail))

if aCheck:
  print ("Transparent pixel: %s" % (tail))

Runtime error Traceback (most recent call last): File "", line 14, in File "c:\program files (x86)\arcgis\desktop10.2\arcpy\arcpy__init__.py", line 1814, in RasterToNumPyArray return _RasterToNumPyArray(*args, **kwargs) RuntimeError: ERROR 999998: Unexpected Error.

# previous code which might have incorrect numpy import
# options so I'm going with default options until I know better
# import numpy
# import os
# 
# myDir = "C:\\Temp\\temp"
# myFile = "4_pixel_test.tif"
# fName = os.path.join(myDir, myFile)
# 
# Convert Raster to Array
# rasArray = arcpy.RasterToNumPyArray(fName)
# maxVal = rasArray.max()
# minVal = rasArray.min()
# maxValpos = numpy.unravel_index(rasArray.argmax(),rasArray.shape) 
# minValpos = numpy.unravel_index(rasArray.argmin(),rasArray.shape)
# 
# desc = arcpy.Describe(fName)
# utmX = desc.extent.upperLeft.X + maxValpos[0]  
# utmY = desc.extent.upperLeft.Y - maxValpos[1]  
# 
# for pixel in numpy.nditer(rasArray):
#   # r,g,b = pixel # doesn't work  - single dimension array
#   print pixel
#

I was able to change the raster image into numPY array from the code here.

Not sure how the numPY array gets stored, but when iterating through it the data gets printed out starting with the y axis and working down (column by column) of the image instead of the x (row by row).

I need to switch this so I can read the data pixel by pixel (RGBA) from top left to bottom right. However, I don't know enough about numPy to do this.

I think the error in question maybe due to the size of the tiff in question: It works fine with 2.5MB but falls over at 4GB. :(

Solution

It seems like you're asking about np.nditer.

You don't want to use nditer unless you need low-level control. However, you'll almost never need that level of control. It's best not to use nditer unless you know exactly why you need it.

What you have is a 3D numpy array. You're currently iterating over every element in the array. Instead, you want to iterate over only the first two dimensions of the array (width and height).

Iterating through a 3D array

As a quick example to reproduce what you're seeing without ArcMap:

import numpy as np

data = np.random.random((3, 10, 10))

for value in np.nditer(data):
    print value

(Quick note: I'm using arcpy's shape convention of nbands x nrows x ncolumns here. It's also very common to see nrows x ncolumns x nbands. In that case, the indexing expressions in the later sections will be different)

Again, nditer is not what you want, so if you did want to do exactly this (every value in the array instead of every r,g,b pixel), it would be far more readable to do:

import numpy as np

data = np.random.random((3, 10, 10))

for value in data.flat:
    print value

The two are identical in this case.

Iterating through pixels

Moving on, though, you're wanting to iterate over every pixel. In that case, you'd do something like:

import numpy as np

data = np.random.random((3, 10, 10))

for pixel in data.reshape(3, -1).T:
    r, g, b = pixel
    print r, g, b

In this case, we've temporarily viewed the 10x10x3 array as a 100x3 array. Because numpy arrays iterate over the first axis by default, this will iterate over every r,g,b element.

If you'd prefer, you could also use indexing directly, though it will be a bit slower:

import numpy as np

data = np.random.random((3, 10, 10))

for i, j in np.ndindex(data.shape[:-2]):
    r, g, b = data[:, i, j]
    print r, g, b

Vectorize, don't iterate through a `numpy` array

In general, though, iterating through an array element-by-element like this is not an efficient way to use numpy.

You mentioned that you're trying to detect when bands have been eliminated and/or set to a constant value.

There are three things you might mean by that: 1) there's only a single band, 2) the data in some bands has been set to 0 (or another value), 3) the image is grayscale, but stored as RGB.

You can check the number of bands either by looking at the numpy array:

nbands = data.shape[0]

Or by using arcpy directly:

nbands = raster.bandCount

That handles the first case, however, it looks like you're trying to detect when bands have no information, as opposed to whether or not they're there.

If you always expect to have at least red, green, and blue (sometimes alpha, sometimes not), it's easiest to unpack the bands somewhat similar to:

r, g, b = data[:3, :, :]

That way, if there is an alpha band, we'll just ignore it, and if it's not there, it won't matter. Again, this assumes the shape of your data is nbands x nrows x ncolumns (and not nrows x ncolumns x nbands).

Next, if we want to check if all of the pixel values in a band are zero, don't iterate through. Instead use numpy boolean comparisons. They'll be much (>100x) faster:

r, g, b = data[:3, :, :]
print np.all(r == 0) # Are all red values zero?

However, I'd guess that what you most often want to detect is a grayscale image that's been stored as RGB. In that case, the red, green, blue values of each pixel will be equal, but the pixels won't all be the same. You can check for that by doing:

gray = (r == b) & (b == g)
print np.all(gray)

In general, you really don't want to iterate through each pixel in a numpy array. Use vectorized expressions instead.

Create numPy array from raster image

Iterating through a 3D array

Iterating through pixels

Vectorize, don't iterate through a numpy array

Vectorize, don't iterate through a `numpy` array