Let's say I have a 3d array in Python that looks like this: [[[3, 4, 9], [5, 3, 1], [6, 4, 2]], [[2, 3, 6], [7, 9, 10], [5, 12, 4]], [[7, 5, 1], [3, 1, 2], [6, 5, 2]]]. I want to extract the first element of each innermost entry and put them all into a 1d array like this: [3, 5, 6, 2, 7, 5, 7, 3, 6]. I'm working with an HSV image where each pixel is a 3-tuple whose entries correspond to the hue, saturation, and value. I want to extract just the hue of each pixel and put it into a 1d array. Here's what my code looks like:
import numpy as np
import colorsys
import cv2
from skimage import color
from skimage.color import rgb2hsv
img = cv2.imread("input.jpg", 1)
img_hsv = color.rgb2hsv(img)
b = []
for i in img_hsv:
    b.append(i[0][0])
The problem is that the image I'm reading in is 640x480 and the shape of b is just 640, which makes me think I haven't got all the pixels in the image. So my two questions are: is my for loop correct, and do I even need a for loop to do this, or does Python have a library that can do this?
Slicing the array is by far the fastest method available. The array you are importing is 3-dimensional. As an example, let's create a 3d array of random integers in the range 0-9 (the upper bound passed to randint is exclusive):
import numpy as np
img = np.random.randint(0, 10, (5, 3, 3))
img.shape
Out[36]:
(5, 3, 3)
img
Out[37]:
array([[[8, 0, 8],
        [9, 0, 5],
        [9, 0, 4]],

       [[5, 2, 5],
        [3, 3, 1],
        [3, 4, 0]],

       [[1, 2, 2],
        [9, 0, 6],
        [2, 5, 9]],

       [[8, 6, 2],
        [4, 5, 1],
        [3, 3, 6]],

       [[8, 0, 7],
        [0, 6, 0],
        [5, 2, 3]]])
Now, you want to select the first value (in your case, the hue) via:
hue = img[:, :, 0]
hue
Out[43]:
array([[8, 9, 9],
       [5, 3, 3],
       [1, 9, 2],
       [8, 4, 3],
       [8, 0, 5]])
This yields a 2d array, but you want a 1d one, so just flatten it:
hue = hue.flatten()
hue
Out[44]: array([8, 9, 9, 5, 3, 3, 1, 9, 2, 8, 4, 3, 8, 0, 5])
Voila, a 1d array. Read how flatten works to understand how it orders the output: it does not sort the values, it walks the array row by row by default.
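Applied to the small array from the question, the slice-and-flatten approach might look like the sketch below. The name `arr` is only illustrative. `ravel()` is shown as an alternative: it returns the same values but avoids a copy when the memory layout allows, whereas `flatten()` always copies.

```python
import numpy as np

# The 3x3x3 example array from the question (the name `arr` is illustrative).
arr = np.array([[[3, 4, 9], [5, 3, 1], [6, 4, 2]],
                [[2, 3, 6], [7, 9, 10], [5, 12, 4]],
                [[7, 5, 1], [3, 1, 2], [6, 5, 2]]])

# Take element 0 of every innermost entry, then flatten row by row.
first = arr[:, :, 0].flatten()
print(first)  # [3 5 6 2 7 5 7 3 6]

# ravel() yields the same values in the same order.
first_ravel = arr[:, :, 0].ravel()
```

The result matches the 1d array the question asks for.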
While slicing is the fastest option here, you also ask whether your for loop is correct. The problem with your loop is that it only goes over the rows, so you collect one value per row. Since you are operating over the first two dimensions of your 3d array, you need two nested for loops (warning: this is very slow, because every pixel is visited in interpreted Python code instead of inside NumPy's compiled routines). The following changes to your loop will suffice:
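b = []
# shape[0] and shape[1] are integers, so wrap them in range() to iterate
for row in range(img_hsv.shape[0]):
    for col in range(img_hsv.shape[1]):
        # index 0 of the last axis is the hue
        b.append(img_hsv[row, col, 0])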
One more thing: you said you wanted your output (b) to be an array. You currently have it defined as a list; you can convert it to an array via
b = np.array(b)
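To check that the nested loops and the slice really produce the same 1d array, here is a minimal sketch. It uses a small random array as a stand-in for `img_hsv` (an assumption, since the real image is not available here); the shape (4, 6, 3) is arbitrary.

```python
import numpy as np

# Stand-in for an HSV image: random floats in [0, 1), shape (rows, cols, 3).
rng = np.random.default_rng(0)
img_hsv = rng.random((4, 6, 3))

# Nested loops: visit every (row, col) pair and keep channel 0.
b = []
for row in range(img_hsv.shape[0]):
    for col in range(img_hsv.shape[1]):
        b.append(img_hsv[row, col, 0])
b = np.array(b)

# Slicing: same values, in the same row-by-row order.
hue = img_hsv[:, :, 0].flatten()

print(b.shape, hue.shape)      # (24,) (24,)
print(np.array_equal(b, hue))  # True
```

Both versions visit the pixels row by row, which is why the element order agrees.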
Just for fun, and to drive home the point of how slow looping is, I timed each option.
import datetime as dt
import numpy as np
img = np.random.randint(0, 100, (640, 480, 3))
iterations = 100
d1 = dt.datetime.now()
for i in range(iterations):
    hue = img[:, :, 0]
print('Slicing total time: ', dt.datetime.now() - d1)
d1 = dt.datetime.now()
for i in range(iterations):
    hue = []
    for row in range(img.shape[0]):
        for col in range(img.shape[1]):
            hue.append(img[row, col, 0])
print('Multiple total looping time: ', dt.datetime.now() - d1)
Slicing total time: 0:00:00.002107
Multiple total looping time: 0:00:08.860522
Or about 21 microseconds per image vs about 89 ms per image (slicing is roughly 4200 times faster).
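One caveat about the comparison: `img[:, :, 0]` on its own only creates a view and copies no data, while the loop builds a full list. A slightly fairer sketch, using the standard `timeit` module and including the flatten step so that both versions end with a 1d array, could look like this. The helper names `by_slicing` and `by_looping` are made up for illustration, and the absolute numbers will vary by machine.

```python
import timeit
import numpy as np

img = np.random.randint(0, 100, (640, 480, 3))

def by_slicing():
    # Slice out channel 0 and copy it into a 1d array.
    return img[:, :, 0].flatten()

def by_looping():
    # Visit every pixel in Python and collect channel 0.
    out = []
    for row in range(img.shape[0]):
        for col in range(img.shape[1]):
            out.append(img[row, col, 0])
    return np.array(out)

# number is kept small so the loop version finishes quickly.
runs = 3
t_slice = timeit.timeit(by_slicing, number=runs)
t_loop = timeit.timeit(by_looping, number=runs)
print(f"slicing: {t_slice / runs:.6f} s per call")
print(f"looping: {t_loop / runs:.6f} s per call")
```

Even with the copy included, the sliced version should remain far faster than the Python loops.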