How to quickly iterate over and modify pixel arrays with numpy?

First off, I am relatively new to Python and its libraries.

The purpose of the following code is to convert an HDR image to RGBM, as detailed in WebGL Insights, Chapter 16.

import argparse
import numpy
import imageio
import math

# Parse arguments
parser = argparse.ArgumentParser(description = 'Convert an HDR image to a 32-bit RGBM image.')
parser.add_argument('file', metavar = 'FILE', type = str, help ='Image file to convert')
args = parser.parse_args()

# Load image
image = imageio.imread(args.file)
height = image.shape[0]
width = image.shape[1]

output = numpy.zeros((height, width, 4))

# Convert image
for i in numpy.ndindex(image.shape[:2]):
    rgb = image[i]
    rgba = numpy.zeros(4)
    rgba[0:3] = (1.0 / 7.0) * numpy.sqrt(rgb)
    rgba[3] = max(max(rgba[0], rgba[1]), rgba[2])
    rgba[3] = numpy.clip(rgba[3], 1.0 / 255.0, 1.0)
    rgba[3] = math.ceil(rgba[3] * 255.0) / 255.0
    output[i] = rgba

# Save image to png
imageio.imsave(args.file.split('.')[0] + '_rgbm.png', output)

The code works and produces correct results, but it does so very slowly. This is of course caused by iterating over each pixel separately within Python, which can take a long time for larger images (about 4 minutes 30 seconds for a 3200x1600 image).

My question is: Is there a more efficient way to achieve what I'm after? I briefly looked into vectorization and broadcasting in numpy but haven't found a way to apply those to my problem yet.

Edit:

Thanks to Mateen Ulhaq, I found a solution:

# Convert image
rgb = (1.0 / 7.0) * numpy.sqrt(image)
alpha = numpy.amax(rgb, axis=2)
alpha = numpy.clip(alpha, 1.0 / 255.0, 1.0)
alpha = numpy.ceil(alpha * 255.0) / 255.0
alpha = numpy.reshape(alpha, (height, width, 1))
output = numpy.concatenate((rgb, alpha), axis=2)

This completes in only a few seconds.

Solution

The line

for i in numpy.ndindex(image.shape[:2]):

iterates over every pixel in Python, one at a time. It is much faster to get rid of the loop and let each line of code operate on the whole array at once ("vectorized"):

# Assumes `import numpy as np`; image, height and width are as in the question
rgb = (1.0 / 7.0) * np.sqrt(image)
alpha = np.amax(rgb, axis=2)
alpha = np.clip(alpha, 1.0 / 255.0, 1.0)
alpha = np.ceil(alpha * 255.0) / 255.0
alpha = np.reshape(alpha, (height, width, 1))
output = np.concatenate((rgb, alpha), axis=2)

I think it's also a bit clearer.
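
If you want to convince yourself that the vectorized version does the same thing as the loop, here is a minimal sketch that runs both on a small random array and compares the results. The function names `encode_loop` and `encode_vectorized` and the random `sample` array are made up for illustration and are not part of the original script. The sketch also uses `alpha[..., np.newaxis]` instead of `np.reshape` to add the trailing axis, which is just an alternative spelling.

```python
import math

import numpy as np


def encode_loop(image):
    """Per-pixel version, following the loop from the question."""
    height, width = image.shape[:2]
    output = np.zeros((height, width, 4))
    for i in np.ndindex(image.shape[:2]):
        rgba = np.zeros(4)
        rgba[0:3] = (1.0 / 7.0) * np.sqrt(image[i])
        alpha = np.clip(rgba[0:3].max(), 1.0 / 255.0, 1.0)
        rgba[3] = math.ceil(alpha * 255.0) / 255.0
        output[i] = rgba
    return output


def encode_vectorized(image):
    """Whole-array version: every line operates on all pixels at once."""
    rgb = (1.0 / 7.0) * np.sqrt(image)
    # Reduce over the channel axis to get one maximum per pixel
    alpha = np.clip(np.amax(rgb, axis=2), 1.0 / 255.0, 1.0)
    alpha = np.ceil(alpha * 255.0) / 255.0
    # Add a trailing axis so alpha has shape (height, width, 1)
    return np.concatenate((rgb, alpha[..., np.newaxis]), axis=2)


# Hypothetical test data standing in for a loaded HDR image
rng = np.random.default_rng(0)
sample = rng.random((8, 16, 3)) * 50.0

assert np.allclose(encode_loop(sample), encode_vectorized(sample))
```

The test image is kept tiny on purpose so the loop finishes instantly; the speed difference only shows up at realistic image sizes.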