
Multiple inputs to a convolutional neural network with one output?

Is there a standard way to input multiple images into a CNN and condense the information into a single image at the end? I'm working on an HDR dataset, where you have multiple images of the same scene and merge them to form an image with less noise.

I have tried stacking the separate images as channels, but I'm unsure whether this is appropriate, since the outputs looked strange.

Solution

Is this implementable?

Yes! Any neural network, regardless of the number or type of its layers, is nothing more than some function f: I → O. So, as long as your different inputs belong to the same input space I, you can pass as many of them through your network as you please and get equally many outputs on the other side of the black box. These can in turn be combined in any way you like through some other function g: O × O → O' (or O × … × O → O' for more than two inputs), which could be addition, tensor multiplication, concatenation or something fancier like another neural network.
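As a minimal sketch of the f / g idea, assuming plain Python with no deep-learning framework: a toy one-layer "network" f (a single 2-D convolution followed by ReLU) is applied to every input image with the same weights, and g is taken to be an element-wise mean. All names (`conv2d_valid`, `make_f`, `g_mean`, `fuse`) are hypothetical, and the kernel is fixed by hand rather than learned; in a real model both f and g would be trained layers.

```python
from typing import Callable, List, Sequence

Image = List[List[float]]


def conv2d_valid(img: Image, kernel: Image) -> Image:
    """Single-channel 'valid' 2-D cross-correlation, the operation a conv layer computes."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(img) - kh + 1
    out_w = len(img[0]) - kw + 1
    return [
        [
            sum(img[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(img: Image) -> Image:
    return [[max(0.0, v) for v in row] for row in img]


def make_f(kernel: Image) -> Callable[[Image], Image]:
    """f: I -> O, a toy one-layer network whose weights are shared by every input."""
    def f(img: Image) -> Image:
        return relu(conv2d_valid(img, kernel))
    return f


def g_mean(outputs: Sequence[Image]) -> Image:
    """g: O x ... x O -> O', here simply the element-wise mean of the per-input outputs."""
    n = len(outputs)
    return [
        [sum(o[i][j] for o in outputs) / n for j in range(len(outputs[0][0]))]
        for i in range(len(outputs[0]))
    ]


def fuse(images: Sequence[Image], f: Callable[[Image], Image],
         g: Callable[[Sequence[Image]], Image]) -> Image:
    """Pass every input through the same f, then combine the outputs with g."""
    return g([f(img) for img in images])


if __name__ == "__main__":
    exp1 = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
    exp2 = [[2 * v for v in row] for row in exp1]  # a "brighter" exposure
    exp3 = [[0.0] * 3 for _ in range(3)]           # a fully dark exposure
    box = [[0.25, 0.25], [0.25, 0.25]]             # hand-picked 2x2 averaging kernel
    print(fuse([exp1, exp2, exp3], make_f(box), g_mean))  # [[3.0, 4.0], [6.0, 7.0]]
```

Swapping `g_mean` for concatenation or a second small network changes only the last step; f stays shared across all inputs, which is what makes the number of inputs free to vary.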

Does this make sense?

There is no single answer here; it largely depends on what you expect your function to do, what your data look like and what your end goal is. Are you assuming that all of your inputs obey the same statistical properties or share similar underlying patterns? In that case, you could argue that the same function can model them. Keep in mind that by applying it to each input in parallel (i.e. independently) you keep your function blind to potential cross-dependencies between the inputs, which in some cases makes complete sense.

To make this a bit clearer: if your different inputs are the different channels of a single image, I think this is the wrong approach. Each channel may convey different information, and keeping your network unaware of how the channels affect one another, while forcing it to build meaningful abstractions with the same function for all of them, doesn't sound like a great idea. On the other hand, if your different images are e.g. photos of an object from different angles, and the sub-network you apply them to is some sort of classifier over their features (already obtained through another CNN, for instance), then it could make sense to model all of them with the same function.
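The cross-dependency point can be made concrete with a second toy sketch, again plain Python with hypothetical helper names and hand-picked (not learned) 1×1 kernels. "Late fusion" applies one shared single-channel filter to each exposure and averages the results, so with a symmetric g such as the mean it cannot even tell which exposure is which. "Early fusion", which is what stacking the images as channels amounts to, gives the first convolution a separate weight slice per image, so it can, for instance, respond to where two exposures disagree. Neither variant is presented here as the right choice for HDR merging.

```python
from typing import List, Sequence

Image = List[List[float]]


def conv2d_multi(channels: Sequence[Image], kernel: Sequence[Image]) -> Image:
    """Multi-channel 'valid' conv: one kernel slice per input channel, summed into one map."""
    kh, kw = len(kernel[0]), len(kernel[0][0])
    h, w = len(channels[0]), len(channels[0][0])
    return [
        [
            sum(
                channels[c][i + a][j + b] * kernel[c][a][b]
                for c in range(len(channels))
                for a in range(kh)
                for b in range(kw)
            )
            for j in range(w - kw + 1)
        ]
        for i in range(h - kh + 1)
    ]


def late_fusion(images: Sequence[Image], shared_kernel: Image) -> Image:
    """Same single-channel filter on every image, then an element-wise mean (a symmetric g)."""
    outs = [conv2d_multi([img], [shared_kernel]) for img in images]
    return [
        [sum(o[i][j] for o in outs) / len(outs) for j in range(len(outs[0][0]))]
        for i in range(len(outs[0]))
    ]


def early_fusion(images: Sequence[Image], kernel_per_channel: Sequence[Image]) -> Image:
    """Images stacked as channels: the first layer already sees all exposures together."""
    return conv2d_multi(images, kernel_per_channel)


if __name__ == "__main__":
    a = [[1.0, 2.0], [3.0, 4.0]]
    b = [[1.0, 0.0], [3.0, 0.0]]  # differs from a in the right column only
    print(late_fusion([a, b], [[1.0]]), late_fusion([b, a], [[1.0]]))  # identical
    diff = [[[1.0]], [[-1.0]]]    # +1 on the first image, -1 on the second
    print(early_fusion([a, b], diff))  # [[0.0, 2.0], [0.0, 4.0]]
```

Here the ±1 kernel is only a stand-in for weights a network would learn; the point is that only the stacked-channel version has per-input weights that can relate one exposure to another before anything is merged.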