I am wondering whether there are any cases or needs for mixing multiple types of neurons, each with a different activation function, within a single layer, and if so, how to implement that using the TensorFlow Estimator framework.
I can think of a simple example where such a configuration might be useful.
Think about training a neural network to predict whether any given 2D point with coordinates (x, y) lies inside or outside a given circle, whose center and radius are defined in the same 2D space.
Let's say our circle has its center at (0.5, 0.5) and a radius of 0.5.
The strategy for training could be something like this: first generate many points at random, then judge for each point whether it lies inside or outside the circle, so that we can feed the randomly generated coordinates as the features and the result of the inside/outside judgement for each point as the corresponding label.
The judgement can easily be done by checking the formula below:
(x-0.5)^2 + (y-0.5)^2 < r^2
and this can be expanded to:
x^2 - x + y^2 - y + 0.5 < r^2
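For concreteness, the point-generation and labelling step could look roughly like this (an illustrative sketch, not my actual code):

import numpy as np

# Generate random points in the unit square and label them with the circle test.
n = 10000
points = np.random.rand(n, 2).astype(np.float32)   # each row is one (x, y) pair
cx, cy, r = 0.5, 0.5, 0.5
labels = (((points[:, 0] - cx) ** 2 +
           (points[:, 1] - cy) ** 2) < r ** 2).astype(np.int32)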
Now, looking at this expanded formula, the training could obviously become more effective if the network itself could derive values such as x^2 and y^2 automatically, directly from the feature values given as (x, y).
For this, I came up with the idea of mixing neurons that use f(x) = x^2 as their activation function in among the standard ReLU neurons.
To be honest, I have already done several test implementations of this problem using the TensorFlow Estimator framework, and in one of them I saw that giving x^2 and y^2 as additional features (four feature values in total) does help the training converge compared to the two-feature case; still, the solution using an f(x) = x^2 activation function seemed much smarter to me.
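A minimal sketch of that four-feature setup (illustrative only, not my exact code) would be something like the following, reusing the points and labels generated above:

import tensorflow as tf

# Hand-engineered features: x and y plus x^2 and y^2, fed to a stock DNNClassifier.
def input_fn(points, labels):
    features = {"x": points[:, 0], "y": points[:, 1],
                "x2": points[:, 0] ** 2, "y2": points[:, 1] ** 2}
    dataset = tf.data.Dataset.from_tensor_slices((features, labels))
    return dataset.shuffle(1000).batch(32).repeat()

feature_columns = [tf.feature_column.numeric_column(name)
                   for name in ("x", "y", "x2", "y2")]
classifier = tf.estimator.DNNClassifier(hidden_units=[8, 8],
                                        feature_columns=feature_columns,
                                        n_classes=2)
classifier.train(input_fn=lambda: input_fn(points, labels), steps=1000)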
And that's how I came up with my question here.
I would be glad to hear any opinions on this.
Thank you.
Feature engineering (giving x^2 as an input in addition to x) is still a very large part of solving ML problems in many domains. I have never heard of people doing feature engineering by applying different activations to intermediate layers; it is usually done as part of input pre-processing.
If you want to experiment with it: I believe TensorFlow has no special support for multiple activation functions within a single layer, but you should be able to achieve it yourself fairly easily.
Here is one example that applies different activation functions, in a round-robin fashion, to the slices of a tensor along its last (units) dimension, so that each unit of a layer gets its own activation. Unstacking unit by unit is slow for wide layers, so you can probably do some smarter slicing.
import itertools
import tensorflow as tf

def make_activator(activations):
    """Build an activation function that cycles through `activations` across units."""
    def activator(t):
        # Unstack along the last (units) axis so each unit gets its own activation.
        slices = tf.unstack(t, axis=-1)
        activated = []
        for s, act in zip(slices, itertools.cycle(activations)):
            activated.append(act(s))
        return tf.stack(activated, axis=-1)
    return activator
You can then use it like this in your layers:
tf.layers.dense(..., activation=make_activator([tf.nn.relu, tf.square]))
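If you are using a custom model_fn with tf.estimator.Estimator, the wiring could look roughly like this (an untested sketch assuming the TF 1.x layers API and the make_activator helper above; feature names are illustrative):

import tensorflow as tf

def model_fn(features, labels, mode):
    # Assumes the input_fn yields features named "x" and "y".
    columns = [tf.feature_column.numeric_column("x"),
               tf.feature_column.numeric_column("y")]
    net = tf.feature_column.input_layer(features, columns)
    # Hidden layer whose units alternate between ReLU and x^2 activations.
    net = tf.layers.dense(net, 8, activation=make_activator([tf.nn.relu, tf.square]))
    logits = tf.layers.dense(net, 2)
    if mode == tf.estimator.ModeKeys.PREDICT:
        return tf.estimator.EstimatorSpec(
            mode, predictions={"class": tf.argmax(logits, axis=1)})
    loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)
    if mode == tf.estimator.ModeKeys.TRAIN:
        train_op = tf.train.AdamOptimizer().minimize(
            loss, global_step=tf.train.get_global_step())
        return tf.estimator.EstimatorSpec(mode, loss=loss, train_op=train_op)
    return tf.estimator.EstimatorSpec(mode, loss=loss)

estimator = tf.estimator.Estimator(model_fn=model_fn)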
You can also just add "parallel layers", each with a different activation, and then merge (e.g. sum) or concatenate their outputs before passing the result to the next layer.
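A minimal sketch of that parallel-layers idea (again assuming the TF 1.x layers API; sizes are arbitrary):

import tensorflow as tf

def mixed_activation_layer(inputs, units_per_branch):
    # Two parallel dense layers over the same input, one with ReLU and one with x^2.
    relu_branch = tf.layers.dense(inputs, units_per_branch, activation=tf.nn.relu)
    square_branch = tf.layers.dense(inputs, units_per_branch, activation=tf.square)
    # Concatenate along the feature axis so the next layer sees both kinds of units.
    return tf.concat([relu_branch, square_branch], axis=-1)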