Fitting a sine wave with Keras and PYMC3 yields unexpected results

I've been trying to fit a sine curve with a keras (theano backend) model using pymc3. I've been using this [http://twiecki.github.io/blog/2016/07/05/bayesian-deep-learning/] as a reference point.

A Keras implementation alone fit using optimization does a good job, however Hamiltonian Monte Carlo and Variational sampling from pymc3 is not fitting the data. The trace is stuck at where the prior is initiated. When I move the prior the posterior moves to the same spot. The posterior predictive of the bayesian model in cell 59 is barely getting the sine wave, whereas the non-bayesian fit model gets it near perfect in cell 63. I created a notebook here: https://gist.github.com/tomc4yt/d2fb694247984b1f8e89cfd80aff8706 which shows the code and the results.

Here is a snippet of the model below...

class GaussWeights(object):
    def __init__(self):
        self.count = 0

    def __call__(self, shape, name='w'):
        return pm.Normal(
            name, mu=0, sd=.1,
            testval=np.random.normal(size=shape).astype(np.float32),
            shape=shape)


def build_ann(x, y, init):
    with pm.Model() as m:

        i = Input(tensor=x, shape=x.get_value().shape[1:])
        m = i
        m = Dense(4, init=init, activation='tanh')(m)
        m = Dense(1, init=init, activation='tanh')(m)

        sigma = pm.Normal('sigma', 0, 1, transform=None)
        out = pm.Normal('out', 
                         m, 1,
                         observed=y, transform=None)

    return out



 with pm.Model() as neural_network:
    likelihood = build_ann(input_var, target_var, GaussWeights())

#     v_params = pm.variational.advi(
#         n=300, learning_rate=.4
#     )
#     trace = pm.variational.sample_vp(v_params, draws=2000)
    start = pm.find_MAP(fmin=scipy.optimize.fmin_powell)
    step = pm.HamiltonianMC(scaling=start)
    trace = pm.sample(1000, step,   progressbar=True)

Solution

The model contains normal noise with a fixed std of 1:

out = pm.Normal('out', m, 1, observed=y)

but the dataset does not. It is only natural that the predictive posterior does not match the dataset, they were generated in a very different way. To make it more realistic you could add noise to your dataset, and then estimate sigma:

mu = pm.Deterministic('mu', m)
sigma = pm.HalfCauchy('sigma', beta=1)
pm.Normal('y', mu=mu, sd=sigma, observed=y)

What you are doing right now is similar to taking the output from the network and adding standard normal noise.

A couple of unrelated comments:

out is not the likelihood, it is just the dataset again.
If you use HamiltonianMC instead of NUTS, you need to set the step size and the integration time yourself. The defaults are not usually useful.
Seems like keras changed in 2.0 and this way of combining pymc3 and keras does not seem to work anymore.