Tags: neural-network, keras, theano, pymc, pymc3

Fitting a sine wave with Keras and PYMC3 yields unexpected results


I've been trying to fit a sine curve with a Keras (Theano backend) model using PyMC3, using this blog post as a reference: http://twiecki.github.io/blog/2016/07/05/bayesian-deep-learning/

A Keras implementation alone, fit with plain optimization, does a good job; however, Hamiltonian Monte Carlo and variational sampling from PyMC3 do not fit the data. The trace stays stuck where the prior is initialized, and when I move the prior, the posterior moves to the same spot. The posterior predictive of the Bayesian model (cell 59) barely captures the sine wave, whereas the non-Bayesian fit gets it nearly perfect (cell 63). I created a notebook showing the code and the results here: https://gist.github.com/tomc4yt/d2fb694247984b1f8e89cfd80aff8706

Here is a snippet of the model:

class GaussWeights(object):
    def __init__(self):
        self.count = 0

    def __call__(self, shape, name='w'):
        return pm.Normal(
            name, mu=0, sd=.1,
            testval=np.random.normal(size=shape).astype(np.float32),
            shape=shape)


def build_ann(x, y, init):
    with pm.Model() as m:

        i = Input(tensor=x, shape=x.get_value().shape[1:])
        m = i
        m = Dense(4, init=init, activation='tanh')(m)
        m = Dense(1, init=init, activation='tanh')(m)

        sigma = pm.Normal('sigma', 0, 1, transform=None)
        out = pm.Normal('out', 
                         m, 1,
                         observed=y, transform=None)

    return out



with pm.Model() as neural_network:
    likelihood = build_ann(input_var, target_var, GaussWeights())

    # v_params = pm.variational.advi(
    #     n=300, learning_rate=.4
    # )
    # trace = pm.variational.sample_vp(v_params, draws=2000)
    start = pm.find_MAP(fmin=scipy.optimize.fmin_powell)
    step = pm.HamiltonianMC(scaling=start)
    trace = pm.sample(1000, step, progressbar=True)

Solution

  • The model contains normal noise with a fixed std of 1:

    out = pm.Normal('out', m, 1, observed=y)
    

    but the dataset does not. It is only natural that the posterior predictive does not match the dataset; they were generated in very different ways. To make the model more realistic, you could add noise to your dataset and then estimate sigma:

    mu = pm.Deterministic('mu', m)
    sigma = pm.HalfCauchy('sigma', beta=1)
    pm.Normal('y', mu=mu, sd=sigma, observed=y)
    

    What you are doing right now is similar to taking the output from the network and adding standard normal noise.

    A couple of unrelated comments:

    • out is not the likelihood; it is just the dataset again.
    
    • If you use HamiltonianMC instead of NUTS, you need to set the step size and the integration time yourself; the defaults are usually not useful.
    • It seems Keras changed its API in 2.0, and this way of combining PyMC3 and Keras no longer appears to work.
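
    The noise scale matters because it controls how strongly the likelihood pulls the network output toward the data: with a fixed sd of 1, the model treats almost any output as consistent with the observations. A minimal NumPy sketch (hypothetical data, no Keras/PyMC3 required) of generating a noisy sine dataset and recovering the noise scale from residuals instead of fixing it:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    # Noisy sine dataset: the true observation noise is 0.2, not 1.
    x = np.linspace(0, 2 * np.pi, 500)
    true_sigma = 0.2
    y = np.sin(x) + rng.normal(scale=true_sigma, size=x.shape)

    # Pretend the network recovered the underlying function perfectly;
    # the residuals then reflect only the observation noise.
    residuals = y - np.sin(x)

    # Maximum-likelihood estimate of sigma under a Normal likelihood:
    # the root-mean-square of the residuals.
    sigma_hat = np.sqrt(np.mean(residuals ** 2))
    print(sigma_hat)  # close to the true 0.2, far from the fixed sd=1
    ```

    Estimating sigma (here in closed form; in the PyMC3 model via a HalfCauchy prior as shown above) lets the posterior predictive reproduce the actual spread of the data rather than drowning the sine wave in unit-scale noise.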