How to create and use an HMM Dynamic Bayesian Network in Bayes Server?


I'm trying to build a prediction module that implements a Hidden Markov Model (HMM) as a Dynamic Bayesian Network (DBN) in Bayes Server 7, using C#. I managed to create the network structure, but I'm not sure it's correct, because the documentation and examples are not very comprehensive. I also don't fully understand how prediction is meant to be done in code once training is complete.

Here is how my network creation and training code looks:

    var Feature1 = new Variable("Feature1", VariableValueType.Continuous);
    var Feature2 = new Variable("Feature2", VariableValueType.Continuous);
    var Feature3 = new Variable("Feature3", VariableValueType.Continuous);

    var nodeFeatures = new Node("Features", new Variable[] { Feature1, Feature2, Feature3 });
    nodeFeatures.TemporalType = TemporalType.Temporal;

    var nodeHypothesis = new Node(new Variable("Hypothesis", new string[] { "state1", "state2", "state3" }));
    nodeHypothesis.TemporalType = TemporalType.Temporal;

    // create network and add nodes
    var network = new Network();
    network.Nodes.Add(nodeHypothesis);
    network.Nodes.Add(nodeFeatures);

    // link the Hypothesis node to the Features node within each time slice
    network.Links.Add(new Link(nodeHypothesis, nodeFeatures));

    // add temporal links of orders 1 through 5. Each one links the Hypothesis node to itself 'order' time slices later
    for (int order = 1; order <= 5; order++)
    {
        network.Links.Add(new Link(nodeHypothesis, nodeHypothesis, order));
    }

    var temporalDataReaderCommand = new DataTableDataReaderCommand(evidenceDataTable);
    var temporalReaderOptions = new TemporalReaderOptions("CaseId", "Index", TimeValueType.Value);

    // here we map variables to database columns
    // in this case the variables and database columns have the same name
    var temporalVariableReferences = new VariableReference[]
        {
            new VariableReference(Feature1, ColumnValueType.Value, Feature1.Name),
            new VariableReference(Feature2, ColumnValueType.Value, Feature2.Name),
            new VariableReference(Feature3, ColumnValueType.Value, Feature3.Name)
        };

    var evidenceReaderCommand = new EvidenceReaderCommand(
            temporalDataReaderCommand,
            temporalVariableReferences,
            temporalReaderOptions);

    // We will use the RelevanceTree algorithm here, as it is optimized for parameter learning
    var learning = new ParameterLearning(network, new RelevanceTreeInferenceFactory());
    var learningOptions = new ParameterLearningOptions();

    // Run the learning algorithm
    var result = learning.Learn(evidenceReaderCommand, learningOptions);

And this is my attempt at prediction:

    // we will now perform some queries on the network
    var inference = new RelevanceTreeInference(network);
    var queryOptions = new RelevanceTreeQueryOptions();
    var queryOutput = new RelevanceTreeQueryOutput();

    int time = 0;

    // query a probability variable 
    var queryHypothesis = new Table(nodeHypothesis, time);
    inference.QueryDistributions.Add(queryHypothesis);                

    double[] inputRow = GetInput();

    // set some temporal evidence
    inference.Evidence.Set(Feature1, inputRow[0], time);
    inference.Evidence.Set(Feature2, inputRow[1], time);
    inference.Evidence.Set(Feature3, inputRow[2], time);

    inference.Query(queryOptions, queryOutput);

    int hypothesizedClassId;
    var probability = queryHypothesis.GetMaxValue(out hypothesizedClassId);
    Console.WriteLine("hypothesizedClassId = {0}, score = {1}", hypothesizedClassId, probability);

I'm not sure how to "unroll" the network properly to get the prediction, or what value to assign to the variable "time". If someone can shed some light on how this toolkit works, I would greatly appreciate it. Thanks.


Solution

  • The code looks fine except for the network structure, which should look something like this for an HMM (the only change to your code is the links):

            var Feature1 = new Variable("Feature1", VariableValueType.Continuous);
            var Feature2 = new Variable("Feature2", VariableValueType.Continuous);
            var Feature3 = new Variable("Feature3", VariableValueType.Continuous);
    
            var nodeFeatures = new Node("Features", new Variable[] { Feature1, Feature2, Feature3 });
            nodeFeatures.TemporalType = TemporalType.Temporal;
    
            var nodeHypothesis = new Node(new Variable("Hypothesis", new string[] { "state1", "state2", "state3" }));
            nodeHypothesis.TemporalType = TemporalType.Temporal;
    
            // create network and add nodes
            var network = new Network();
            network.Nodes.Add(nodeHypothesis);
            network.Nodes.Add(nodeFeatures);
    
            // link the Hypothesis node to the Features node within each time slice
            network.Links.Add(new Link(nodeHypothesis, nodeFeatures));
    
            // An HMM also has an order 1 link on the latent node
            network.Links.Add(new Link(nodeHypothesis, nodeHypothesis, 1));
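
    To make the difference between the two link structures concrete, here is the standard textbook factorization that the structure above encodes. The notation is mine, not Bayes Server's: H_t is the Hypothesis node in time slice t, X_t is the Features node (Feature1, Feature2, Feature3) in slice t, and slices are numbered 0 to T.

    ```latex
    % Joint distribution of an order-1 HMM over slices 0..T
    \[
    P(H_{0:T}, X_{0:T})
      = P(H_0)\,\prod_{t=1}^{T} P(H_t \mid H_{t-1})\,\prod_{t=0}^{T} P(X_t \mid H_t)
    \]
    % With temporal links of orders 1 through 5, the transition term becomes (for t >= 5)
    \[
    P(H_t \mid H_{t-1}, H_{t-2}, H_{t-3}, H_{t-4}, H_{t-5})
    \]
    ```

    With three states, the order-1 transition table has 3 x 3 = 9 entries, while the version with links of orders 1 through 5 has 3^5 x 3 = 729. That is a much larger model to learn, and it is no longer a plain HMM.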
    

    It is also worth noting the following:

    1. You can add multiple distributions to 'inference.QueryDistributions' and query them all at once.
    2. Setting evidence manually and then querying is perfectly valid, but if you want to run the query over multiple records, see EvidenceReader, DataReader, and either DatabaseDataReader or DataTableDataReader.
    3. Check out TimeSeriesMode on ParameterLearningOptions.
    4. If you want the 'most probable explanation', set queryOptions.Propagation = PropagationMethod.Max; this is an extension of the Viterbi algorithm for HMMs.
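
    As a rough illustration of how the 'most probable explanation' differs from an ordinary query, here are the textbook HMM definitions. This is a sketch in my own notation and does not describe Bayes Server's internals: h_t is a state of the Hypothesis variable at time t, x_t is the observed feature vector at time t, and times run from 0 to T.

    ```latex
    % Ordinary (sum) query: marginal over one slice, summing out every other hidden state
    \[
    P(h_t \mid x_{0:T}) = \sum_{h_{0:T} \setminus h_t} P(h_{0:T} \mid x_{0:T})
    \]
    % Most probable explanation (max): the single best joint state sequence
    \[
    h^{*}_{0:T} = \arg\max_{h_{0:T}} P(h_{0:T} \mid x_{0:T})
    \]
    % Viterbi recursion for an order-1 HMM
    \[
    \delta_0(h) = P(h)\,P(x_0 \mid h), \qquad
    \delta_t(h) = P(x_t \mid h)\,\max_{h'} P(h \mid h')\,\delta_{t-1}(h')
    \]
    ```

    Note that the most probable sequence is not necessarily the sequence you get by taking the most probable state in each time slice separately, so the two kinds of query can give different answers.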