Tags: r, bnlearn, r-graphviz

How do I graph a Bayesian Network with instantiated nodes using bnlearn and graphviz?


I am trying to graph a Bayesian Network (BN) with instantiated nodes using the libraries bnlearn and Rgraphviz. My workflow is as follows:

After creating a data frame with random data (the data I am actually using is not random), I discretise the data, learn the structure of the directed acyclic graph (DAG), fit the data to the DAG and then plot the DAG. I also plot a DAG that shows the posterior probabilities of each node.

#rm(list = ls())
library(bnlearn)
library(Rgraphviz)

# Generating random dataframe
data_clean <- data.frame(a = runif(min = 0, max = 100, n = 1000),
                         b = runif(min = 0, max = 100, n = 1000),
                         c = runif(min = 0, max = 100, n = 1000),
                         d = runif(min = 0, max = 100, n = 1000),
                         e = runif(min = 0, max = 100, n = 1000))

# Discretising the data into 3 bins
bins <- 3
data_discrete <- discretize(data_clean, breaks = bins)

# Renaming the factor levels of each discretised column
lv <- c("low", "med", "high")

for (i in names(data_discrete)) {
  levels(data_discrete[[i]]) <- lv
}

# Structure learning the DAG from the training set
whitelist <- matrix(c("a", "b",
                      "b", "c",
                      "c", "e",
                      "a", "d",
                      "d", "e"),
                    ncol = 2, byrow = TRUE, dimnames = list(NULL, c("from", "to")))

bn.hc <- hc(data_discrete, whitelist = whitelist)

# Plotting the DAG
dag.hc <- graphviz.plot(bn.hc,
                        layout = "dot")

# Fitting the data to the structure
fitted <- bn.fit(bn.hc, data = data_discrete, method = "bayes")

# Plotting the DAG with posteriors
graphviz.chart(fitted, type = "barprob", layout = "dot")

The next thing I do is to manually change the distributions in the bn.fit object, assigned to fitted, and then plot a DAG that shows the instantiated nodes and the updated posterior probability of the response variable e.

# Manually instantiating
fitted_evidence <- fitted

cpt.a = matrix(c(1, 0, 0), ncol = 3, dimnames = list(NULL, lv))

cpt.c = c(1, 0, 0,
          0, 1, 0,
          0, 0, 1)
dim(cpt.c) <- c(3, 3)
dimnames(cpt.c) <-  list("c" = lv, "b" =  lv)

cpt.b = c(1, 0, 0,
          0, 1, 0,
          0, 0, 1)
dim(cpt.b) <- c(3, 3)
dimnames(cpt.b) <-  list("b" = lv, "a" =  lv)

cpt.d = c(0, 0, 1,
          0, 1, 0,
          1, 0, 0)
dim(cpt.d) <- c(3, 3)
dimnames(cpt.d) <-  list("d" = lv, "a" =  lv)

fitted_evidence$a <- cpt.a
fitted_evidence$b <- cpt.b
fitted_evidence$c <- cpt.c
fitted_evidence$d <- cpt.d

# Plotting the DAG with instantiation and posterior for response
graphviz.chart(fitted_evidence, type = "barprob", layout = "dot")

This produces the plot I want, but my actual BN is much larger, with many more arcs, and it would be impractical to change the bn.fit object manually.


Is there a way to plot a DAG with instantiated nodes without changing the bn.fit object manually? Is there a workaround or function that I am missing?

I think/hope I have read the documentation for bnlearn thoroughly. I appreciate any feedback and would be happy to change anything in the question if I have not conveyed my thoughts clearly enough.

Thank you.


Solution

  • How about using cpdist to draw samples from the posterior given the evidence? You can then re-estimate the parameters by passing the cpdist samples to bn.fit, and plot as before.

    An example:

    set.seed(69184390) # for sampling
    
    # Your evidence vector
    ev <- list(a = "low", b="low", c="low", d="high")
    
    # draw samples
    updated_dat <- cpdist(fitted, nodes=bnlearn::nodes(fitted), evidence=ev, method="lw", n=1e6)
    
    # refit : you'll get warnings over missing levels
    updated_fit <- bn.fit(bn.hc, data = updated_dat)
    
    # plot
    par(mar=rep(0,4))
    graphviz.chart(updated_fit, type = "barprob", layout = "dot")
    

    Note that I used bnlearn::nodes because nodes is masked by a dependency of Rgraphviz when Rgraphviz is loaded after bnlearn. I tend to load bnlearn last.
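
For a larger network, the sample-and-refit steps above could be wrapped in a small function. The following is only a sketch: refit_with_evidence is a hypothetical helper name (not part of bnlearn), and the usage lines rely on the learning.test data set bundled with bnlearn rather than the data from the question. One caveat, stated as an assumption: likelihood weighting attaches a "weights" attribute to the samples, and whether bn.fit takes it into account may depend on the bnlearn version. The helper therefore warns when the weights are not constant, since an unweighted refit would then only be approximate. The weights are constant when every evidence node has only evidence nodes (or no nodes) as parents, as in the example above.

```r
library(bnlearn)

# Hypothetical helper, not part of bnlearn: condition a fitted discrete
# network on evidence by likelihood weighting, then refit on the samples.
refit_with_evidence <- function(fitted, evidence, n = 1e5) {
  samples <- cpdist(fitted, nodes = bnlearn::nodes(fitted),
                    evidence = evidence, method = "lw", n = n)
  w <- attr(samples, "weights")
  # Assumption: the refit below may not use the weights, so flag the case
  # where they differ between samples.
  if (!is.null(w) && diff(range(w)) > 1e-12)
    warning("likelihood weights vary; an unweighted refit is only approximate")
  # bn.net() recovers the DAG from the bn.fit object. Warnings about
  # unobserved levels are expected here and are suppressed.
  suppressWarnings(bn.fit(bn.net(fitted), data = samples))
}

# Usage on bnlearn's bundled learning.test data (nodes A to F)
data(learning.test)
dag <- hc(learning.test)
fit <- bn.fit(dag, learning.test)

set.seed(1)
updated <- refit_with_evidence(fit, list(A = "a", B = "b"), n = 1e4)
updated$A$prob  # distribution of the instantiated node A
```

The returned object can then be passed to graphviz.chart(updated, type = "barprob", layout = "dot") exactly as in the answer. With the network from the question, the call would be refit_with_evidence(fitted, ev).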