Search code examples
rfunctional-programmingpurrrbnlearn

How to change probability table in object of class bn.fit (bn.fit.dnet) from bnlearn library?


I try to modify object of class bn.fit (bn.fit.dnet) from R's bnlearn library. I need

  1. to set equal probabilities for every row in bn.fit$node$prob table. For this I use next code:
    library(bnlearn)
    library(purrr)
    
    data(insurance)
    
    bn <- tabu(insurance, score = "bic")
    bn_fit <- bn.fit(bn, insurance, method = 'bayes')
    
    bn_fit[1:length(bn_fit)] <- modify(bn_fit[1:length(bn_fit)], function(node) {
      node$prob <- modify(node$prob, ~(1 / NROW(node$prob)))
      node
    })
    

I suppose this approach bit ugly and almost sure there exist more elegant way to do this. I can not remove 1:length(bn_fit). Also I don't know why I can not use NROW(.x) instead of NROW(node$prob) in my code.

  1. To set arbitrary distribution on EVERY column in bn.fit$node$prob table. I don't understand how to avoid for loops in this case.

Related question is here


Solution

  • Regarding (1), modify takes a list or an atomic vector. bn_fit is of class bn.fit, bn.fit.dnet, however, under the hood it is a list too, since calling typeof() yields list. My guess is that there is no S3 generic method for subsetting these classes so R figures out that it is essentially a list and accordingly strips the class arguments. So subsetting bn_fit turns it into class list, and therefore you can use modify on it. Subsetting can even be done with empty brackets [], it will just return the object, but this time as class list. An alternative that I use below is to "manually" set the class attribute to NULL via attr(bnfit, "class") <- NULL.

    Regarding (2), I wrote a tidyverse based function which can be used to alter the prob table of each node into a bayesm::rdirichlet distribution (see code below). The user still needs to provide part of the alpha argument (the length argument is given by the length of each prob table). Under the hood the function is relying on purrr::modify. It takes care of the classes by stripping them first and adding them back once the modification is done. My approach is to turn the prob tables into data.frames then modify the Freq column and adjust it for existing other variables (groups) and then translate that data.frame back into a table using xtabs and formulation notation via reformulate.

    I'm not so deep into bayesian networks, so I do not know to what extent this function can be generalized, or whether it only works on the dataset you provided. Further, please test if the modified object is accepted by functions expecting class bn.fit, bn.fit.dnet.

    I tried to comment each step of my code, but please ask if something is not clear.

    (3) Regarding your question why NROW(.x) does not work in your code and you have to use NROW(node$prob) instead: This has to do with the way modify loops over the prob tables. A good way to check over what elements modify is looping over is to use purrr::pluck.

    library(bnlearn)
    library(tidyverse)
    
    data(insurance)
    
    bn <- tabu(insurance, score = "bic")
    bn_fit <- bn.fit(bn, insurance, method = 'bayes')
    
    change_bn_prob_table <- function(bnfit, alpha) {
      
      # save class attribute of bnfit object
      old_class <- attr(bnfit, "class")
      
      # strip class so that `modify` can be used
      attr(bnfit, "class") <- NULL
      
      # loop over `prop` tables of each node
      new <- purrr::modify(bnfit, function(x) {
        
        # save attributes of x 
        old_x_attr <- attributes(x)
        
        # save attributes of x[["prob"]]
        old_xprob_attr <- attributes(x[["prob"]])
        
        # turn `table` into data.frame
        inp <- as.data.frame(x[["prob"]]) 
        # save names apart from `Freq`
        cnames <- inp %>% select(-Freq) %>% colnames
        
        out <- inp %>% 
          # overwrite column `Freq` with probabilities from bayesm::rdirichlet
          # alpha needs to be supplied (the length of alpha is given by `nrow`)
          mutate(Freq := bayesm::rdirichlet(c(rep(alpha, nrow(inp))))) %>% 
          # devide probilities by sum of Freq in all remaining groups
          group_by(!!! syms(cnames[-1])) %>% 
          mutate(Freq := Freq/sum(Freq)) %>% 
          # turn data.frame back into prob table using formula notation via reformulate
          xtabs(reformulate(paste(colnames(.)), "Freq"), .)
        
        # strip `call` attribute from newly generated prob table
        attr(out, "call") <- NULL  
        # add `class` `table` as attribute
        attr(out, "class") <- "table"
        
        # restore old attribues and write x out to x$prob
        attributes(out) <- old_xprob_attr
        x[["prob"]] <- out
        
        # restore old attribues and return x
        attributes(x) <- old_x_attr
        x
        
      })
      
      # add saved class attributes 
      attr(new, "class") <- old_class
      new
      
    }
    # here `2` is the first part of `alpha` of `bayesm::rdirichlet`
    bn_fit2 <- change_bn_prob_table(bn_fit, 2)
    
    # test that `logLik` can be used on new modified bnfit object 
    logLik(bn_fit2, insurance)
    #> [1] -717691.8
    

    Created on 2020-06-21 by the reprex package (v0.3.0)