Search code examples
rfor-loopbnlearn

Using cpquery function for several pairs from dataset


I am relatively beginner in R and trying to figure out how to use cpquery function for bnlearn package for all edges of DAG.

First of all, I created a bn object, a network of bn and a table with all strengths.

library(bnlearn)
data(learning.test)
baynet = hc(learning.test)
fit = bn.fit(baynet, learning.test)
sttbl = arc.strength(x = baynet, data = learning.test)

Then I tried to create a new variable in sttbl dataset, which was the result of cpquery function.

sttbl = sttbl %>% mutate(prob = NA) %>% arrange(strength)
sttbl[1,4] = cpquery(fit, `A` == 1, `D` == 1)

It looks pretty good (especially on bigger data), but when I am trying to automate this process somehow, I am struggling with errors, such as:

Error in sampling(fitted = fitted, event = event, evidence = evidence, : logical vector for evidence is of length 1 instead of 10000.

In perfect situation, I need to create a function that fills the prob generated variable of sttbl dataset regardless it's size. I tried to do it with for loop to, but stumbled over the error above again and again. Unfortunately, I am deleting failed attempts, but they were smt like this:

for (i in 1:nrow(sttbl)) {
     j = sttbl[i,1]
     k = sttbl[i,2]
     sttbl[i,4]=cpquery(fit, fit$j %in% sttbl[i,1]==1, fit$k %in% sttbl[i,2]==1)
}

or this:

for (i in 1:nrow(sttbl)) {
     sttbl[i,4]=cpquery(fit, sttbl[i,1] == 1, sttbl[i,2] == 1)
}

Now I think I misunderstood something in R or bnlearn package.

Could you please tell me how to realize this task with filling the column by multiple cpqueries? That would help me a lot with my research!


Solution

  • cpquery is quite difficult to work with programmatically. If you look at the examples in the help page you can see the author uses eval(parse(...)) to build the queries. I have added two approaches below, one using the methods from the help page and one using cpdist to draw samples and reweighting to get the probabilities.

    Your example

    library(bnlearn); library(dplyr)
    data(learning.test)
    baynet = hc(learning.test)
    fit = bn.fit(baynet, learning.test)
    sttbl = arc.strength(x = baynet, data = learning.test)
    sttbl = sttbl %>% mutate(prob = NA) %>% arrange(strength)
    

    This uses cpquery and the much maligned eval(parse(...)) -- this is the approach the the bnlearn author takes to do this programmatically in the ?cpquery examples. Anyway,

    # You want the evidence and event to be the same; in your question it is `1`
    # but for example using learning.test data we use 'a'
    state = "\'a\'" # note if the states are character then these need to be quoted
    event = paste(sttbl$from, "==", state)
    evidence = paste(sttbl$to, "==", state)
    
    # loop through using code similar to that found in `cpquery`
    set.seed(1) # to make sampling reproducible
    for(i in 1:nrow(sttbl)) {
      qtxt = paste("cpquery(fit, ", event[i], ", ", evidence[i], ",n=1e6", ")")
      sttbl$prob[i] = eval(parse(text=qtxt))
    }
    

    I find it preferable to work with cpdist which is used to generate random samples conditional on some evidence. You can then use these samples to build up queries. If you use likelihood weighting (method="lw") it is slightly easier to do this programatically (and without evil(parse(...))). The evidence is added in a named list i.e. list(A='a').

    # The following just gives a quick way to assign the same
    # evidence state to all the evidence nodes.  
    evidence = setNames(replicate(nrow(sttbl), "a", simplify = FALSE), sttbl$to)
    
    # Now loop though the queries
    # As we are using likelihood weighting we need to reweight to get the probabilities
    # (cpquery does this under the hood)
    # Also note with this method that you could simulate from more than
    # one variable (event) at a time if the evidence was the same.
    for(i in 1:nrow(sttbl)) {
      temp = cpdist(fit, sttbl$from[i], evidence[i], method="lw")
      w = attr(temp, "weights")  
      sttbl$prob2[i] = sum(w[temp=='a'])/ sum(w)
    }
    
    sttbl
    #   from to   strength      prob     prob2
    # 1    A  D -1938.9499 0.6186238 0.6233387
    # 2    A  B -1153.8796 0.6050552 0.6133448
    # 3    C  D  -823.7605 0.7027782 0.7067417
    # 4    B  E  -720.8266 0.7332107 0.7328657
    # 5    F  E  -549.2300 0.5850828 0.5895373