
How to predict the probability of a sentence?


How can I determine the probability of the sentence "what is a cat" given the associated PCFG:

Rule          Probability
S -> NP VB
NN -> CAT     1
DT -> what    1
VB -> is      0.5
VB -> be      0.5

How can this PCFG and sentence be represented as a hidden Markov model?

Is each node in the model one of "what", "is", "a", "cat"? How do I model the probability connections between the nodes from the PCFG?


Solution

  • A PCFG defines (i) a distribution over parse trees and (ii) a distribution over sentences.

    The probability of a parse tree given by a PCFG is:

        P(t) = ∏_{r ∈ t} P(r)

    where the parse tree t is described as a multiset of rules r (it is a multiset because a rule can be used several times to derive a tree).

    To compute the probability of a sentence s, you have to consider every possible way of deriving it and sum over their probabilities:

        P(s) = ∑_{t : yield(t) = s} P(t)

    where yield(t) = s means that the string of terminals (the yield) of t is the sentence s.

    In your example, the probability of "what is a cat" is 0 because you cannot generate it with your grammar. Here's another example with a toy grammar:

    Rule            Probability
    S -> NP VP      1
    NP -> they      0.5
    NP -> fish      0.5
    VP -> VBP NP    0.5
    VP -> MD VB     0.5
    VBP -> can      1
    MD -> can       1
    VB -> fish      1
    

    The sentence "they can fish" has two possible parse trees:

    (S (NP they) (VP (MD can) (VB fish)))
    (S (NP they) (VP (VBP can) (NP fish)))
    

    with probabilities:

    S   NP    VP     MD   VB
    1 * 0.5 * 0.5 *  1  * 1 = 1/4
    

    and

    S   NP    VP     VBP   NP
    1 * 0.5 * 0.5 *  1  *  0.5 = 1/8
    

    so its probability is the sum of the probabilities of both parse trees: 1/4 + 1/8 = 3/8.
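    As a sanity check, here is a minimal Python sketch (my own, not part of the original answer; the tree encoding and the name tree_probability are hypothetical) that hard-codes the two derivations above and reproduces the product and the sum:

        # The two derivations of "they can fish", each written as the
        # multiset of (rule, probability) pairs it uses.
        tree_md = [("S -> NP VP", 1.0), ("NP -> they", 0.5),
                   ("VP -> MD VB", 0.5), ("MD -> can", 1.0), ("VB -> fish", 1.0)]
        tree_vbp = [("S -> NP VP", 1.0), ("NP -> they", 0.5),
                    ("VP -> VBP NP", 0.5), ("VBP -> can", 1.0), ("NP -> fish", 0.5)]

        def tree_probability(rules):
            # P(t) = product of the probabilities of the rules in t
            p = 1.0
            for _, prob in rules:
                p *= prob
            return p

        # P(s) = sum over all trees whose yield is the sentence
        print(tree_probability(tree_md))                               # 0.25  (1/4)
        print(tree_probability(tree_vbp))                              # 0.125 (1/8)
        print(tree_probability(tree_md) + tree_probability(tree_vbp))  # 0.375 (3/8)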

    It turns out that in the general case it is too computationally expensive to enumerate every possible parse tree of a sentence. However, there is an O(n^3) dynamic-programming algorithm that computes the sum efficiently: the inside algorithm, which the inside-outside algorithm builds on (see pages 17-18 of Michael Collins' notes on PCFGs, PDF). A sketch of it follows below.
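    For illustration, here is a minimal sketch of that inside recursion in Python, assuming the grammar is in Chomsky normal form (the toy grammar above already is). The function and variable names are my own, not from the answer or any library:

        from collections import defaultdict

        # Toy grammar from the answer, split into lexical and binary rules.
        lexical = {                 # (A, word): P(A -> word)
            ("NP", "they"): 0.5,
            ("NP", "fish"): 0.5,
            ("VBP", "can"): 1.0,
            ("MD", "can"): 1.0,
            ("VB", "fish"): 1.0,
        }
        binary = {                  # (A, B, C): P(A -> B C)
            ("S", "NP", "VP"): 1.0,
            ("VP", "VBP", "NP"): 0.5,
            ("VP", "MD", "VB"): 0.5,
        }

        def inside_probability(words, start="S"):
            """CKY-style inside algorithm, O(n^3) in sentence length."""
            n = len(words)
            # chart[(i, j)][A] = total probability that A derives words[i:j]
            chart = defaultdict(lambda: defaultdict(float))
            for i, w in enumerate(words):                  # width-1 spans
                for (a, word), p in lexical.items():
                    if word == w:
                        chart[(i, i + 1)][a] += p
            for width in range(2, n + 1):                  # wider spans
                for i in range(n - width + 1):
                    j = i + width
                    for k in range(i + 1, j):              # split point
                        for (a, b, c), p in binary.items():
                            chart[(i, j)][a] += p * chart[(i, k)][b] * chart[(k, j)][c]
            return chart[(0, n)][start]

        print(inside_probability("they can fish".split()))  # 0.375 = 3/8

    Summing (rather than maximizing) over the split points is what makes this compute P(s) instead of the single best parse.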

    Edit: corrected trees.