javascript text nlp hidden-markov-models

How to successfully implement a Markov model for generating the next word of a sentence?


I am working on a JavaScript program that takes text and uses it to generate sentences that seem to make sense at first glance.

I'm implementing a Markov model.

For example, I have:

[{word:"hello", prob: 0.5}, {word: "world", prob: 0.25},...]

My model is much more complex and I'm not going to explain every detail.

What I want to know is: given the probability of a certain word occurring, how can one create the sentence generator in JavaScript?

What I currently have seems to do that, but when I really think about it, it's just random. What I tried was to compare the prob value of each word in my table with a randomly selected value from 0 to 1.

For example, I would pick

 randomValue = Math.Random().toFixed(2)

using toFixed to get values like 0.33 instead of 0.3455343.... I would then compare it with the prob value of every word to see if it matches. Once it matches, I pick that word.

What is the correct way to at least get words picked according to their probability, rather than what I did, which seems to be just random selection?


Solution

  • I am not overly familiar with Markov models, but I feel like I could lend a hand here, especially considering that there are no answers so far.

    First, the code you provided:

    randomValue = Math.Random().toFixed(2)
    

    has a couple of issues. The "R" in random should be lowercase, and toFixed(2) returns a string, not a number. The correct version of that line is:

    var randomValue = Number(Math.random().toFixed(2));
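
    As a small illustration of the string issue (the variable names here are made up for this sketch), a strict comparison between the toFixed result and a numeric prob never matches until the value is converted back to a number:

    ```javascript
    // toFixed returns a string such as "0.50", not the number 0.5.
    var asString = (0.5).toFixed(2);
    // Number(...) converts the string back to a numeric value.
    var asNumber = Number(asString);

    console.log(typeof asString);   // "string"
    console.log(asString === 0.5);  // false: a string is never strictly equal to a number
    console.log(asNumber === 0.5);  // true
    ```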
    

    That being said, to pick the next word based purely on the highest probability, you wouldn't need that line of code anyway. You'd do something like this:

    var nextWordProbabilities = [{word:"hello", prob: 0.5}, {word: "world", prob: 0.25}];
    
    nextWordProbabilities.sort(function(a, b){
      if(a.prob < b.prob)return 1;
      if(a.prob > b.prob)return -1;
      return 0;
    });
    var nextWord = nextWordProbabilities[0].word;
    

    If you then wanted a little randomness, so that you don't always end up with the single most probable word but sometimes get one ranked close behind it, you could add this after the previous code block:

    var TENDENCY_TOWARDS_MOST_PROBABLE_WORDS = .5;
    // Walk down the sorted list, starting with the most probable word.
    for(var i = 0; i < nextWordProbabilities.length; i++){
        nextWord = nextWordProbabilities[i].word;
        // Keep the current word with this chance; otherwise move on to the next one.
        if(Math.random() < TENDENCY_TOWARDS_MOST_PROBABLE_WORDS){
            break;
        }
    }
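
    The loop above only uses the ranking of the words, not the prob values themselves. If what you want is for each word to come up in proportion to its prob, one common approach is to pick a random point and walk the cumulative sum of the probabilities until it passes that point. This is only a sketch: pickWeighted and its optional random parameter are names I made up, and it assumes the array is non-empty. It divides by the total, so the prob values don't have to add up to exactly 1 (with only the two example entries, "hello" would come up about two thirds of the time).

    ```javascript
    // Pick a word with a chance proportional to its prob value.
    // pickWeighted and the optional `random` argument are hypothetical names for this sketch.
    function pickWeighted(candidates, random) {
      random = random || Math.random;
      // Sum the weights so the probabilities need not add up to exactly 1.
      var total = 0;
      for (var i = 0; i < candidates.length; i++) {
        total += candidates[i].prob;
      }
      // Choose a point in [0, total) and find the word whose interval contains it.
      var threshold = random() * total;
      var cumulative = 0;
      for (var j = 0; j < candidates.length; j++) {
        cumulative += candidates[j].prob;
        if (threshold < cumulative) {
          return candidates[j].word;
        }
      }
      // Guard against floating-point rounding: fall back to the last word.
      return candidates[candidates.length - 1].word;
    }

    var nextWordProbabilities = [{word: "hello", prob: 0.5}, {word: "world", prob: 0.25}];
    var nextWord = pickWeighted(nextWordProbabilities);
    ```

    Passing your own function as the second argument in place of Math.random makes the choice repeatable, which helps when testing.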
    

    I'm also not sure how you're determining when to end a sentence. If you're not just generating a set number of words in a row, it might be a good idea to end the sentence when even the most probable word isn't very probable, like so:

    if(nextWordProbabilities[0].prob < .2){
        //end the sentence
    }
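
    To show how the pieces might fit together, here is a rough sketch of the generation loop. I don't know what your model looks like, so the model object, generateSentence and mostProbable are all made-up names, and the probabilities are toy values: each word maps to an array of possible next words in the same {word, prob} shape you posted. Here the sentence ends when a word has no known continuation or a maximum length is reached; you could add the probability cutoff above as another stopping rule.

    ```javascript
    // Hypothetical toy model: each word maps to its possible next words.
    var model = {
      hello: [{word: "world", prob: 0.9}, {word: "there", prob: 0.1}],
      world: [],
      there: [{word: "world", prob: 1}]
    };

    // Build a sentence starting at startWord. chooseNext is any function that
    // picks a word from a non-empty candidates array; maxWords caps the length.
    function generateSentence(model, startWord, chooseNext, maxWords) {
      var words = [startWord];
      var current = startWord;
      while (words.length < maxWords) {
        // hasOwnProperty avoids picking up inherited keys such as "constructor".
        var known = Object.prototype.hasOwnProperty.call(model, current);
        var candidates = known ? model[current] : [];
        if (candidates.length === 0) break; // no known continuation: end the sentence
        current = chooseNext(candidates);
        words.push(current);
      }
      return words.join(" ");
    }

    // Chooser that always takes the highest-prob candidate.
    function mostProbable(candidates) {
      var best = candidates[0];
      for (var i = 1; i < candidates.length; i++) {
        if (candidates[i].prob > best.prob) best = candidates[i];
      }
      return best.word;
    }

    console.log(generateSentence(model, "hello", mostProbable, 10)); // "hello world"
    ```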
    

    Hope this is helpful.