Markov Algorithm for Random Writing

I got a litte problem understanding conceptually the structure of a random writing program (that takes input in form of a text file) and uses the Markov algorithm to create a somewhat sensible output.

So the data structure i am using is to use cases ranging from 0-10. Where at case 0: I count the number a letter/symbol or digit appears and base my new text on this to simulate the input. I have already implemented this by using an Map type that holds each unique letter in the input text and a array of how many there are in the text. So I can simply ask for the size of the array for the specific letter and create output text easy like this.

But now I Need to create case1/2/3 and so on... case 1 also holds what letter is most likely to appear after any letter aswell. Do i need to create 10 seperate arrays for these cases, or are there an easier way?

Solution

There are a lot of ways to model this. One approach is as you describe, with an multi-dimensional array where each index is the following character in the chain and the final result is the count.

# Two character sample:
int counts[][] = new int[26][26]
# ... initialize all entries to zero

# 'a' => 0, 'b' => 1, ... 'z' => 25
# For example for the string 'apple'
# Note: I'm only writing this like this to show what the result is, it should be in a
#       loop or function ...
counts['a'-'a']['p'-'a']++
counts['p'-'a']['p'-'a']++
counts['p'-'a']['l'-'a']++
counts['l'-'a']['l'-'e']++

Then to randomly generate names you would count the number of total outcomes for a given character (ex: 2 outcomes for 'p' in the previous example) and pick a weighted random number for one of the possible outcomes.

For smaller sizes (say up to 4 characters) that should work fine. For anything larger you may start to run into memory issues since (assuming you're using A-Z) 26^N entries for an N-length chain.

I wrote something like a couple of years ago. I think I used random pages from Wikipedia to for seed data to generate the weights.