java, analysis, n-gram, trigram

Getting 'trigrams' in Java


I am having trouble generating trigrams in Java. My program already produces bigrams correctly, but when I reuse the same method structure for trigrams it does not give the right output. I want the trigrams to cover every possible combination of three words in the ArrayList, e.g.

Original = [eye, test, find, free, nhs]
Trigram = [eye test find, 2, eye test free, 3, eye test nhs, 4, eye find free, 3, eye find nhs, 4, eye free nhs, 4, etc...]

The numbers give the distance between the first word and the last word, and the list should contain every combination of three words in the ArrayList. This currently works fine for bigrams:

Original = [eye, test, find, free, nhs]
Bigram = [eye test, 1, eye find, 2, eye free, 3, eye nhs, 4, test find, 1, test free, 2, test nhs, 3, find free, 1, etc..]

Here are the methods:

public ArrayList<String> bagOfWords;
public ArrayList<String> bigramList = new ArrayList<String>();
public ArrayList<String> trigramList = new ArrayList<String>();


public void trigram() throws FileNotFoundException{
    PrintWriter tg = new PrintWriter(new File(trigramFile));
    // CREATES THE TRIGRAM
    for (int i = 0; i < bagOfWords.size() - 1; i++) {
        for (int j = 1; j < bagOfWords.size() - 1; j++) {
            for(int k = j + 1; k < bagOfWords.size(); k++){
                int distance = (k - i);
                if (distance < 4){
                    trigramList.add(bagOfWords.get(i) + " " + bagOfWords.get(j) + " " + bagOfWords.get(k) + ", " + distance);
                }
            }
        }
    }
}


public void bigram() throws FileNotFoundException{
    // CREATES THE BIGRAM
    PrintWriter bg = new PrintWriter(new File(bigramFile));
    for (int i = 0; i < bagOfWords.size() - 1; i++) {
        for (int j = i + 1; j < bagOfWords.size(); j++) {
            int distance = (j - i);
            if (distance < 4){
                bigramList.add(bagOfWords.get(i) + " " + bagOfWords.get(j) + ", " + distance);
            }
        }
    }
}

Can anyone help me alter the trigram() method to create an appropriate trigram for what I need? Thanks for any help.


Solution

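  • You want j to start at i + 1, don't you? As written, j restarts at 1 for every i, so the same word can be used twice (i = 1, j = 1 gives "test test find") and words can come out of order. Also, I think you are letting i count too far: it should stop at bagOfWords.size() - 2, so that j and k still have room after it. I am not sure why you check distance < 4. This will throw out valid groups, such as "eye test nhs, 4" from your own expected output.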

    public void trigram() throws FileNotFoundException {
        PrintWriter tg = new PrintWriter(new File(trigramFile));
        // CREATES THE TRIGRAM
        // i < j < k keeps the three words distinct and in their original order
        for (int i = 0; i < bagOfWords.size() - 2; i++) {
            for (int j = i + 1; j < bagOfWords.size() - 1; j++) {
                for (int k = j + 1; k < bagOfWords.size(); k++) {
                    int distance = (k - i);
                    trigramList.add(bagOfWords.get(i) + " " + bagOfWords.get(j) + " " + bagOfWords.get(k) + ", " + distance);
                }
            }
        }
    }
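
To check the corrected loop bounds against the example list from the question, here is a minimal self-contained sketch. The class name TrigramDemo and the static helper trigrams(...) are made up for illustration and are not part of the original program; the file output via PrintWriter is left out, and the distance < 4 filter is dropped, on the assumption that every combination is wanted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class, not part of the asker's code.
public class TrigramDemo {

    // Returns every three-word combination with i < j < k, formatted as
    // "first second third, distance", where distance = k - i.
    public static List<String> trigrams(List<String> words) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i < words.size() - 2; i++) {
            for (int j = i + 1; j < words.size() - 1; j++) {
                for (int k = j + 1; k < words.size(); k++) {
                    int distance = k - i;
                    result.add(words.get(i) + " " + words.get(j) + " "
                            + words.get(k) + ", " + distance);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> bagOfWords = Arrays.asList("eye", "test", "find", "free", "nhs");
        // Print one trigram per line.
        for (String trigram : trigrams(bagOfWords)) {
            System.out.println(trigram);
        }
    }
}
```

For the five-word list this gives 10 entries (5 choose 3), starting with "eye test find, 2" and ending with "find free nhs, 2". A list with fewer than three words simply yields an empty result, because the outer loop never runs.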