
Skip Bi-grams in Java String (Compare two sentences)


I need help doing one specific thing with a String in Java. The best way for me to explain it is with an example.

I want to extract skip bi-grams from two sentences (user input) and then compare the two sets of bi-grams to see how closely the sentences resemble each other.

Sentence #1: "I love green apples."
Sentence #2: "I love red apples."

There is also a variable named "distance" that controls how far apart the two words of a bi-gram may be. (It is not very important at the moment.)

Results

The skip bi-grams extracted from Sentence #1 using a distance of 3 would be :

{I love}, {I green}, {I apples}, {love green}, {love apples}, {green apples}

(Total of 6 bi-grams)

The skip bi-grams extracted from Sentence #2 using a distance of 3 would be :

{I love}, {I red}, {I apples}, {love red}, {love apples}, {red apples}

(Total of 6 bi-grams)


So far, I have thought of splitting each sentence into a String[] of words.

So my question is: what code would extract those bi-grams from a sentence?

Thanks in advance!


Solution

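  • Basically, you want to find every two-word combination in a sentence, with the two words kept in their original order.

    Here is one solution using an ArrayList. It pairs each word with every word that follows it, so it does not apply a distance limit: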

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;
    
    public class Test {
        // Strip punctuation, trim, then split on whitespace to get the words.
        public static String[][] skipBigrams(String input) {
            String[] tokens = input.replaceAll("[^a-zA-Z ]", "").trim().split("\\s+");
            return skipBigrams(tokens);
        }
    
        // Pair every word with every word that comes after it.
        private static String[][] skipBigrams(String[] tokens) {
            List<String[]> bigrams = new ArrayList<>();
            for (int i = 0; i < tokens.length; i++) {
                for (int j = i + 1; j < tokens.length; j++) {
                    bigrams.add(new String[]{tokens[i], tokens[j]});
                }
            }
            // toArray sizes the result itself, so no pre-allocation is needed.
            return bigrams.toArray(new String[0][]);
        }
    
        public static void main(String[] args) {
            String s1 = "I love green apples.";
            System.out.println(Arrays.deepToString(skipBigrams(s1)));
        }
    }
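
The code above does not use the "distance" variable or compare the two sentences, so here is a minimal sketch of both steps. It rests on two assumptions that are not stated in the question: "distance" is read as the maximum gap between word positions (which matches the example, where a distance of 3 yields all 6 pairs from a 4-word sentence), and resemblance is measured with Jaccard similarity, which is only one possible choice. The class name `SkipBigramSimilarity` and its method names are made up for illustration.

```java
import java.util.LinkedHashSet;
import java.util.Set;

class SkipBigramSimilarity {

    // Strips punctuation and splits the sentence into words.
    static String[] tokenize(String sentence) {
        String cleaned = sentence.replaceAll("[^a-zA-Z ]", "").trim();
        return cleaned.isEmpty() ? new String[0] : cleaned.split("\\s+");
    }

    // Collects every pair (tokens[i], tokens[j]) with i < j and j - i <= distance.
    // ASSUMPTION: "distance" is the maximum gap between word positions.
    // Each bi-gram is stored as "first second"; a Set drops repeated pairs.
    static Set<String> skipBigrams(String sentence, int distance) {
        String[] tokens = tokenize(sentence);
        Set<String> bigrams = new LinkedHashSet<>();
        for (int i = 0; i < tokens.length; i++) {
            for (int j = i + 1; j < tokens.length && j - i <= distance; j++) {
                bigrams.add(tokens[i] + " " + tokens[j]);
            }
        }
        return bigrams;
    }

    // ASSUMPTION: resemblance is measured as Jaccard similarity, that is,
    // the number of shared bi-grams divided by the number of distinct
    // bi-grams overall. The result lies between 0.0 and 1.0.
    static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 1.0; // two empty sets are treated as identical
        }
        Set<String> shared = new LinkedHashSet<>(a);
        shared.retainAll(b);
        Set<String> all = new LinkedHashSet<>(a);
        all.addAll(b);
        return (double) shared.size() / all.size();
    }

    public static void main(String[] args) {
        Set<String> first = skipBigrams("I love green apples.", 3);
        Set<String> second = skipBigrams("I love red apples.", 3);
        System.out.println(first);
        System.out.println(second);
        System.out.println(jaccard(first, second));
    }
}
```

For the two example sentences with a distance of 3, each set holds 6 bi-grams. They share 3 of them ("I love", "I apples", "love apples") out of 9 distinct bi-grams, so the similarity comes out at about 0.33. With a distance of 1, only adjacent words are paired, which gives ordinary bi-grams.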