I need help doing something specific with a String in Java. The easiest way for me to explain it is with an example.
I want to extract skip bi-grams from two sentences (entered by the user) and then compare the two sets to see how similar the sentences are.
Sentence #1 : "I love green apples." Sentence #2 : "I love red apples."
Also, there is a variable named "distance" that sets how far apart the two words of a bi-gram may be in the sentence. (It is not very important at the moment.)
Results
The skip bi-grams extracted from Sentence #1 using a distance of 3 would be :
{I love}, {I green}, {I apples}, {love green}, {love apples}, {green apples}
(Total of 6 bi-grams)
The skip bi-grams extracted from Sentence #2 using a distance of 3 would be :
{I love}, {I red}, {I apples}, {love red}, {love apples}, {red apples}
(Total of 6 bi-grams)
So far I have thought of splitting each sentence into a String[] of words.
So my question is: what code would extract those bi-grams from a sentence?
Thanks in advance!
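Basically, you want every two-word combination from the sentence, with the words kept in their original order. Since you said the distance is not important yet, this solution ignores it and pairs each word with every word that comes after it.
Here is one solution using an ArrayList: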
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class Test {

    // Strips everything except letters and spaces, then splits on whitespace.
    public static String[][] skipBigrams(String input) {
        // trim() avoids an empty first token when the input starts with a space
        String[] tokens = input.replaceAll("[^a-zA-Z ]", "").trim().split("\\s+");
        return skipBigrams(tokens);
    }

    // Pairs each token with every token that follows it.
    private static String[][] skipBigrams(String[] tokens) {
        List<String[]> bigrams = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            for (int j = i + 1; j < tokens.length; j++) {
                bigrams.add(new String[]{tokens[i], tokens[j]});
            }
        }
        return bigrams.toArray(new String[0][]);
    }

    public static void main(String[] args) {
        String s1 = "I love green apples.";
        // Prints [[I, love], [I, green], [I, apples], [love, green], [love, apples], [green, apples]]
        System.out.println(Arrays.deepToString(skipBigrams(s1)));
    }
}
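If you later want the "distance" to matter and to compare the two sentences, something along these lines might work. This is only a sketch under a couple of assumptions that are mine, not yours: I read "distance" as the largest allowed gap between word positions (so a distance of 3 still lets "I" pair with "apples" in a four-word sentence, which matches your example), and I picked Jaccard similarity (shared bi-grams divided by all distinct bi-grams) as the resemblance measure since you did not name one. The class and method names (SkipBigramSketch, skipBigrams, jaccard) are made up for illustration.

```java
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

class SkipBigramSketch {

    // Assumption: "distance" is the largest allowed gap between word
    // positions, i.e. a pair (i, j) is kept when 1 <= j - i <= distance.
    static Set<String> skipBigrams(String sentence, int distance) {
        // Same cleanup as above: keep letters and spaces only.
        String cleaned = sentence.replaceAll("[^a-zA-Z ]", "").trim();
        Set<String> bigrams = new LinkedHashSet<>(); // keeps insertion order, drops duplicates
        if (cleaned.isEmpty()) {
            return bigrams;
        }
        String[] tokens = cleaned.split("\\s+");
        for (int i = 0; i < tokens.length; i++) {
            for (int j = i + 1; j < tokens.length && j - i <= distance; j++) {
                bigrams.add(tokens[i] + " " + tokens[j]);
            }
        }
        return bigrams;
    }

    // Jaccard similarity: |A intersect B| / |A union B|.
    // This measure is my choice, not something the question specifies.
    static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 1.0; // two empty sets are treated as identical
        }
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return (double) intersection.size() / union.size();
    }

    public static void main(String[] args) {
        Set<String> first = skipBigrams("I love green apples.", 3);
        Set<String> second = skipBigrams("I love red apples.", 3);
        System.out.println(first);
        System.out.println(second);
        System.out.println(jaccard(first, second));
    }
}
```

For your two sentences with a distance of 3, each set has 6 bi-grams, 3 of them are shared ("I love", "I apples", "love apples") and there are 9 distinct bi-grams overall, so the score comes out at one third. Storing each bi-gram as a single "word1 word2" String instead of a String[] is deliberate: arrays do not compare by content in a Set, Strings do.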