I have sentences in two documents that I compare with each other. I compute their similarity with a formula, and I use a List<List<Sentence>> to hold the sentences of each document.
But this only works for two documents; it doesn't work when I compare three or more, for example five documents.
The problem is how to get the sentences from several documents so that I can compare all of them.
Here is my code.
import java.io.File;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// read every file in folder p and turn it into a list of processed sentences
List<List<Sentence>> collect = Arrays.stream(new File(p).listFiles())
        .map(x -> configSentenceByLine(x.getAbsolutePath()))
        .map(x -> tokenizingWord(x))
        .map(x -> stemmingWord(x))
        .map(x -> countWordBased(x))
        .collect(Collectors.toList());

// compares every sentence of document 0 with every sentence of document 1 only
for (int i = 0; i < collect.get(0).size(); i++) {
    for (int j = 0; j < collect.get(1).size(); j++) {
        double sim = nc.getSimilarity(collect.get(0).get(i).getSentence(),
                                      collect.get(1).get(j).getSentence());
        System.out.println("Similarity = " + sim);
    }
}
I suppose you need to compute the similarity between all sentences of all n documents. If so, you have to compare every possible pair of documents. The number of document pairs is the number of combinations of n documents taken 2 at a time without repetition, C(n, 2) = n(n - 1)/2; for 5 documents that gives 5 * 4 / 2 = 10 pairs.
The pairs are: 1-2, 1-3, 1-4, 1-5, 2-3, 2-4, 2-5, 3-4, 3-5, 4-5
As you may notice, you first compare the 1st document with the remaining 4, then the 2nd with the remaining 3, and so on. In code, that means adding two loops to the two you already have: one over every document k, and, for each k, one over the documents m that come after it (m starting at k + 1).
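The pair enumeration described above can be sketched as a small, self-contained Java program. It numbers documents from 1, as in the list above, and checks that the count matches n(n - 1)/2. The class and method names (DocumentPairs, pairs) are illustrative only and do not come from the question.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: enumerate every unordered pair of documents 1..n.
class DocumentPairs {

    // Returns each pair as "a-b" with a < b, in the same order as the list above.
    static List<String> pairs(int n) {
        List<String> result = new ArrayList<>();
        for (int k = 1; k < n; k++) {           // first document of the pair
            for (int m = k + 1; m <= n; m++) {  // every later document
                result.add(k + "-" + m);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        int n = 5;
        List<String> p = pairs(n);
        System.out.println(p);                            // [1-2, 1-3, 1-4, 1-5, 2-3, 2-4, 2-5, 3-4, 3-5, 4-5]
        System.out.println(p.size() == n * (n - 1) / 2);  // true
    }
}
```

Starting the inner loop at k + 1 is what prevents a document from being paired with itself and stops 2-1 from being produced after 1-2.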
// for each document k, except the last one
for (int k = 0; k < collect.size() - 1; k++) {
    // for each sentence i in document k
    for (int i = 0; i < collect.get(k).size(); i++) {
        // for each document m after k
        for (int m = k + 1; m < collect.size(); m++) {
            // for each sentence j in document m
            for (int j = 0; j < collect.get(m).size(); j++) {
                // compare sentence i of document k WITH sentence j of document m
                double sim = nc.getSimilarity(collect.get(k).get(i).getSentence(),
                                              collect.get(m).get(j).getSentence());
                System.out.println("doc " + k + " line " + i + " vs doc " + m + " line " + j
                        + ": similarity = " + sim);
            }
        }
    }
}
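To see the four loops run on their own, here is a sketch with hypothetical in-memory data. Assumptions: each document is a List<String> of sentences instead of the question's Sentence objects, and a simple Jaccard word-overlap score (the hypothetical helper jaccard) stands in for nc.getSimilarity, whose actual formula the question doesn't show.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch of the k/i/m/j loops on stand-in data; not the asker's pipeline.
class AllPairsSimilarity {

    // Lower-cased set of the whitespace-separated words of a sentence.
    static Set<String> words(String s) {
        Set<String> w = new HashSet<>();
        for (String t : s.toLowerCase().trim().split("\\s+")) {
            if (!t.isEmpty()) w.add(t);
        }
        return w;
    }

    // Hypothetical stand-in for nc.getSimilarity: |A ∩ B| / |A ∪ B| of the word sets.
    static double jaccard(String a, String b) {
        Set<String> wa = words(a);
        Set<String> wb = words(b);
        Set<String> union = new HashSet<>(wa);
        union.addAll(wb);
        if (union.isEmpty()) return 0.0;  // both sentences blank
        Set<String> inter = new HashSet<>(wa);
        inter.retainAll(wb);
        return (double) inter.size() / union.size();
    }

    // Compares every sentence of document k with every sentence of each later
    // document m, prints each score and returns the number of comparisons made.
    static int compareAll(List<List<String>> docs) {
        int comparisons = 0;
        for (int k = 0; k < docs.size() - 1; k++) {
            for (int i = 0; i < docs.get(k).size(); i++) {
                for (int m = k + 1; m < docs.size(); m++) {
                    for (int j = 0; j < docs.get(m).size(); j++) {
                        double sim = jaccard(docs.get(k).get(i), docs.get(m).get(j));
                        System.out.printf("doc %d line %d vs doc %d line %d: %.2f%n",
                                k + 1, i + 1, m + 1, j + 1, sim);
                        comparisons++;
                    }
                }
            }
        }
        return comparisons;
    }

    public static void main(String[] args) {
        List<List<String>> docs = Arrays.asList(
                Arrays.asList("the cat sat", "a dog ran"),
                Arrays.asList("the cat ran"),
                Arrays.asList("a dog sat", "birds fly", "the end"));
        System.out.println("comparisons = " + compareAll(docs));  // comparisons = 11
    }
}
```

Because m always starts at k + 1, every pair of documents is visited exactly once and no document is compared with itself. The total number of sentence comparisons is the sum of size(k) * size(m) over all document pairs, here 2*1 + 2*3 + 1*3 = 11.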