I have code that counts word occurrences in a file. I would like to use it with two files and display the recurring words (the words that both files contain) in a separate table. What is your idea: how can I make it work with two files?
// Assumes bufferedReader is a BufferedReader opened on the input file
Map<String, Integer> crunchifyMap = new HashMap<>();
String inputLine;
while ((inputLine = bufferedReader.readLine()) != null) {
    // Split each line on whitespace and punctuation
    String[] words = inputLine.split("[ \n\t\r.,;:!?(){}]");
    for (int counter = 0; counter < words.length; counter++) {
        String key = words[counter].toLowerCase();
        if (key.length() > 0) {
            if (crunchifyMap.get(key) == null) {
                // First occurrence of this word
                crunchifyMap.put(key, 1);
            } else {
                // Word seen before: increment its count
                int value = crunchifyMap.get(key).intValue();
                value++;
                crunchifyMap.put(key, value);
            }
        }
    }
}
Set<Map.Entry<String, Integer>> entrySet = crunchifyMap.entrySet();
System.out.println("Words" + "\t\t" + "# of Occurrences");
for (Map.Entry<String, Integer> entry : entrySet) {
    System.out.println(entry.getKey() + "\t\t" + entry.getValue());
}
You should probably use the following (very coarse) algorithm:
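1. Read the words of the first file into a Set<String> words.
2. Read the words of the second file into a Set<String> words2.
3. Keep only the words in words that are also contained in words2: words.retainAll(words2)
4. words now contains your final list.

Note that you can reuse the file-reading algorithm if you put it into a method like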
public Set<String> readWords(Reader reader) {
....
}
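Here is a minimal sketch of such a method plus the retainAll step, assuming the same split pattern and lower-casing as the question's code. A few choices are mine, not the answer's: readWords is static, it wraps IOException in UncheckedIOException because the suggested signature declares no checked exception, and it closes the reader it is given. StringReader stands in for a FileReader so the sketch runs without files; with real files you would pass something like new FileReader("file1.txt") (hypothetical file name).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

public class CommonWords {

    // Collects every distinct lower-cased word from the reader.
    // Assumption: same delimiters as the question's code; the reader is closed afterwards.
    public static Set<String> readWords(Reader reader) {
        Set<String> words = new HashSet<>();
        try (BufferedReader bufferedReader = new BufferedReader(reader)) {
            String inputLine;
            while ((inputLine = bufferedReader.readLine()) != null) {
                for (String word : inputLine.split("[ \n\t\r.,;:!?(){}]")) {
                    if (!word.isEmpty()) {
                        words.add(word.toLowerCase());
                    }
                }
            }
        } catch (IOException e) {
            // Assumption: rethrow unchecked because the method signature declares no IOException
            throw new UncheckedIOException(e);
        }
        return words;
    }

    public static void main(String[] args) {
        // StringReader replaces FileReader only so this example is self-contained
        Set<String> words = readWords(new StringReader("The cat sat on the mat."));
        Set<String> words2 = readWords(new StringReader("A dog sat on a cat."));
        // Keep only the words that also occur in the second text
        words.retainAll(words2);
        // TreeSet sorts the result alphabetically for printing
        System.out.println(new TreeSet<>(words)); // prints [cat, on, sat]
    }
}
```

Because retainAll mutates words, make a copy first if you still need the full word list of the first file.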
Count frequency of occurrence
If you also want to know how often each word occurs, read each file into a Map<String, Integer>
that maps each word to the number of times it occurs in that file.
The Map.merge(...) method (available since Java 8) simplifies counting:
Map<String, Integer> freq = new HashMap<>();
for(String word : words) {
// insert 1 or increment already mapped value
freq.merge(word, 1, Integer::sum);
}
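Putting the question's reading loop and merge together gives a counting method like the one below. This is a sketch under the same assumptions as the word-set version: the name countWords is hypothetical, the delimiters and lower-casing are copied from the question, and a StringReader replaces the file reader for the demo.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class WordFrequencies {

    // Hypothetical helper: maps each lower-cased word to its number of occurrences.
    public static Map<String, Integer> countWords(Reader reader) {
        Map<String, Integer> freq = new HashMap<>();
        try (BufferedReader bufferedReader = new BufferedReader(reader)) {
            String inputLine;
            while ((inputLine = bufferedReader.readLine()) != null) {
                for (String word : inputLine.split("[ \n\t\r.,;:!?(){}]")) {
                    if (!word.isEmpty()) {
                        // insert 1 or increment the already mapped value
                        freq.merge(word.toLowerCase(), 1, Integer::sum);
                    }
                }
            }
        } catch (IOException e) {
            // Assumption: surface I/O failures as an unchecked exception
            throw new UncheckedIOException(e);
        }
        return freq;
    }

    public static void main(String[] args) {
        Map<String, Integer> freq = countWords(new StringReader("The cat saw the other cat."));
        // TreeMap only sorts the keys for readable output
        System.out.println(new TreeMap<>(freq)); // prints {cat=2, other=1, saw=1, the=2}
    }
}
```

The merge call replaces the question's get/null-check/put sequence with a single statement.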
Then apply the following, slightly modified algorithm:
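1. Read the first file into a Map<String, Integer> wordsFreq1.
2. Read the second file into a Map<String, Integer> wordsFreq2.
3. Copy the key set of the first map: Set<String> words = new HashSet<>(wordsFreq1.keySet()). The copy matters: keySet() returns a view backed by the map, so calling retainAll on it directly would also remove entries from wordsFreq1.
4. Keep only the words that also occur in the second file: words.retainAll(wordsFreq2.keySet())

words now contains all the words in common, while wordsFreq1 and wordsFreq2 hold the frequencies of all words in each file. With these three data structures you can easily get all the information you want. Example: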
Map<String, Integer> wordsFreq1 = ... // read from file
Map<String, Integer> wordsFreq2 = ... // read from file
Set<String> commonWords = new HashSet<>(wordsFreq1.keySet());
commonWords.retainAll(wordsFreq2.keySet());
// Map that contains the summarized frequencies of all words
Map<String, Integer> allWordsTotalFreq = new HashMap<>(wordsFreq1);
wordsFreq2.forEach((word, freq) -> allWordsTotalFreq.merge(word, freq, Integer::sum));
// Map that contains the summarized frequencies of words in common
Map<String, Integer> commonWordsTotalFreq = new HashMap<>(allWordsTotalFreq);
commonWordsTotalFreq.keySet().retainAll(commonWords);
// List of common words sorted by frequency:
List<String> list = new ArrayList<>(commonWords);
Collections.sort(list, Comparator.comparingInt(commonWordsTotalFreq::get).reversed());
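To get the separate table the question asks for, you can print the common words with their per-file and total counts. The sketch below is one possible layout, not the only one: the class and method names are hypothetical, the column format loosely mirrors the question's "Words\t\t..." header, ties in total frequency are broken alphabetically, and the two maps are filled by hand instead of being read from files.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class CommonWordsTable {

    // Words present in both maps, highest summed frequency first, ties alphabetical.
    public static List<String> commonWordsByTotal(Map<String, Integer> freq1, Map<String, Integer> freq2) {
        // Copy the key set so retainAll does not modify freq1
        Set<String> common = new HashSet<>(freq1.keySet());
        common.retainAll(freq2.keySet());
        List<String> list = new ArrayList<>(common);
        list.sort(Comparator.comparingInt((String w) -> freq1.get(w) + freq2.get(w))
                .reversed()
                .thenComparing(Comparator.naturalOrder()));
        return list;
    }

    public static void main(String[] args) {
        // Hand-filled sample data standing in for two files
        Map<String, Integer> freq1 = new HashMap<>();
        freq1.put("the", 3);
        freq1.put("cat", 1);
        freq1.put("sat", 2);
        Map<String, Integer> freq2 = new HashMap<>();
        freq2.put("the", 1);
        freq2.put("cat", 4);
        freq2.put("dog", 2);

        System.out.println("Words\t\tFile 1\tFile 2\tTotal");
        for (String word : commonWordsByTotal(freq1, freq2)) {
            int f1 = freq1.get(word);
            int f2 = freq2.get(word);
            System.out.println(word + "\t\t" + f1 + "\t" + f2 + "\t" + (f1 + f2));
        }
    }
}
```

With the sample data, "cat" (total 5) is listed before "the" (total 4); "sat" and "dog" are left out because each appears in only one map.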