This is my third day with Java (I'm a beginner coder in general), and I'm having trouble getting the output I need. I am trying to find the frequency of words occurring in a string or text file. My whole program works so far, except that I can't output the results ordered from the most frequent word to the least frequent. Also, how can I limit the output to, for example, the top x most used words?
Here is my code so far:
public static void wordOccurrence(String text) {
    String[] wordSplit = text.split(" ");
    for (int i = 0; i < wordSplit.length; i++) {
        Map<String, Integer> occurrence = new TreeMap<>(Collections.reverseOrder());
        int Counter = 0;
        for (int j = 0; j < wordSplit.length; j++) {
            if (wordSplit[i].equals(wordSplit[j])) {
                if (j < i)
                    break;
                Counter++;
                occurrence.put(wordSplit[j],Counter);
            }
        }
        if (Counter > 1)
            System.out.println(occurrence);
    }
}
And here is my output, which is unordered: {The=2}{that=2}{to=2}{and=5}{for=2}{as=2}
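You are using TreeMap to sort your entries. TreeMap sorts entries by key, not by value. In addition, your code creates a new map on every pass of the outer loop, so each printed map holds only a single word and there is nothing to sort. You can use streams and LinkedHashMap for that job: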
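// Needs: import java.util.*; import java.util.Map.Entry;
//        import java.util.function.Function; import java.util.stream.Collectors;
public static void wordOccurrence(String text) {
    String[] wordSplit = text.split(" ");
    // Count how many times each word occurs.
    Map<String, Long> map = Arrays.stream(wordSplit)
        .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    // Sort the entries by count, highest first.
    List<Entry<String, Long>> list = new ArrayList<>(map.entrySet());
    list.sort(Entry.comparingByValue(Comparator.reverseOrder()));
    // Copy into a LinkedHashMap, which keeps the sorted insertion order.
    Map<String, Long> occurrence = list.stream()
        .collect(Collectors.toMap(Entry::getKey, Entry::getValue, (s1, s2) -> s1, LinkedHashMap::new));
    occurrence.entrySet().forEach(entry -> System.out.println(entry.getKey()+";"+entry.getValue()));
}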
Or without using a List:
public static void wordOccurrence(String text) {
    String[] wordSplit = text.split(" ");
    Map<String, Long> map = Arrays.stream(wordSplit)
        .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    Map<String, Long> occurrence = map.entrySet().stream()
        .sorted(Collections.reverseOrder(Map.Entry.comparingByValue()))
        .collect(Collectors.toMap(Entry::getKey, Entry::getValue, (s1, s2) -> s1, LinkedHashMap::new));
    occurrence.entrySet().forEach(entry -> System.out.println(entry.getKey()+";"+entry.getValue()));
}
If you just want the top "n", you can add a line with .limit(n):
Map<String, Long> occurrence = map.entrySet().stream()
    .sorted(Collections.reverseOrder(Map.Entry.comparingByValue()))
    .limit(5)
    .collect(Collectors.toMap(Entry::getKey, Entry::getValue, (s1, s2) -> s1, LinkedHashMap::new));
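If it helps, here is a minimal, self-contained sketch that puts the pieces together with "n" as a parameter. The class name WordFrequency and the method name topWords are just placeholders I made up. It also differs from the snippets above in two ways, both of them my own assumptions rather than requirements: it splits on any run of whitespace ("\\s+") so that text read from a file with line breaks works, and it breaks ties alphabetically so the output is the same on every run.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical helper class; the names are illustrative only.
class WordFrequency {

    // Returns the n most frequent words, most frequent first.
    static Map<String, Long> topWords(String text, int n) {
        // Count occurrences of each whitespace-separated token.
        // Assumption: tokens are case-sensitive and keep their punctuation.
        Map<String, Long> counts = Arrays.stream(text.trim().split("\\s+"))
                .filter(word -> !word.isEmpty())
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));

        // Highest count first; ties are broken alphabetically (an added assumption).
        Comparator<Map.Entry<String, Long>> byCountDescThenWord =
                Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder())
                        .thenComparing(Map.Entry.<String, Long>comparingByKey());

        // LinkedHashMap preserves the sorted order of the first n entries.
        return counts.entrySet().stream()
                .sorted(byCountDescThenWord)
                .limit(n)
                .collect(Collectors.toMap(
                        Map.Entry::getKey, Map.Entry::getValue,
                        (a, b) -> a, LinkedHashMap::new));
    }

    public static void main(String[] args) {
        String text = "to be or not to be that is the question to";
        // Prints: to;3  be;2  is;1 (one entry per line)
        topWords(text, 3).forEach((word, count) -> System.out.println(word + ";" + count));
    }
}
```

Returning the map instead of printing inside the method makes it easier to reuse the result. If "The" and "the" should count as the same word, you could lowercase the text before splitting.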