I have a list containing 305899 Strings (which is the username for a website). After I remove all the duplicates, the number goes down to 172123 Strings.
I want to find how many times a particular String (the username) is repeated in that ArrayList. I wrote a simple bubble sort type logic but it was too slow.
private static Map<String, Integer> findNumberOfPosts(List<String> userNameList) {
Map<String, Integer> numberOfPosts = new HashMap<String, Integer>();
int duplicate = 0;
int size = userNameList.size();
for (int i = 0; i < size - 1; i++) {
duplicate = 0;
for (int j = i + 1; j < size; j++) {
if (userNameList.get(i).equals(userNameList.get(j))) {
duplicate++;
userNameList.remove(j);
j--;
size--;
}
}
numberOfPosts.put(userNameList.get(i), duplicate);
}
return numberOfPosts;
}
Then I changed it to this:
private static Map<String, Integer> findNumberOfPosts(List<String> userNameList) {
Map<String, Integer> numberOfPosts = new HashMap<String, Integer>();
Set<String> unique = new HashSet<String>(userNameList);
for (String key : unique) {
numberOfPosts.put(key, Collections.frequency(userNameList, key));
}
return numberOfPosts;
}
This was really slow as well. When I mean slow, it would take like 30+ minutes to through the list.
Is there any other efficient way to handle this problem? Just reduce the time it takes to find and count duplicate elements?
Your findNumberOfPosts
method is on the right track, but your implementation is doing loads of unnecessary work.
Try this:
private static Map<String, Integer> findNumberOfPosts(List<String> userNameList) {
Map<String, Integer> numberOfPosts = new HashMap<String, Integer>();
for (String userName : userNameList) {
Integer count = numberOfPosts.get(userName);
numberOfPosts.put(userName, count == null ? 1 : ++count);
}
return numberOfPosts;
}
This should execute in a couple of seconds on most machines.