I have a List with about 20,000,000 entries. About 5,000,000 entries are unique. I need to iterate over my List, identify unique entries, and assign each an integer between 0 and 5,000,000.
Currently, I sequentially add each entry to a TreeSet, then figure out where it went using .headSet(). I imagine this is suboptimal.
for (String nextline : wholefile) {
    // sorted, unique addition
    keywords.add(nextline);
    // hmmm, get index of element in TreeSet?
    k_j = keywords.headSet(nextline).size();
}
Is there a way to get the element's position at the moment I call .add()?
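I would do it as follows: count how often each entry occurs in a Map<YourObject, Integer>, then walk the keys and number the ones that occur exactly once. In code: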
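import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

List<String> keywords = Arrays.asList("a", "b", "c", "a");
// Count how many times each entry occurs.
Map<String, Integer> counts = new HashMap<String, Integer>();
for (String str : keywords) {
    if (!counts.containsKey(str))
        counts.put(str, 0);
    counts.put(str, counts.get(str) + 1);
}
// Number the entries that occur exactly once.
// HashMap iteration order is unspecified, so which id a keyword gets may vary.
int seq = 0;
for (String keyword : counts.keySet())
    if (counts.get(keyword) == 1) // is unique?
        System.out.println(keyword + " -> " + seq++); // assign id.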
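If "unique" in the question means "distinct" rather than "occurring exactly once", a map can hand out the id at the moment an entry is first inserted, which is roughly what the question asks of .add(). Below is a minimal sketch under that assumption. The names KeywordIds, assignIds and encode are made up for illustration, and the ids follow first-seen order, not the sorted order a TreeSet would give.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class, not part of the original question or answer.
class KeywordIds {

    /** Maps each distinct entry to the next free integer, in first-seen order. */
    static Map<String, Integer> assignIds(List<String> entries) {
        // LinkedHashMap keeps insertion order, so printing the map is deterministic.
        Map<String, Integer> ids = new LinkedHashMap<String, Integer>();
        for (String entry : entries) {
            if (!ids.containsKey(entry)) {
                // The current size is the next unused id: 0, 1, 2, ...
                ids.put(entry, ids.size());
            }
        }
        return ids;
    }

    /** Replaces every entry of the list by the id assigned to it. */
    static List<Integer> encode(List<String> entries, Map<String, Integer> ids) {
        List<Integer> out = new ArrayList<Integer>(entries.size());
        for (String entry : entries) {
            out.add(ids.get(entry));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> wholefile = Arrays.asList("a", "b", "c", "a");
        Map<String, Integer> ids = assignIds(wholefile);
        System.out.println(ids);                    // {a=0, b=1, c=2}
        System.out.println(encode(wholefile, ids)); // [0, 1, 2, 0]
    }
}
```

An id assigned this way never changes once given. By contrast, keywords.headSet(x).size() on a growing TreeSet shifts whenever a smaller element is added later, so a k_j recorded in the middle of the loop may no longer match the final position.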