Recording location when adding to a TreeSet


I have a List with about 20,000,000 entries. About 5,000,000 entries are unique. I need to iterate over my List, identify unique entries, and assign each an integer between 0 and 5,000,000.

Currently, I sequentially add each entry to a TreeSet, then figure out where it went using .headSet(). I imagine this is suboptimal.

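    // Iterate the list once; calling wholefile.listIterator() inside the loop
    // condition would create a fresh iterator (and return the first entry) every pass.
    for (String nextline : wholefile) {

        // sorted, unique addition
        keywords.add(nextline);

        // hmmm, get index of element in TreeSet?
        // headSet(x) holds the elements strictly less than x, so its size is x's current index
        k_j = keywords.headSet(nextline).size();

    }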

Is there a way to get the location when I call .add() ?


Solution

  • I would do as follows:

    1. Count the objects by populating a Map<YourObject, Integer>.
    2. Go through this map and assign a sequence number to each key whose count is 1, i.e. each entry that occurs exactly once.

    In code...

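    import java.util.Arrays;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    List<String> keywords = Arrays.asList("a", "b", "c", "a");

    // Step 1: count how often each keyword occurs.
    Map<String, Integer> counts = new HashMap<String, Integer>();
    for (String str : keywords) {
        counts.merge(str, 1, Integer::sum);   // insert 1, or add 1 to the existing count
    }

    // Step 2: number the keywords that occur exactly once.
    // Note: HashMap iteration order is unspecified, so the ids follow no particular order.
    int seq = 0;
    for (Map.Entry<String, Integer> entry : counts.entrySet()) {
        if (entry.getValue() == 1) {                              // is unique?
            System.out.println(entry.getKey() + " -> " + seq++);  // assign id.
        }
    }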
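
If "unique" in the question is read as "distinct" (every different value gets exactly one id, however often it repeats), the counting pass is not needed. The following is only a sketch of that alternative reading, not the approach given above: the class `DistinctIds` and the helpers `firstSeenIds` and `sortedIds` are hypothetical names introduced for illustration. The first helper hands out ids in the order entries are first seen; the second assumes the ids should follow sorted order, which is what the TreeSet index would be once every entry has been added.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

class DistinctIds {

    // Hypothetical helper: each distinct entry gets the next free id,
    // in the order in which entries are first seen.
    static Map<String, Integer> firstSeenIds(List<String> entries) {
        Map<String, Integer> ids = new LinkedHashMap<>();
        for (String entry : entries) {
            // ids.size() is the next unused id; repeats keep their first id
            ids.putIfAbsent(entry, ids.size());
        }
        return ids;
    }

    // Hypothetical helper: ids follow the sorted order of the distinct entries.
    static Map<String, Integer> sortedIds(List<String> entries) {
        Map<String, Integer> ids = new LinkedHashMap<>();
        // The TreeSet drops duplicates and iterates in ascending order
        for (String entry : new TreeSet<>(entries)) {
            ids.put(entry, ids.size());
        }
        return ids;
    }

    public static void main(String[] args) {
        List<String> entries = Arrays.asList("b", "a", "c", "a");
        System.out.println(firstSeenIds(entries)); // {b=0, a=1, c=2}
        System.out.println(sortedIds(entries));    // {a=0, b=1, c=2}
    }
}
```

With sorted ids the number is assigned only after all entries are known, which avoids the problem that an index taken right after `.add()` shifts whenever a smaller element is inserted later.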