Search code examples
javahashmaphashset

why I will get a guranteed sorted order, if I put all [1, 2, 3, ..., n] into a HashSet with any shuffled order and iterate the HashSet?


PS: How is this HashSet producing sorted output? this post doesn't answer my question. I know that if I put any numbers into hashset, I will not get sorted order.

However, I found that if I put all [1, 2, 3, ..., n] into a HashSet with any shuffled order and iterate the HashSet, I will get a guranteed sorted order. I cannot understand why it will always happen. I've tested any n < 10000 for many times, it's always true, therefore it should not be a coincidence and it should have some reason! Even though I should not rely on this implement details, please tell me why it always happens.

PS: I know that if I insert [0,1,2, ..., n-1], or [1+k, 2+k, .., n+k] (k != 0) into HashSet, the iteration order is unsorted and I've tested. It's normal that iteration order of HashSet is unsorted. However, why any insertion order of [1,2,3,4,..,n] is accidentally always true? I've checked the implementation details. If I track the path, the whole process will inculde the resizing the bucket array, and transformation from linkedlist to red-black tree. If I insert the whole [1-n] in shuffled order, the intermediate status of the HashSet is unsorted. However it will accidentally have sorted order, if I complete the all insertions.

I used the JDK 1.8 to do the following test.

public class Test {

    public static void main(String[] args) throws IOException {
        List<Integer> res = printUnsortedCase(10000);
        System.out.println(res);
    }


    private static List<Integer> printUnsortedCase(int n){
        List<Integer> res = new ArrayList<>();
        for (int i = 2; i < n; i++) {
            if (!checkSize(i)) {
                res.add(i);
            }
        }
        return res;
    }

    private static boolean checkSize(int n) {
        List<Integer> list = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            list.add(i);
        }
 
        // here I've shuffled the list of [1,2,3,4, ...n]        
        Collections.shuffle(list);

        Set<Integer> set = new HashSet<>();
        for (int i = 0; i < n; i++) {
            set.add(list.get(i)); // I insert the set in an unsorted order of [1,2,3,..,n]
        }

        list = new ArrayList<>(set);// iterate over the HashSet and insert into ArrayList
        return isSorted(list);
    }

    private static boolean isSorted(List<Integer> list) {
        for (int i = 1; i < list.size(); i++) {
            if (list.get(i - 1) > list.get(i)) return false;
        }
        return true;
    }
}

I've wrote the above checking code and it seems true.


Solution

  • You are conflating two related concepts:

    1. guaranteed order: the specification says that you will get the elements back in a specific order and all implementations conforming to that spec will do so.
    2. reproducible order: a specific implementation returns all the elements back in a specific order.

    Guaranteed order necessarily implies reproducible order (otherwise you'd have a bug).

    Reproducible order doesn't imply guaranteed order. It's possible that the reproducible order is just a side effect of some implementation details that happens to align so that you get the elements in the same order under some circumstances, but this isn't guaranteed.

    In this specific case several factors together result in a reproducible order:

    • Integer has a highly reproducible and predictable hashCode (it's just the number itself)
    • HashMap does some minor manipulation on that hash code to decrease the chances of collisions by simple hash code implementations, which doesn't matter in this case (because it just does hash ^ (hash >>> 16) which keeps number <= 216 equally-sorted).
    • You use a very consistent and reproducible way to construct your HashMaps. The resulting hashmaps will always have gone through the same growing stages.

    If instead of

            list.add(i);
    

    you did

            list.add(i + 65000);
    

    (i.e. use the number 65000 to 65000+n instead of 0 to n) then you'd see the non-sorted results emerge.

    In fact the "reproducible order" that you get is so fragile that just adding 10 already causes some of the lists to be unsorted.