Search code examples
arraysbashassociative-array

Why don't Bash associative arrays maintain index order?


I'm creating associative arrays to process in a for loop but i'm getting some strange results in index order. Please take a look at this example script:

#!/bin/bash
declare -A test1=(
    [d]=1w45
    [e]=2dfg
    [m]=3df
    [o]=4df
)

declare -A test2=(
    [d1]=1w45
    [e2]=2dfg
    [m3]=3df
    [o4]=4df
)

declare -A test3=(
    [1d]=1w45
    [2e]=2dfg
    [3m]=3df
    [4o]=4df
)

echo ${!test1[@]}
echo ${!test2[@]}
echo ${!test3[@]}

The output will be

$ ./test 
d e m o
o4 m3 e2 d1
3m 4o 1d 2e

Why is the order of items changing? And how to bypass this behavior? Thanks in advance!


Solution

  • Why don't bash associative arrays maintain index order?

    Because they are designed not to do this.

    Why order of items is changing?

    Bash associative array implementation uses a hash library and stores hashes of indexes. These hashes are stored in buckets with 128 default number of buckets. The hash is calculated with the function hash_string() using a simple multiplication and a bitwise XOR. The keys of the associative array are listed in the order buckets appear. Bucket number is calculated by a bitwise AND operation between the hash value of the key and the number of buckets decreased by 1.

    I compiled bash commit 6c6454cb18d7cd30b3b26d5ba6479431e599f3ed and for me your script outputs:

    $ ./test 
    o m e d
    d1 e2 m3 o4
    1d 3m 2e 4o
    

    So I copied the hash_string() function and written a small C program that would output the bucket number of the keys and compiled and executed:

    #include <stdio.h>
    
    #define FNV_OFFSET 2166136261
    #define FNV_PRIME 16777619
    
    unsigned int
    hash_string (s)
         const char *s;
    {
      register unsigned int i;
    
      for (i = FNV_OFFSET; *s; s++)
        {
          i *= FNV_PRIME;
          i ^= *s;
        }
    
      return i;
    }
    
    int main() {
        const char *s[] = {
            "o", "m", "e", "d",
            "d1", "e2", "m3", "o4",
            "1d", "3m", "2e", "4",
        };
        for (int i = 0;  i < sizeof(s)/sizeof(*s); ++i) {
            printf("%3s %3d\n",
                s[i], 
                hash_string(s[i]) & (128 - 1));
        }
    }
    

    The program outputs two columns, the key and the bucket number of the key (added extra empty lines):

      o 112
      m 114
      e 122
      d 123
    
     d1  16
     e2  60
     m3  69
     o4 100
    
     1d  14
     3m  41
     2e  50
     4o  94
    

    The order of keys outputted is sorted using the order of buckets in the hash table they are into, so they are outputted in that order. This is why the order of items changed.

    That said, you should not rely on this behaviour, as the output order of keys can change if the author of bash decides to change the hashing function or make any other change.

    And how to bypass this behavior?

    There is no way to bypass this. Bash arrays use hash table to store the hashes. The insertion order of keys is not stored anywhere.

    Of course, you can bypass this behaviour by patching bash to implement such functionality that you request.

    That said, I would just use two arrays:

    keys=(d1 e2 m3 o4)
    elements=(1w45 2dfg 3df 4df)
    declare -A test2
    for ((i=0;i<${#keys[@]};++i)); do
        test2[${keys[$i]}]="${elements[$i]}"
    done
    # or maybe something along:
    declare -A test2=($(paste -zd <(printf "[%s]=\0" "${keys[@]}") <(printf "%q \0" "${elements[@]}"))
    

    That way you can iterate over keys in the order you inserted them in a separate keys array.