
Delete elements during loop awk semantics


Assume the loop visits k == 0 first (the order is implementation-dependent according to the spec). How many times should the loop body run: once or twice? If twice, what should be printed for arr[1]?

BEGIN {
  arr[0] = "zero"; 
  arr[1] = "one"; 
  for (k in arr) { 
      print "key " k " val " arr[k]; 
      delete arr[k+1]  
  }
}
$ gawk --version
GNU Awk 5.1.0, API: 3.0 (GNU MPFR 4.1.0, GNU MP 6.2.1)
....
$ gawk 'BEGIN { arr[0] = "zero"; arr[1] = "one"; for (k in arr) { print "key " k " val " arr[k]; delete arr[k+1]  } }'
key 0 val zero
key 1 val
$ goawk --version
v1.19.0
$ goawk 'BEGIN { arr[0] = "zero"; arr[1] = "one"; for (k in arr) { print "key " k " val " arr[k]; delete arr[k+1]  } }'
key 0 val zero

GNU awk runs the loop body twice, with arr[1] == "" on the second pass, and goawk runs it once. mawk (mawk 1.3.4 20200120) orders the keys 1, 0 but has the same fundamental behavior as GNU awk: it loops twice and prints the empty string for the deleted key. What is the POSIX-defined expected behavior of this program?

Essentially, should keys deleted in earlier iterations still be visited in later iterations?


Solution

  • According to the POSIX spec:

    The results of adding new elements to array within such a for loop are undefined

    but it doesn't define what happens if you delete them other than:

    The delete statement shall remove an individual array element

    However, according to the GNU AWK manual:

    As a point of information, gawk sets up the list of elements to be iterated over before the loop starts, and does not change it. But not all awk versions do so.

    so the behavior is undefined by POSIX, defined for GNU AWK, and you'd have to check the man page for every other AWK to see what it does.
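As a complement to reading each implementation's documentation, a small probe can show which behavior a given awk has. The sketch below is only an illustration: the function name probe_for_in and the labels "snapshot" and "live" are made up here, and it assumes the awk under test is on the PATH as awk. It deletes every element on the first pass, so the result does not depend on the visiting order.

```shell
# Hypothetical helper: reports how the local awk treats deletions
# made inside a for-in loop.
probe_for_in() {
  awk 'BEGIN {
    a["x"] = 1; a["y"] = 1       # two elements; visiting order is irrelevant
    for (k in a) {
      n++
      delete a["x"]; delete a["y"]   # remove everything on the first pass
    }
    # 2 passes: the keys were collected up front (as gawk documents);
    # 1 pass: the loop walked the live array (as goawk appeared to do).
    print (n == 2 ? "snapshot" : "live")
  }'
}

probe_for_in
```

Replace awk with gawk, mawk, or goawk inside the function to probe a specific implementation. The result only describes the version tested, so it is not a substitute for a documented guarantee.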

    Decide which behavior you want. To get it robustly and portably in all awks, write whichever one of these matches:

    1. gawk's behavior:
    BEGIN {
      arr[0] = "zero"; 
      arr[1] = "one"; 
      for (k in arr) { 
          indices[k]
      }
      for (k in indices) { 
          print "key " k " val " arr[k]; 
          delete arr[k+1]  
      }
    }
    
    2. goawk's apparent behavior from your example:
    BEGIN {
      arr[0] = "zero"; 
      arr[1] = "one"; 
      for ( k in arr ) {
          indices[k]
      }
      for (k in indices) {
          if ( k in arr ) {
              print "key " k " val " arr[k]; 
              delete arr[k+1]  
          }
      }
    }
    

    Notes on your code in general:

    1. for ( k in ... ) can visit the indices in any order, so relying on delete arr[k+1] to delete an element of arr[] isn't robust. For example, if the loop happens to start with k set to the last index in the array, the first iteration tries to delete an index past the end of the array.
    2. All builtin and generated awk arrays, fields, and strings start at index 1, not 0, so don't create your own arrays starting at 0. Start them at 1 so you don't have to remember which type of array it is when writing code to visit the indices, and so you don't trip over that difference at some point.
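Both notes can be combined into one sketch: if the array is built with contiguous indices starting at 1 and the element count is kept in a variable, a counted loop visits the indices in a fixed order, so delete arr[i+1] always refers to the element that would be visited next. This is an illustrative variant rather than the original program; the wrapper name counted_loop, the counter n, and the three sample values are assumptions made for the example.

```shell
# Hypothetical wrapper so the example can be run and checked as a unit.
counted_loop() {
  awk 'BEGIN {
    n = 0
    arr[++n] = "one"; arr[++n] = "two"; arr[++n] = "three"   # indices 1..n
    for (i = 1; i <= n; i++) {
      if (!(i in arr)) continue      # skip indices deleted by an earlier pass
      print "key " i " val " arr[i]
      delete arr[i+1]                # i+1 is always the next index in line
    }
  }'
}

counted_loop
```

This prints "key 1 val one" and then "key 3 val three": index 2 is deleted during the first pass and skipped by the in test, which also avoids recreating the element by referencing it. Because the order comes from the counter rather than from for ( k in ... ), the result should be the same in any awk.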