
Powershell: How can I disable automatic int32 type conversions on math with integral numbers?


This is a simplified example; the real thing is a large 2D array created with New-Object 'object[,]':

PS C:\> $a=[int16]10
PS C:\> $a.GetType().Name
Int16
PS C:\> $a++
PS C:\> $a.GetType().Name
Int32

PS C:\> $a=[int16]10
PS C:\> $a.GetType().Name
Int16
PS C:\> $a=$a+[int16]1
PS C:\> $a.GetType().Name
Int32

I can FORCE the type in such cases, but that is actually a conversion performed after the math is done, and it slows down the complete code by ~10%, so it is useless:

PS C:\> $a=[int16]10
PS C:\> $a.GetType().Name
Int16
PS C:\> $a=[int16]($a+1)
PS C:\> $a.GetType().Name
Int16

Is there no way to prevent this automatic conversion from byte and int16 to its beloved int32 in any math operation?

In response to @mklement0's comment, I have extended the examples to my actual use case. Note that no value anywhere on that board exceeds 8:

# Init board
$BoardXSize = [int]100
$BoardYSize = [int]100
$Board = New-Object 'object[,]' $BoardXSize,$BoardYSize
for ($y=0;$y -lt $BoardYSize;$y++) {
    for ($x=0;$x -lt $BoardXSize;$x++) {
        [int16]$Board[$x,$y] = 0
    }
}

Testing:

PS C:\> ($Board[49,49]).GetType().Name
Int16
PS C:\> $Board[49,49]++
PS C:\> ($Board[49,49]).GetType().Name
Int32

PS C:\> [int16]$Board[49,49]=0
PS C:\> ($Board[49,49]).GetType().Name
Int16
PS C:\> [int16]$Board[49,49]++
0
PS C:\> ($Board[49,49]).GetType().Name
Int32

PS C:\> [int16]$Board[49,49]=0
PS C:\> ($Board[49,49]).GetType().Name
Int16
PS C:\> $Board[49,49] += [int16]1
PS C:\> ($Board[49,49]).GetType().Name
Int32

PS C:\> [byte]$Board[49,49]=0
PS C:\> ($Board[49,49]).GetType().Name
Byte
PS C:\> $Board[49,49] += [byte]1
PS C:\> ($Board[49,49]).GetType().Name
Int32

However, when being EXTREMELY PICKY about the operators, it works; only "++" seems to fail:

PS C:\> [int16]$Board[49,49]=0
PS C:\> ($Board[49,49]).GetType().Name
Int16
PS C:\> [int16]$Board[49,49] += [int16]1
PS C:\> ($Board[49,49]).GetType().Name
Int16

PS C:\> [byte]$Board[49,49]=0
PS C:\> ($Board[49,49]).GetType().Name
Byte
PS C:\> [byte]$Board[49,49] += [byte]1
PS C:\> ($Board[49,49]).GetType().Name
Byte

But since this is always a cast after the math is done, it does not improve speed. Still, the type problem itself is solved.


Solution

  • You can type-constrain your variable by placing the type literal to the left of the (initial) assignment:

    # Note how [int16] is to the *left* of $a = ...
    PS> [int16] $a = 10; ++$a; $a += 1; $a, $a.GetType().Name
    12
    Int16
    

    Note that this works differently from typing a variable in a statically typed language such as C#: instead of locking in the specified type statically, PowerShell performs a cast on every assignment in order to coerce the value being assigned to that type.

    For instance, with [int16] $a = 10 having been the original assignment, $a += 1 is effectively the same as assigning [int16] ($a + 1).

    This also implies that you're free to assign values of any type, as long as they can be converted to the constraining type; e.g.:

    # These work, because they are implicitly cast (converted) to [int16]
    $a += 1.2
    $a += '42'
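A minimal sketch contrasting a merely cast value with a type-constrained variable; the variable names are illustrative, and the last step assumes that assigning a value that cannot be converted to [int16] fails with an error that try / catch can intercept.

```powershell
# Unconstrained: the [int16] cast applies only to the initial value,
# so the next arithmetic operation widens the stored value to [int].
$plain = [int16] 10
$plain += 1
$plainType = $plain.GetType().Name              # Int32

# Type-constrained: [int16] is to the *left* of the variable, so every
# assignment converts the value being assigned back to [int16].
[int16] $constrained = 10
$constrained += 1.2         # 10 + 1.2 = 11.2, converted to [int16] 11
$constrained += '42'        # '42' is converted to a number: 11 + 42 = 53
$constrainedValue = $constrained                # 53
$constrainedType = $constrained.GetType().Name  # Int16

# A value that cannot be converted to [int16] makes the assignment fail.
$rejected = $false
try { $constrained = 'not a number' } catch { $rejected = $true }
$rejected                                       # True
```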
    

    See the next section about arrays.

    Applying type constraints to arrays:

    To a (one-dimensional) array:

    # Strongly types the array as storing [int16] instances
    # *and* type-constrains the $arr variable.
    PS> [int16[]] $arr = 10, 11; ++$arr[0]; $arr[1] += 1; $arr; $arr.ForEach('GetType').Name
    11
    12
    Int16
    Int16
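To check both effects of the [int16[]] type literal on the left (strongly typed elements and a type-constrained variable), here is a minimal sketch reusing the $arr name from the example; it relies on PowerShell's usual conversion of a single value to a one-element array.

```powershell
[int16[]] $arr = 10, 11

# Element assignment: the strongly typed array converts '42' to [int16].
$arr[0] = '42'
$firstType  = $arr[0].GetType().Name   # Int16
$firstValue = $arr[0]                  # 42

# += on the *variable* builds a new array, which the variable's
# [int16[]] constraint converts back to an [int16[]] array.
$arr += 1
$arrType  = $arr.GetType().Name        # Int16[]
$arrCount = $arr.Count                 # 3

# Even a single value is accepted: it becomes a one-element [int16[]] array.
$arr = 5
$scalarType  = $arr.GetType().Name     # Int16[]
$scalarCount = $arr.Count              # 1
```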
    

    To a two-dimensional array (rare in PowerShell):

    # Strongly types the 2D array as storing [int16] instances,
    # but does *not* type-constrain the $arr2d *variable*.
    PS> $countDim1, $countDim2 = 2, 3;
        $arr2d = [int16[,]]::new($countDim1, $countDim2);
        ++$arr2d[0,0]; $arr2d[0,0].GetType().Name
    Int16
    

    Note that this truly creates strongly, statically typed .NET arrays, but assigning to their elements again applies PowerShell's flexible type conversions (e.g., $arr[0] = '42' works fine).
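Applied to the question's use case (no value exceeds 8), a minimal sketch could replace New-Object 'object[,]' and its initialization loop with a strongly typed 2D array; choosing [byte] is an assumption based on the stated value range, and [int16[,]] would behave the same way.

```powershell
$BoardXSize = 100
$BoardYSize = 100

# A strongly typed 2D [byte] array; its elements start out as 0,
# so the question's initialization loop is no longer needed.
$Board = [byte[,]]::new($BoardXSize, $BoardYSize)

++$Board[49, 49]        # the [int] result is converted back to [byte] on storage
$Board[49, 49] += 1     # ditto for compound assignment
$Board[0, 0] = '7'      # element assignment still applies PowerShell's conversions

$cellType    = $Board[49, 49].GetType().Name   # Byte
$cellValue   = $Board[49, 49]                  # 2
$cornerValue = $Board[0, 0]                    # 7
```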

    Also note that the 2D array example, for brevity, doesn't actually type-constrain the data type of the variable, given that there's no type literal to the left of the assignment; therefore, you could still replace the entire array with an arbitrary new value of any type (e.g., $arr2d = 'foo'); to perform type-constraining too, you'd have to do:

    # Creates a 2D [int16] array *and* type-constrains $arr2D to
    # such arrays.
    [int16[,]] $arr2d = [int16[,]]::new($countDim1, $countDim2)
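A sketch of the difference the type literal on the left makes in the 2D case; $loose and $strict are illustrative names, and the sketch assumes that assigning a string such as 'foo', which cannot be converted to [int16[,]], fails with an error that try / catch can intercept.

```powershell
$countDim1, $countDim2 = 2, 3

# No type literal to the left of the assignment: the variable is unconstrained.
$loose = [int16[,]]::new($countDim1, $countDim2)
$loose = 'foo'                          # replaces the array with a string
$looseType = $loose.GetType().Name      # String

# Type literal to the left: assignments must be convertible to [int16[,]].
[int16[,]] $strict = [int16[,]]::new($countDim1, $countDim2)
$strictRejected = $false
try { $strict = 'foo' } catch { $strictRejected = $true }
$strictType = $strict.GetType().Name    # Int16[,]
```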
    

    Finally, note that you cannot type-constrain individual elements of an array or (writable) properties of an object; in other words: only L-values that are PowerShell variables can be type-constrained;[1] in all other cases a simple ad hoc cast is performed; e.g., given an [object[]] array $a = 1, 2, the following two statements are equivalent:

    # Cast
    $a[0] = [int] 42 
    
    # !! A mere cast too, because the assignment target is an 
    # !! *array element*, which cannot be type-constrained.
    [int] $a[0] = 42
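A quick way to confirm that the [int] in front of $a[0] is only a one-time cast: in this sketch, the element still accepts a value of a different type afterwards.

```powershell
$a = 1, 2                       # a regular [object[]] array

[int] $a[0] = 42                # only casts 42; the element is not constrained
$a[0] = 'no longer an int'      # accepted, because nothing enforces [int] here
$elementType = $a[0].GetType().Name   # String
```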
    

    Also unlike languages such as C#, PowerShell automatically widens types in numerical operations.

    This widening applies not just to operands of differing (numeric) types, but also if the result would be too large to fit into the (larger of the two) input type(s), and it has two surprising aspects:

    • A minimum width of [int] (System.Int32) is applied, which means that types smaller than [int] are always widened to [int] in the result, even if it isn't necessary, as you've observed:

      # Implicitly widens to [int], even though not necessary.
      # Note that the [byte] cast is only applied to the LHS
      PS> ([byte] 42 + 1).GetType().Name
      Int32
      
    • If the result is too large to fit into the (larger of the two) input type(s), the result is invariably widened to a [double] (System.Double)(!)

      # Implicitly widens to [double](!), because adding 1
      # to the max. [int] value doesn't fit into an [int].
      PS> ([int]::MaxValue + 1).GetType().Name
      Double
      
      # NO widening, because the larger input type is a [long] (System.Int64),
      # so the result fits into a [long] too.
      PS> ([long] [int]::MaxValue + 1).GetType().Name
      Int64
      
    • This immediate widening to [double] in expressions, without considering larger integer types, differs from how number literals are parsed in PowerShell, where larger integer types are used (but note that PowerShell never chooses unsigned types):

      # 2147483648 = [int]::MaxValue + 1, so [long] (System.Int64)
      # is chosen as the literal's type.
      (2147483648).GetType().Name
      Int64
      
      • Note that number literals can also have type suffixes to explicitly designate their type, such as l for [long] and d for [decimal]; e.g., (2147483648d).GetType().Name returns Decimal.
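The widening rules above can be checked side by side in one script. This is a sketch: the variable names are illustrative, and the type names in the comments are the ones the examples above report.

```powershell
# Minimum width: operands narrower than [int] yield an [int] result.
$byteSum      = [byte] 1 + [byte] 1                 # Int32
$int16Product = [int16] 3 * [int16] 4               # Int32

# Overflow: a result that doesn't fit the (larger) input type becomes a [double].
$intOverflow    = [int]::MaxValue + 1               # Double
$longNoOverflow = [long] [int]::MaxValue + 1        # Int64

# Number literals, by contrast, are typed as [int] if they fit and as [long]
# otherwise (for values in this range), unless a type suffix says otherwise.
$intLiteral     = 2147483647                        # Int32
$longLiteral    = 2147483648                        # Int64
$decimalLiteral = 2147483648d                       # Decimal

# Print each value's type name next to the value itself.
$byteSum, $int16Product, $intOverflow, $longNoOverflow,
  $intLiteral, $longLiteral, $decimalLiteral |
  ForEach-Object { '{0,-8} {1}' -f $_.GetType().Name, $_ }
```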

    Optional reading: Performance of accessing array elements in strongly typed vs. untyped ([object[]]) arrays:

    PowerShell's regular array type, as created by the comma operator (,) and the array-subexpression operator (@(...)), for instance, is [object[]], i.e. an "untyped" array in that [object], as the root of the .NET type hierarchy, can store values of any data type, even allowing for mixing of different types in a single array.

    While this affords great flexibility, it affects:

    • Type safety: Depending on the use case, ensuring that all elements have the same type may be required; if so, use a strongly typed array.

    • Memory efficiency: Value-type instances must be wrapped in [object] instances for storage, a process known as boxing, and must be unwrapped on access, a process known as unboxing.

    • Runtime performance: Surprisingly, the benefit of strong typing is less consistent than the overhead of boxing and unboxing would suggest: it is marginal when getting elements, noticeable when updating them, and with 2D arrays getting elements can even be slower.
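To make the type-safety point concrete, here is a minimal sketch comparing an untyped and a strongly typed 1D array; it assumes that assigning an inconvertible value to an element of a strongly typed array fails with an error that try / catch can intercept.

```powershell
# [object[]]: any value can be stored, and types can be mixed.
$untyped = [object[]]::new(2)
$untyped[0] = 42
$untyped[1] = 'text'
$untypedTypes = $untyped.ForEach('GetType').Name    # Int32, String

# [int[]]: values are converted to [int] on assignment; inconvertible ones are rejected.
$typed = [int[]]::new(2)
$typed[0] = '42'                                     # stored as the [int] 42
$typedRejected = $false
try { $typed[1] = 'text' } catch { $typedRejected = $true }
$typedElementType = $typed[0].GetType().Name         # Int32
```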

    Below are the results of benchmarking array get and set access, comparing 1D and 2D arrays with various value types vs. object elements:

    • 15 runs were averaged, with 1-million-element 1D arrays and 1000 x 1000 2D arrays.

    • The benchmark source code is further below.

      • To run the code yourself, you must first download and define the Time-Command function from this Gist.

      • Assuming you have looked at the linked Gist's source code to ensure that it is safe (which I can personally assure you of, but you should always check), you can install it directly as follows:

        irm https://gist.github.com/mklement0/9e1f13978620b09ab2d15da5535d1b27/raw/Time-Command.ps1 | iex
        
      • Benchmarking in PowerShell is far from an exact science; since the benchmarks are wall clock-based, it is best to average multiple runs and to ensure that the system isn't (too) busy running other things.

    • The absolute timings will vary based on many factors, but the Factor column should provide a sense of relative performance; differences in the second decimal place are likely to be incidental (e.g., 1.00 vs. 1.03), and such rankings may swap places in repeated invocations.

    • Some conclusions from the results below, which were obtained with PowerShell (Core) 7.2.2 on a Windows 10 machine (Windows PowerShell performance characteristics may differ):

      • 1D arrays (by far the most common type of array in PowerShell):

        • Getting is marginally faster if strongly typed.
        • Updating is noticeably faster if strongly typed.
      • 2D arrays (rare in PowerShell):

        • Getting is somewhat slower if strongly typed.
        • Updating is somewhat faster if strongly typed.
      • Note: While there are no Windows PowerShell results below, in my tests it seems that the speed of array access roughly doubled in PowerShell (Core) 7+.

    # Sample results on a Windows 10 machine running PowerShell 7.2.2
    
    ============== 1D: GET performance (avg. of 15 runs)
    
    Factor Secs (15-run avg.) Command
    ------ ------------------ -------
    1.00   0.482              # [byte[]]…
    1.00   0.483              # [int[]] (Int32)…
    1.01   0.487              # [long[]]…
    1.02   0.492              # [int16[]]…
    1.08   0.521              # [decimal[]]…
    1.09   0.526              # [object[]]…
    1.11   0.533              # [double[]]…
    1.12   0.537              # [datetime[]]…
    
    ============== 1D: SET (INCREMENT) performance (avg. of 15 runs)
    
    Factor Secs (15-run avg.) Command
    ------ ------------------ -------
    1.00   0.526              # [double[]]…
    1.03   0.541              # [int[]] (Int32)…
    1.11   0.584              # [long[]]…
    1.30   0.681              # [byte[]]…
    1.40   0.738              # [int16[]]…
    1.57   0.826              # [decimal[]]…
    1.86   0.975              # [object[]], no casts…
    2.90   1.523              # [object[]] with [int] casts…
    7.34   3.857              # [datetime[]]…
    
    ============== 2D: GET performance (avg. of 15 runs)
    
    Factor Secs (15-run avg.) Command
    ------ ------------------ -------
    1.00   0.620              # [object[,]]…
    1.13   0.701              # [double[,]]…
    1.13   0.702              # [int[,]] (Int32)…
    1.18   0.731              # [datetime[,]]…
    1.20   0.747              # [long[,]]…
    1.23   0.764              # [byte[,]]…
    1.26   0.782              # [decimal[,]]…
    1.34   0.828              # [int16[,]]…
    
    ============== 2D: SET (INCREMENT) performance (avg. of 15 runs)
    
    Factor Secs (15-run avg.) Command
    ------ ------------------ -------
    1.00   0.891              # [double[,]]…
    1.02   0.904              # [long[,]]…
    1.10   0.977              # [byte[,]]…
    1.24   1.107              # [object[,]], no casts…
    1.27   1.131              # [int16[,]]…
    1.28   1.143              # [int[,]] (Int32)…
    1.46   1.300              # [decimal[,]]…
    1.72   1.530              # [object[,]] with [int] casts…
    5.25   4.679              # [datetime[,]]…
    

    Benchmark source code:

    $runs = 15 # how many runs to average.
    $d1 = $d2 = 1000  # 1000 x 1000 2D array
    $d = $d1 * $d2    # 1 million-element 1D array
    
    # index arrays for looping
    $indices_d1 = 0..($d1 - 1)
    $indices_d2 = 0..($d2 - 1)
    $indices = 0..($d - 1)
    
    # 1D arrays.
    $ao = [object[]]::new($d)
    $ab = [byte[]]::new($d)
    $ai16 = [int16[]]::new($d)
    $ai = [int[]]::new($d)
    $al = [long[]]::new($d)
    $adec = [decimal[]]::new($d)
    $ad = [double[]]::new($d)
    $adt = [datetime[]]::new($d)
    
    # 2D arrays.
    $ao_2d = [object[,]]::new($d1, $d2)
    $ab_2d = [byte[,]]::new($d1, $d2)
    $ai16_2d = [int16[,]]::new($d1, $d2)
    $ai_2d = [int[,]]::new($d1, $d2)
    $al_2d = [long[,]]::new($d1, $d2)
    $adec_2d = [decimal[,]]::new($d1, $d2)
    $ad_2d = [double[,]]::new($d1, $d2)
    $adt_2d = [datetime[,]]::new($d1, $d2)
    
    "`n============== 1D: GET performance (avg. of $runs runs)"
    
    Time-Command -Count $runs `
    { # [object[]]
      foreach ($i in $indices) { $null = $ao[$i] }
    },
    { # [byte[]]
      foreach ($i in $indices) { $null = $ab[$i] }
    }, 
    { # [int16[]]
      foreach ($i in $indices) { $null = $ai16[$i] }
    },
    { # [int[]] (Int32)
      foreach ($i in $indices) { $null = $ai[$i] }
    },
    { # [long[]]
      foreach ($i in $indices) { $null = $al[$i] }
    }, 
    { # [decimal[]]
      foreach ($i in $indices) { $null = $adec[$i] }
    }, 
    { # [double[]]
      foreach ($i in $indices) { $null = $ad[$i] }
    },
    { # [datetime[]]
      foreach ($i in $indices) { $null = $adt[$i] }
    } | Format-Table Factor, Secs*, Command
    
    "============== 1D: SET (INCREMENT) performance (avg. of $runs runs)"
    
    Time-Command -Count $runs `
    { # [object[]], no casts
      foreach ($i in $indices) { ++$ao[$i] }
    },
    { # [object[]] with [int] casts
      foreach ($i in $indices) { $ao[$i] = [int] ($ao[$i] + 1) }
    },
    { # [byte[]]
      foreach ($i in $indices) { ++$ab[$i] }
    }, 
    { # [int16[]]
      foreach ($i in $indices) { ++$ai16[$i] }
    },
    { # [int[]] (Int32)
      foreach ($i in $indices) { ++$ai[$i] }
    },
    { # [long[]]
      foreach ($i in $indices) { ++$al[$i] }
    },
    { # [decimal[]]
      foreach ($i in $indices) { ++$adec[$i] }
    },
    { # [double[]]
      foreach ($i in $indices) { ++$ad[$i] }
    },
    { # [datetime[]]
      foreach ($i in $indices) { $adt[$i] += 1 } # ++ not supported
    } | Format-Table Factor, Secs*, Command
    
    "============== 2D: GET performance (avg. of $runs runs)"
    
    Time-Command -Count $runs `
    { # [object[,]]
      foreach ($i in $indices_d1) {
        foreach ($j in $indices_d2) { $null = $ao_2d[$i, $j] }
      }
    },
    { # [byte[,]]
      foreach ($i in $indices_d1) {
        foreach ($j in $indices_d2) { $null = $ab_2d[$i, $j] } 
      }
    },
    { # [int16[,]]
      foreach ($i in $indices_d1) {
        foreach ($j in $indices_d2) { $null = $ai16_2d[$i, $j] }
      }
    }, 
    { # [int[,]] (Int32)
      foreach ($i in $indices_d1) {
        foreach ($j in $indices_d2) { $null = $ai_2d[$i, $j] }
      }
    },
    { # [long[,]]
      foreach ($i in $indices_d1) {
        foreach ($j in $indices_d2) { $null = $al_2d[$i, $j] } 
      }
    }, 
    { # [decimal[,]]
      foreach ($i in $indices_d1) {
        foreach ($j in $indices_d2) { $null = $adec_2d[$i, $j] } 
      }
    },
    { # [double[,]]
      foreach ($i in $indices_d1) {
        foreach ($j in $indices_d2) { $null = $ad_2d[$i, $j] } 
      }
    }, 
    { # [datetime[,]]
      foreach ($i in $indices_d1) {
        foreach ($j in $indices_d2) { $null = $adt_2d[$i, $j] } 
      }
    } | Format-Table Factor, Secs*, Command
    
    "============== 2D: SET (INCREMENT) performance (avg. of $runs runs)"
    
    Time-Command -Count $runs `
    { # [object[,]], no casts
      foreach ($i in $indices_d1) {
        foreach ($j in $indices_d2) { ++$ao_2d[$i, $j] }
      }
    },
    { # [object[,]] with [int] casts
      foreach ($i in $indices_d1) {
        foreach ($j in $indices_d2) { $ao_2d[$i, $j] = [int] ($ao_2d[$i, $j] + 1) }
      }
    }, 
    { # [byte[,]]
      foreach ($i in $indices_d1) {
        foreach ($j in $indices_d2) { ++$ab_2d[$i, $j] }
      }
    }, 
    { # [int16[,]]
      foreach ($i in $indices_d1) {
        foreach ($j in $indices_d2) { ++$ai16_2d[$i, $j] }
      }
    },
    { # [int[,]] (Int32)
      foreach ($i in $indices_d1) {
        foreach ($j in $indices_d2) { ++$ai_2d[$i, $j] }
      }
    },
    { # [long[,]]
      foreach ($i in $indices_d1) {
        foreach ($j in $indices_d2) { ++$al_2d[$i, $j] }
      }
    }, 
    { # [decimal[,]]
      foreach ($i in $indices_d1) {
        foreach ($j in $indices_d2) { ++$adec_2d[$i, $j] } 
      }
    },
    { # [double[,]]
      foreach ($i in $indices_d1) {
        foreach ($j in $indices_d2) { ++$ad_2d[$i, $j] } 
      }
    },
    { # [datetime[,]]
      foreach ($i in $indices_d1) {
        foreach ($j in $indices_d2) { $adt_2d[$i, $j] += 1 } # ++ not supported
      }
    } | Format-Table Factor, Secs*, Command
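If you'd rather not install Time-Command, a rough single-run approximation of one comparison (1D increment, [object[]] vs. [int[]]) can use the built-in Measure-Command cmdlet instead. This is only a sketch with a smaller array; a single run is considerably noisier than the 15-run averages above, so its numbers are not directly comparable to the tables.

```powershell
$n = 100000                      # smaller than the 1 million elements above
$indices = 0..($n - 1)
$ao = [object[]]::new($n)        # untyped: elements start out as $null
$ai = [int[]]::new($n)           # strongly typed: elements start out as 0

# ++ on a $null element yields 1, so both loops leave every element at 1.
$objectMs = (Measure-Command { foreach ($i in $indices) { ++$ao[$i] } }).TotalMilliseconds
$typedMs  = (Measure-Command { foreach ($i in $indices) { ++$ai[$i] } }).TotalMilliseconds

'[object[]] increment: {0:N1} ms' -f $objectMs
'[int[]]    increment: {0:N1} ms' -f $typedMs
```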
    

    [1] The reason is that type-constraining requires attaching a type-conversion attribute to a PowerShell variable object.
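The attribute mentioned in the footnote can be inspected via Get-Variable. This sketch only compares attribute counts; the exact attribute type name it prints is whatever the running PowerShell version uses internally.

```powershell
[int16] $withConstraint = 10     # type literal to the left: constrains the variable
$withoutConstraint = [int16] 10  # cast only: the variable itself is unconstrained

$constrainedAttributes   = (Get-Variable withConstraint).Attributes
$unconstrainedAttributes = (Get-Variable withoutConstraint).Attributes

$constrainedAttributes.Count     # at least 1
$unconstrainedAttributes.Count   # 0

# Shows the type name of the attribute that performs the conversion.
$constrainedAttributes | ForEach-Object { $_.GetType().FullName }
```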