How to convert a String (System.Object) into a Byte[] (System.Array) in PowerShell?

I would like to create a binary blob from a binary character string, in the same way as when reading a binary blob from a file into a buffer using a .NET file stream. Then I would like to read 2 bytes from a particular offset in the blob.

I create a file like this:

echo "AAAABBBB" > .\zzblob.txt 
$bytes = "AAAABBBB`r`n"
$aa = [system.bitconverter]::touint16($bytes, 0)

# FAIL!

# Checking the type:
$bytes.GetType() | select Name, BaseType | ft -HideTableHeaders

# String System.Object

Now, doing the same using a stream buffer, we get something else.

$fp = ".\zzblob.txt"
$bf = (new-object byte[](256))
$sp = New-Object System.IO.FileStream($fp, [System.IO.FileMode]::Open, [System.IO.FileAccess]::Read)
$sp.Length
$sp.Read($bf, 0, 256)
$sp.close()

$aa = [system.bitconverter]::touint16($bf, 2)   # ..AA
d2h $aa
# 0x4141  ## OK!

# Checking type:
$bf.GetType() | select Name, BaseType | ft -HideTableHeaders

# Byte[] System.Array

How can I convert a string from String System.Object to Byte[] System.Array?

Solution

You cannot convert an arbitrary .NET string to bytes without choosing a specific character encoding to apply to it.
- The reason is that different character encodings use different byte representations of characters, notably with respect to the number of bytes required to encode a single character, which can even vary from character to character, as is the case with UTF-8.
- Whoever must interpret the resulting byte array as a string again must then use the same encoding for decoding.
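
To illustrate the point about varying byte counts, here is a minimal sketch that encodes three sample characters with UTF-8 and UTF-16LE and reports the number of bytes each produces. The sample characters and the variable names are chosen for illustration only.

```powershell
# Sample characters, built from code points so the script's own file
# encoding does not matter: A (U+0041), é (U+00E9), € (U+20AC).
$samples = [char] 0x41, [char] 0xE9, [char] 0x20AC

$table = foreach ($c in $samples) {
  [pscustomobject] @{
    CodePoint  = 'U+{0:X4}' -f [int] $c
    Utf8Bytes  = [Text.Encoding]::UTF8.GetBytes([string] $c).Count
    Utf16Bytes = [Text.Encoding]::Unicode.GetBytes([string] $c).Count
  }
}

# UTF-8 needs 1, 2 and 3 bytes respectively; UTF-16LE needs 2 bytes for each.
$table
```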

If all the characters in a given string happen to fall into the 8-bit subrange of Unicode code points, i.e. the 256 characters occupying the Unicode code points from 0x0 to 0xFF (255) (in Unicode terms: U+0000 to U+00FF), you can use a shortcut, assuming that you want to use the Unicode code points as byte values:

Use [byte[]] [char[]] $string (or [byte[]] $string.ToCharArray()), as also shown in js2010's answer:

 $string = 'AAAABBBB'

 # Convert TO a byte array.
 $byteArray = [byte[]] [char[]] $string
 # OR:
 #    $byteArray = [byte[]] $string.ToCharArray()

 # Convert back FROM a byte array.
 [string]::new($byteArray) # [char[]] cast optional
 # OR, more PowerShell-idiomatically, but less efficiently:
 #    -join [char[]] $byteArray
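
With a real byte array in hand, the 2-byte read from the question should work as it did with the stream buffer. The following is a sketch only: the offsets are picked for illustration, and the second result assumes a little-endian machine (which `[BitConverter]::IsLittleEndian` reports).

```powershell
$bytes = [byte[]] [char[]] 'AAAABBBB'

# Read 2 bytes starting at offset 2 ('A', 'A' -> 0x41, 0x41).
$aa = [BitConverter]::ToUInt16($bytes, 2)
'0x{0:X4}' -f $aa      # 0x4141

# Read 2 bytes starting at offset 3 ('A', 'B' -> 0x41, 0x42).
# BitConverter uses the machine's byte order, so on a little-endian
# machine the first byte is the low-order one.
$ab = [BitConverter]::ToUInt16($bytes, 3)
'0x{0:X4}' -f $ab      # 0x4241 on little-endian machines
```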

Caveat: Any character outside that range, i.e. one with a code point of U+0100 (256) or above, e.g. € (EURO SIGN, U+20AC), breaks this approach, because its code point is by definition too large to fit into a [byte] instance:
```
 # -> ERROR: 
 #   Cannot convert value "€" to type "System.Byte". 
 #   Error: "Value was either too large or too small for an unsigned byte."
 [byte[]] [char[]] '€'
```

This approach is tantamount to choosing the fixed-width, single-byte ISO-8859-1 character encoding for the byte representation, because the 8-bit subrange of Unicode coincides with this encoding.
- That is, the equivalents of the above operations are (note that in PowerShell (Core) 7 you can more simply use [Text.Encoding]::Latin1 in lieu of [Text.Encoding]::GetEncoding(28591)):
```
 $string = 'AAAABBBB'

 # Convert TO a byte array.
 $byteArray = [Text.Encoding]::GetEncoding(28591).GetBytes($string)    
 # Equivalent of:
 #    $byteArray = [byte[]] [char[]] $string

 # Convert back FROM a byte array.
 [Text.Encoding]::GetEncoding(28591).GetString($byteArray)
 # Equivalent of:
 #    [string]::new($byteArray)
```
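
As a quick sanity check of that equivalence, the following sketch builds a string containing all 256 characters from U+0000 to U+00FF and compares the result of the cast with the result of the encoding. The variable names are illustrative.

```powershell
$latin1 = [Text.Encoding]::GetEncoding(28591)

# A string with every character from U+0000 through U+00FF.
$allChars = -join [char[]] (0..255)

$viaCast     = [byte[]] [char[]] $allChars
$viaEncoding = $latin1.GetBytes($allChars)

# Both approaches should produce the byte values 0 through 255, in order.
$sameBytes = ($viaCast -join ',') -eq ($viaEncoding -join ',')

# Decoding should give back the original string (ordinal comparison).
$roundTrips = $latin1.GetString($viaEncoding).Equals($allChars)

$sameBytes, $roundTrips
```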

As for writing the byte representations to a file:

If you have an in-memory byte representation, it is safest to write to and read from files as bytes rather than via a character encoding:

  $string = 'AAAABBBB'
  $byteArray = [byte[]] [char[]] $string

  # NOTE: Sadly, the syntax for requesting byte processing differs
  #       between Windows PowerShell and PowerShell 7
  #       (-Encoding Byte vs. -AsByteStream), so we construct an
  #       edition-specific hashtable to be used for splatting below.
  $encodingArg = if ($IsCoreClr) { @{ AsByteStream = $true } } 
                 else            { @{ Encoding = 'Byte' } }

  # WRITE the byte array to a file.
  Set-Content blob.txt @encodingArg -Value $byteArray 

  # READ the byte array from a file, as such.
  # Note: -Raw -ReadCount 0 reads the entire file *at once* into
  #       a [byte[]] array.
  $byteArrayFromFile = 
    Get-Content blob.txt @encodingArg -Raw -ReadCount 0
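
If the edition-specific splatting is unwelcome, calling the .NET file APIs directly is another option that should behave the same in both editions. This is a sketch, not part of the approach above: the temp-file name is hypothetical, and a full path is used because .NET methods do not see PowerShell's current location.

```powershell
$byteArray = [byte[]] [char[]] 'AAAABBBB'

# Hypothetical file name in the temp directory; always pass a full path.
$path = Join-Path ([IO.Path]::GetTempPath()) 'blob-demo.bin'

# WRITE the byte array to the file as-is.
[IO.File]::WriteAllBytes($path, $byteArray)

# READ the entire file back into a [byte[]] array.
$byteArrayFromFile = [IO.File]::ReadAllBytes($path)
$byteArrayFromFile.Count   # 8

Remove-Item -LiteralPath $path
```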

Alternatively, in PowerShell (Core) 7, you can use -Encoding Latin1 [1] with Set-Content and Get-Content to directly write and read strings whose characters all fall into the 8-bit Unicode range, but that doesn't work in Windows PowerShell, where you'd have to use .NET APIs directly.
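
For Windows PowerShell, a minimal sketch of the direct .NET route might look as follows. The temp-file name and the sample string (which appends U+00FF to exercise the upper end of the range) are assumptions for illustration.

```powershell
$latin1 = [Text.Encoding]::GetEncoding(28591)

# Sample string: 8 ASCII characters plus ÿ (U+00FF).
$string = 'AAAABBBB' + [char] 0xFF

# Hypothetical file name in the temp directory; always pass a full path,
# because .NET methods do not see PowerShell's current location.
$path = Join-Path ([IO.Path]::GetTempPath()) 'latin1-demo.txt'

# WRITE the string as ISO-8859-1: one byte per character, no BOM.
[IO.File]::WriteAllText($path, $string, $latin1)
$fileSize = (Get-Item -LiteralPath $path).Length   # 9

# READ it back, decoding with the same encoding.
$stringFromFile = [IO.File]::ReadAllText($path, $latin1)

Remove-Item -LiteralPath $path
```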

[1] The ISO-8859-1 encoding that -Encoding Latin1 refers to is closely related to, but not identical to, Windows-1252, so using the latter - which -Encoding Default may refer to in Windows PowerShell, depending on the system locale (e.g. on US-English and Western European machines) - is not an option.