I have a scenario where I need to obtain a base64-encoded installer embedded within a JSON REST response. The JSON string is rather large (180 MB), so decoding the REST response with standard PowerShell tooling frequently throws an OutOfMemoryException in limited-memory scenarios (such as hitting WinRM memory quotas).
It's not desirable to raise the memory quota in our environment for a single installation, and we don't have standard tooling to prepare a package whose payload does not exist at a simple HTTP endpoint (I don't have permission to publish packages outside of our build system). My solution in this case is to decode the base64 string in chunks. However, while I have this working, I am stuck on one last bit of optimization for this process.
Currently I am using a MemoryStream to read from the string, but I need to provide a byte[]:
# $Base64String is a [ref] type
$memStream = [IO.MemoryStream]::new([Text.Encoding]::UTF8.GetBytes($Base64String.Value))
This unsurprisingly copies the byte[] representation of the entire base64-encoded string, which makes it even less memory-efficient than the built-in tooling in its current form. The code you don't see here reads from $memStream in chunks of 1024 bytes at a time, decoding the base64 string and writing the bytes to disk using BinaryWriter. This all works well, if slowly, since I'm forcing garbage collection fairly often. However, I want to extend this byte-counting to the initial MemoryStream and only read n bytes from the string at a time. My understanding is that base64 strings must be decoded in chunks whose byte count is divisible by 4.
The problem is that [string].Substring([int], [int]) works based on string length, not the number of bytes per character. The JSON response can be assumed to be UTF-8 encoded, but even with this assumption UTF-8 characters vary between 1 and 4 bytes in length. How can I (directly or indirectly) substring a specific number of bytes in PowerShell so I can create the MemoryStream from this substring instead of the full $Base64String?
I will note that I have explored the [Text.Encoding].GetBytes([string], [int], [int]) overload. However, I face the same issue there: the method expects a character count, not a byte count, for the length of the string to convert to a byte[] from the starting index.
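To make the mismatch between character count and byte count concrete, here is a minimal sketch. The helper name Get-Utf8ByteCount is hypothetical and only wraps [Text.Encoding]::UTF8.GetByteCount; the code points are written numerically so the example does not depend on how the script file itself is encoded.

```powershell
# Hypothetical helper: number of bytes a string occupies once encoded as UTF-8.
function Get-Utf8ByteCount {
    param([Parameter(Mandatory)][string]$Text)
    [Text.Encoding]::UTF8.GetByteCount($Text)
}

$ascii  = 'a'                                  # U+0061
$latin  = [string][char]0x00E9                 # U+00E9 (e with acute accent)
$euro   = [string][char]0x20AC                 # U+20AC (euro sign)
$emoji  = [char]::ConvertFromUtf32(0x1F600)    # U+1F600, stored as a surrogate pair
$base64 = 'QUJD'                               # base64 encoding of the ASCII text "ABC"

Get-Utf8ByteCount $ascii     # 1
Get-Utf8ByteCount $latin     # 2
Get-Utf8ByteCount $euro      # 3
Get-Utf8ByteCount $emoji     # 4 (while $emoji.Length is 2)
Get-Utf8ByteCount $base64    # 4 (same as $base64.Length)
```

The last line hints at a special case: the standard base64 alphabet consists only of ASCII characters, so for a base64 payload the UTF-8 byte count and the character count coincide.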
To answer the base question "How can I substring a specific number of bytes from a string in PowerShell", I was able to write the following function:
function Get-SubstringByByteCount {
    [CmdletBinding()]
    Param(
        [Parameter(Mandatory)]
        [ValidateScript({ $null -ne $_ -and $_.Value -is [string] })]
        [ref]$InputString,
        [int]$FromIndex = 0,
        [Parameter(Mandatory)]
        [int]$ByteCount,
        [ValidateScript({ [Text.Encoding]::$_ })]
        [string]$Encoding = 'UTF8'
    )

    [long]$byteCounter = 0
    # Start reading at the requested character position
    [int]$i = $FromIndex
    [System.Text.StringBuilder]$sb = New-Object System.Text.StringBuilder $ByteCount

    try {
        # Append characters until the byte budget is reached or the string ends.
        # A multi-byte character that straddles the limit is still included.
        while ( $byteCounter -lt $ByteCount -and $i -lt $InputString.Value.Length ) {
            [char]$char = $InputString.Value[$i++]
            [void]$sb.Append($char)
            $byteCounter += [Text.Encoding]::$Encoding.GetByteCount($char)
        }
        $sb.ToString()
    } finally {
        if( $sb ) {
            $sb = $null
            [System.GC]::Collect()
        }
    }
}
Invocation works like so:
Get-SubstringByByteCount -InputString ( [ref]$someString ) -ByteCount 8
Some notes on this implementation:
- -InputString takes a [ref] type since the original goal was to avoid copying the full string in a limited-memory scenario. This function could be re-implemented using the [string] type instead.
- Each character is appended to a StringBuilder until the specified number of bytes has been written.
- The byte count of each character is determined using one of the [Text.Encoding]::GetByteCount overloads. Encoding can be specified via a parameter, but the encoding value should match one of the static encoding properties available from [Text.Encoding]. Defaults to UTF8 as written.
- $sb = $null and [System.GC]::Collect() are intended to forcibly clean up the StringBuilder in a memory-constrained environment, but could be omitted if this is not a concern.
- -FromIndex takes the start position within -InputString to begin the substring operation from. Defaults to 0 to evaluate from the start of the -InputString.
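For the original base64 scenario specifically, byte counting may not be needed at all. The standard base64 alphabet is pure ASCII, so every character encodes to exactly one UTF-8 byte. The following is a minimal sketch of chunked decoding under that observation, not a definitive implementation. The function name ConvertFrom-Base64InChunks and its parameters are hypothetical, and it assumes the payload contains no embedded whitespace or line breaks, so that every chunk other than the last holds complete 4-character groups.

```powershell
# Hypothetical helper: decode a base64 string to a file one chunk at a time,
# reusing a single char[] buffer instead of allocating substrings.
function ConvertFrom-Base64InChunks {
    param(
        [Parameter(Mandatory)][ref]$Base64String,
        # Use a full path: .NET resolves relative paths against the process
        # working directory, which can differ from the PowerShell location.
        [Parameter(Mandatory)][string]$OutPath,
        [int]$ChunkChars = 4096
    )

    if ($ChunkChars -le 0 -or $ChunkChars % 4 -ne 0) {
        throw 'ChunkChars must be a positive multiple of 4.'
    }

    $buffer = [char[]]::new($ChunkChars)
    $stream = [IO.File]::Create($OutPath)
    try {
        $length = $Base64String.Value.Length
        for ($offset = 0; $offset -lt $length; $offset += $ChunkChars) {
            $count = [Math]::Min($ChunkChars, $length - $offset)
            # Copy the next slice of characters into the reusable buffer
            $Base64String.Value.CopyTo($offset, $buffer, 0, $count)
            # Decode only the filled part of the buffer
            $bytes = [Convert]::FromBase64CharArray($buffer, 0, $count)
            $stream.Write($bytes, 0, $bytes.Length)
        }
    } finally {
        $stream.Dispose()
    }
}
```

A possible invocation, with $someBase64 and the output path standing in for real values:

```powershell
ConvertFrom-Base64InChunks -Base64String ( [ref]$someBase64 ) -OutPath 'C:\Temp\installer.exe'
```

Because only the final chunk can contain padding, each chunk decodes independently and the bytes written to disk match a single-pass decode of the whole string.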