I am trying to create a rather large array of hashtables, with much of the data either fully randomized or randomly picked from a list.
Here is my current code:
$ArrayData = @()
$ArrayDataRows = 150000
foreach ($i in 1..$ArrayDataRows) {
    $thisobject = [PSCustomObject] @{
        Number = $i
        Place  = Get-Random -InputObject NJ, UT, NY, MI, PA, FL, AL, NM, CA, OK, TX, CO, AZ
        Color  = Get-Random -InputObject red, yellow, blue, purple, green, white, black
        Zone   = (Get-Random -InputObject $([char[]](65..90)) -Count 10) -join ""
        Group  = Get-Random -InputObject @(1..20)
    }
    $ArrayData += $thisobject
}
What I notice, though, is that it is not efficient: it takes 25 minutes in total to finish 150k rows.
I had some additional code, not posted here, that measured how long each iteration took and kept a running average. Initially it estimated about 450 seconds for the total, with an average of 0.002 seconds per iteration over the first 3k items, but the per-iteration average kept slowly crawling up to 0.016 seconds, roughly 8 times slower.
How can I optimize and/or make this more efficient while producing the same data as a result?
[edit - you are NOT making an array of hashtables. you are making an array of PSCustomObject
items. [*grin*]]
the standard array is a fixed size object. take a look at $ArrayData.IsFixedSize
for confirmation of that. [grin]
so, when you use +=
on a standard array, powershell makes a NEW, one-item-larger array, copies the old one into the new one, and finally adds the new item. it's fast when the item count & size are "small", but it gets slower [and slower, and slower] as the count/size grows.
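you can confirm the fixed-size behavior yourself at the prompt; this is just a throwaway illustration [the $Demo name is made up for the example]:

```powershell
# plain powershell arrays report IsFixedSize = True,
# so += cannot grow them in place & must build a brand-new array each time
$Demo = @(1, 2, 3)
$Demo.IsFixedSize      # True
$Demo.GetType().Name   # Object[]
```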
there are two common solutions ...
(1) use a collection that has a usable .Add() method. ArrayList [deprecated] and Generic.List are the ones folks usually use. the 1st outputs an index number when you add to it, so even if it wasn't deprecated, i would not use it. [grin]
(2) send the items to the output stream & let powershell gather them for you. $Results = foreach ($Thing in $Collection) {Do-Stuff} and the output of the scriptblock will be held in RAM until the loop completes. then it will be stuffed into the $Results collection all at once. the 2nd is the fastest.
if you have no need to change the size of the collection after you build it, then use the 2nd method. otherwise use the 1st.
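for completeness, here's a minimal sketch of the 1st method using a Generic.List [the $ResultList name & the tiny 1..10 loop are just placeholders for the example]:

```powershell
# build a Generic.List & grow it with its .Add() method
# [unlike ArrayList, .Add() on Generic.List returns nothing, so there
#  is no unwanted index number to suppress]
$ResultList = [System.Collections.Generic.List[object]]::new()
foreach ($i in 1..10) {
    $ResultList.Add([PSCustomObject]@{ Number = $i })
}
$ResultList.Count   # 10
```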
as an example of the speed, your code [with 15,000 items] runs in 39 seconds on my system. using the "send to output" technique takes 24 seconds.
remember that the slowdown will continue to get worse as the array gets larger. i was unwilling to wait on 150k iterations.
here's my demo code ...
$ArrayDataRows = 15e3
$PlaceList = 'NJ, UT, NY, MI, PA, FL, AL, NM, CA, OK, TX, CO, AZ'.Split(',').Trim()
$ColorList = 'red, yellow, blue, purple, green, white, black'.Split(',').Trim()
$UC_LetterList = [char[]](65..90)
$GroupList = 1..20

(Measure-Command -Expression {
    $ArrayData = foreach ($i in 1..$ArrayDataRows) {
        [PSCustomObject] @{
            Number = $i
            Place  = Get-Random -InputObject $PlaceList
            Color  = Get-Random -InputObject $ColorList
            Zone   = -join (Get-Random -InputObject $UC_LetterList -Count 10)
            Group  = Get-Random -InputObject $GroupList
        }
    }
}).TotalMilliseconds
# total ms = 24,390