I'm a beginner PowerShell user and I've written a script to pull specific information about each user's mailbox and archive in my organization, about 18,000 users. Currently it's taking around 24 hours to run. I know it's not efficiently coded and I'm hoping someone can help me find a better way to code it to bring the run time down? Please let me know what techniques could be used to help speed this up.
Param (
[Parameter(Mandatory=$false)]
[string]$OutputPath = 'C:\Temp\Users-ArchiveSize.csv'
)
Connect-ExchangeOnline -UserPrincipalName [email protected] -ConnectionUri "https://outlook.office365.com/powershell?email=@yourmailaddress"
$result = @()
$Users = Get-Content -Path "C:\Temp\users-Testing.csv"
Measure-Command { foreach($user in $users) {
$u = Get-EXOMailbox -Identity $user
If ($u.AutoExpandingArchiveEnabled -eq $true) {
$autoArchiveEnabled = "True"
}
Else {
$autoArchiveEnabled = "False"
}
$displayName = $u.DisplayName
$primarySmtpAddress = $u.PrimarySmtpAddress
$data = Get-EXOMailboxStatistics $u.UserPrincipalName -Archive | select TotalItemSize,ItemCount
$totalSize = $data.TotalItemSize
$itemCount = $data.ItemCount
$result += New-Object psobject -Property @{
'DisplayName' = $displayName;
'PrimarySMTP' = $primarySmtpAddress;
'TotalSize' = $totalSize;
'ItemCount' = $itemCount;
'AutoExpandEnabled' = $autoArchiveEnabled;
}
}
}
if ($OutputPath) {
$result | Export-Csv -Path $OutputPath -NoTypeInformation
}
So far I've tried to isolate the queries against the Exchange database into an array separately, hoping it would be faster than pulling and formatting at the same time, then pulled what I needed out of that array to format and export. I thought it would help but it actually took longer.
I was exploring the start-job with a script block but I wasn't able to get it to work properly. I don't think I understand the concept enough, the csv export was empty. I was able to see that it was correctly performing "$u = Get-EXOMailbox -Identity $user" but creating the variable "$displayName = $u.DisplayName" didn't work, that variable as well as the other variables were all empty.
I was researching and thought running the queries in parallel might be another good option but I'm still trying to see if I can figure out how to do that or even if it's possible.
I'm still just a beginner in PS and the methods/concepts are very hard to wrap my head around and just dive into, I don't have a lot of programming experience. Any help, even just a direction to go in would really help me out!
Testing:
I've tried the following code:
Param (
[Parameter(Mandatory = $false)]
[string]$OutputPath = 'C:\Temp\Users-ArchiveSize.csv'
)
# Open a connection
Connect-ExchangeOnline -UserPrincipalName [email protected] -ConnectionUri "https://outlook.office365.com/powershell?email=@yourmailaddress"
# Get a full list of users and then allocate the same number of cells in $Results
$Users = Get-Content -Path "C:\TEMP\EE\ca-users-Testing-10.csv" | Where-Object {$_ -match "@domain.ca"}
$Results = [PSObject[]]::new($Users.Count)
$Users | & { # This stage packages each user with an index indicating their storage location in $Results.
begin {
$Index = 0
}
process {
[PSCustomObject]@{
Index = $Index
User = $Users[$Index]
Result = $null
}
$Index++
}
} | Foreach-Object -ThrottleLimit 5 -Parallel { # This stage builds and saves each PSObject per each user, but doing multiple in parallel
# Collect need information on the user
$u = Get-EXOMailbox -Identity $_.User | Select-Object UserPrincipalName,DisplayName,PrimarySMTPAddress
$data = Get-EXOMailboxStatistics $u.UserPrincipalName -Archive | Select-Object TotalItemSize
$archiveStatus = Get-EXOMailbox -Identity $u.UserPrincipalName -PropertySets Archive | Select-Object AutoExpandingArchiveEnabled
# Build and save the PSObject containing the needed information on the user
$_.Result = [PSCustomObject]@{
'DisplayName' = $u.DisplayName;
'PrimarySMTP' = $u.PrimarySmtpAddress;
'TotalSize' = $data.TotalItemSize;
'AutoExpandEnabled' = $archiveStatus.AutoExpandingArchiveEnabled
}
# Make the PSCustomObject available to the next stage
$_
} | & { # Save each PSObject in its proper location in $Results
process {
$Results[$_.Index] = $_.Result
}
}
# Write $Results to file in CSV format
if ($OutputPath) {
$Results | Export-Csv -Path $OutputPath -NoTypeInformation
}
I received the following errors:
ForEach-Object : Parameter set cannot be resolved using the specified named parameters.
At C:\Users\ce-eveinarsson\Documents\Get_User_Archive_Size_Parallel_v6.ps1:29 char:14
+ ... } | Foreach-Object -ThrottleLimit 5 -Parallel { # This stage ...
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : MetadataError: (:) [ForEach-Object], ParameterBindingException
+ FullyQualifiedErrorId : AmbiguousParameterSet,Microsoft.PowerShell.Commands.ForEachObjectCommand
Export-Csv : Cannot bind argument to parameter 'InputObject' because it is null.
At C:\Users\ce-eveinarsson\Documents\Get_User_Archive_Size_Parallel_v6.ps1:53 char:15
+ $Results | Export-Csv -Path $OutputPath -NoTypeInformation
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : InvalidData: (:) [Export-Csv], ParameterBindingValidationException
+ FullyQualifiedErrorId : ParameterArgumentValidationErrorNullNotAllowed,Microsoft.PowerShell.Commands.ExportCsvCommand
From what I can determine, it pulls the information and I can pull values from $u but as soon as the script gets to $_ it doesn't seem to do anything. The values $.Index and $.Result are null, so when it tries to export-csv $Results, it's a null value and produces the above error?
I've converted your code 2 times now, both resulting in the code below. I've also tested parts of the code, such as adding items to the $results
array from within the for
loop, everything looks good. But I don't have your exact situation, so I can't test the full code for errors or problems. So, I guess you need to give it a try.
If this throws an error, let me know what you experience and maybe we can figure it out.
Param (
[Parameter(Mandatory = $false)]
[string]$OutputPath = 'C:\Temp\Users-ArchiveSize.csv'
)
Connect-ExchangeOnline -UserPrincipalName [email protected] -ConnectionUri "https://outlook.office365.com/powershell?email=@yourmailaddress"
$Users = Get-Content -Path "C:\Temp\users-Testing.csv"
$result = [PSObject[]]::new($Users.Count)
Measure-Command -Expression {
& {
for ($i = 0; $i -lt $result.Count; $i++) {
$u = Get-EXOMailbox -Identity $Users[$i]
$data = Get-EXOMailboxStatistics $u.UserPrincipalName -Archive | Select-Object TotalItemSize, ItemCount
$result[$i] = [PSObject]@{
DisplayName = $u.DisplayName
PrimarySMTP = $u.PrimarySmtpAddress
TotalSize = $data.TotalItemSize
ItemCount = $data.ItemCount
AutoExpandEnabled = $u.AutoExpandingArchiveEnabled -eq $true
}
}
}
}
if ($OutputPath) {
$result | Export-Csv -Path $OutputPath -NoTypeInformation
}
Some key points:
$Users
, so the code builds an array of type [PSObject]
that is the same size of the array of users.
$result +=
which is constantly building new arrays and copying the content of the old.$result[$i] =
which saves the new PSObjects directly into the already existing cells in the array.Measure-Command
using Dot Sourcing .
to execute the script block
&
.foreach
loops.
foreach
, but it does use a for
loop which may also be subject to the same problem.Measure-Command
to call another script block containing the code you want to measure.
Measure-Command
{
<code>}
&
{
<code>}
$u.AutoExpandingArchiveEnabled -eq $true
and in a very indirect way assigned $true
if $true
and $false
if $false
to AutoExpandEnabled, I've shortened that considerably, and I question if it needs the -eq $true
part.Get-EXOMailboxStatistics
:It would be interesting to see what happens if this:
$data = Get-EXOMailboxStatistics $u.UserPrincipalName -Archive | Select-Object TotalItemSize, ItemCount
Was changed to the following. This would help Get-EXOMailboxStatistics
know what is being handed to it.
$data = Get-EXOMailboxStatistics -UserPrincipalName $u.UserPrincipalName -Archive | Select-Object TotalItemSize, ItemCount
Or, what if it was changed to the following. I'm not a pipeline expert, but I believe Select-Object is repackaging 2 properties in a new object that has only those 2 properties. This seems like a waste of processing time and I don't see the value - but maybe I don't understand the need for it.
$data = Get-EXOMailboxStatistics $u.UserPrincipalName -Archive
The following is a possible design for using PowerShell 7's Foreach-Object -Parallel
:
$OutputPath = "$PSScriptRoot\Results.csv"
$Users = 'User1', 'User2', 'User3', 'User4', 'User5', 'User6', 'User7', 'User8', 'User9'
$Results = [PSObject[]]::new($Users.Count)
Measure-Command -Expression {
& {
$Users | & {
begin {
$Index = 0
}
process {
[PSCustomObject]@{
Index = $Index
User = $Users[$Index]
Result = $null
}
$Index++
}
} | Foreach-Object -ThrottleLimit 5 -Parallel {
$i = $_.Index
$User = $_.User
$_.Result = [PSObject]@{
User = $User
Text = "$i=$User"
}
$_
} | & {
process {
$Results[$_.Index] = $_.Result
}
}
}
}
$Results | Export-Csv -Path $OutputPath -NoTypeInformation
In PowerShell 7.3.9 it produced a file containing the following:
"User","Text"
"User1","0=User1"
"User2","1=User2"
"User3","2=User3"
"User4","3=User4"
"User5","4=User5"
"User6","5=User6"
"User7","6=User7"
"User8","7=User8"
"User9","8=User9"
Looking at Connect to Exchange Online PowerShell, I noticed it has a section Step 3: Disconnect when you're finished.
Should the script end with this line?:
Disconnect-ExchangeOnline -Confirm:$false
The docs for Get-EXOMailbox, in the -Identity section, says:
For the best performance, we recommend using the following values:
. User ID or user principal name (UPN)
. GUID
The question is this, what is the content of "C:\Temp\users-Testing.csv"? Is it UPNs? GUIDs? Other?
If it turns out that it is Other, we could save either the UPN or GUID for each Other in a hash table, save the hash table to a file, load the hash table when the script starts, and do the conversion from Other to either UPN or GUID prior to calling Get-EXOMailbox
, and passing Other to Get-EXOMailbox
only when it is missing from the hash table. I'm skeptical of how much of a performance increase this would give, but if we get desperate, it is an option to consider.
Contents of the conclusion will likely change as I receive feedback from esquared.
Barebones Parallel Version:
This version is probably fairly close to as good as you will get for performance without rewriting it in C#. The -ThrottleLimit
still needs to be adjusted to find that fine point between performance and overloading the server with request. All extras related to processing users is removed, and Measure-Command
is replaced with a function.
NOTE: Even though I call this "Barebones", there are 2 parts that probably aren't needed. The first is | Select-Object TotalItemSize, ItemCount
, which the docs say "When you select properties, Select-Object returns new objects that have only the specified properties." And the second is -eq $true
at the end of AutoExpandEnabled = $u.AutoExpandingArchiveEnabled
. This second can be proven removable by executing ($u.AutoExpandingArchiveEnabled).GetType()
after $u
is created. If that command returns "Boolean", then it isn't needed.
The first half of the script defines the $OutputPath parameter, defines function Get-ReadableTimeSince, and connects to server.
Param (
[Parameter(Mandatory = $false)]
[string]$OutputPath = 'C:\Temp\Users-ArchiveSize.csv'
)
function Get-ReadableTimeSince {
[CmdletBinding()]
param (
[Parameter(Mandatory = $true, Position = 0)]
[DateTime]$DateTime
)
$TimeSpan = [DateTime]::now - $DateTime
[string[]]$TimeList =
@( $TimeSpan.Days, 'day' ),
@( $TimeSpan.Hours, 'hour' ),
@( $TimeSpan.Minutes, 'minute' ),
@( [float]($TimeSpan.Seconds + $TimeSpan.Milliseconds / 1000), 'second' ) |
& { process {
if ($_[0] -ne 1 -and $_[0] -ne 1.0) { $_[1] += 's' }
if (($_[0])) { '{0} {1}' -f $_[0], $_[1] }
} }
if ($TimeList.Count -eq 0) { $TimeList += '0 seconds' }
$LastIndex = $TimeList.Count - 1
if ($LastIndex -ge 2) { $TimeList[$LastIndex] = 'and ' + $TimeList[$LastIndex]; $TimeList -join ', ' }
else { $TimeList -join ' and ' }
}
Connect-ExchangeOnline -UserPrincipalName [email protected] -ConnectionUri "https://outlook.office365.com/powershell?email=@yourmailaddress"
This second part of the script capture the time, does the work and calls Get-ReadableTimeSince with the time afterwards. The time will show in only those days, hours, minutes, and seconds that have a value. So it might say something like "5 hours and 23.173 seconds" if its been 0 minutes since the top of the hour.
$ProcessingStarted = [DateTime]::now
if ($OutputPath) {
Get-Content -Path "C:\Temp\users-Testing.csv" | Foreach-Object -ThrottleLimit 5 -Parallel {
$u = Get-EXOMailbox -Identity $_
$data = Get-EXOMailboxStatistics $u.UserPrincipalName -Archive | Select-Object TotalItemSize, ItemCount
[PSObject]@{
DisplayName = $u.DisplayName
PrimarySMTP = $u.PrimarySmtpAddress
TotalSize = $data.TotalItemSize
AutoExpandEnabled = $u.AutoExpandingArchiveEnabled -eq $true
}
} | Export-Csv -Path $OutputPath -NoTypeInformation
}
Get-ReadableTimeSince $ProcessingStarted
Everything Parallel Version: The following code will, in PowerShell 7, theoretically do 5 parallel request for information on users from the server. It is theoretical due to the fact that I currently cannot recreate esquared's working environment, which prevents me properly testing it.
The code is limited to 5 parallel request, but this can be changed by changing the -ThrottleLimit
argument of the Foreach-Object
cmdlet to the desired number. NOTE: The server's load should be monitored to verify that it isn't being overloaded with request.
This first section is the setup of everything we need:
# Either receive or define path where output should be saved.
Param (
[Parameter(Mandatory = $false)]
[string]$OutputPath = 'C:\Temp\Users-ArchiveSize.csv'
)
#Function for converting a [TimeSpan] into human readable time span
function Get-ReadableTimeSpan {
[CmdletBinding()]
param (
[Parameter(Mandatory = $true, Position = 0)]
[TimeSpan]$TimeSpan
)
[string[]]$TimeList =
@( $TimeSpan.Days, 'day' ),
@( $TimeSpan.Hours, 'hour' ),
@( $TimeSpan.Minutes, 'minute' ),
@( [float]($TimeSpan.Seconds + $TimeSpan.Milliseconds / 1000), 'second' ) |
.{ process {
if ($_[0] -ne 1 -and $_[0] -ne 1.0) { $_[1] += 's' }
if (($_[0])) { '{0} {1}' -f $_[0], $_[1] }
} }
if ($TimeList.Count -eq 0) { $TimeList += '0 seconds' }
$LastIndex = $TimeList.Count - 1
if ($LastIndex -ge 2) { $TimeList[$LastIndex] = 'and ' + $TimeList[$LastIndex]; $TimeList -join ', ' }
else { $TimeList -join ' and ' }
}
# Open a connection
Connect-ExchangeOnline -UserPrincipalName [email protected] -ConnectionUri "https://outlook.office365.com/powershell?email=@yourmailaddress"
# Get a full list of users and then allocate the same number of cells in $Results
$Users = Get-Content -Path "C:\Temp\users-Testing.csv"
$Results = [PSObject[]]::new($Users.Count)
This section does the actual work:
# Capture the amount of time this code runs
$MeasuredTimeSpan = (
Measure-Command -Expression {
# Isolate code from the negative performance effect caused by Measure-Command's Dot Sourcing call of script block
& {
$Users | & { # This stage packages each user with an index indicating their storage location in $Results.
begin {
$Index = 0
}
process {
[PSCustomObject]@{
Index = $Index
User = $Users[$Index]
Result = $null
}
$Index++
}
} | Foreach-Object -ThrottleLimit 5 -Parallel { # This stage builds and saves each PSObject per each user, but doing multiple in parallel
# Collect need information on the user
$u = Get-EXOMailbox -Identity $_.User
$data = Get-EXOMailboxStatistics $u.UserPrincipalName -Archive | Select-Object TotalItemSize, ItemCount
# Build and save the PSObject containing the needed information on the user
$_.Result = [PSObject]@{
DisplayName = $u.DisplayName
PrimarySMTP = $u.PrimarySmtpAddress
TotalSize = $data.TotalItemSize
ItemCount = $data.ItemCount
AutoExpandEnabled = $u.AutoExpandingArchiveEnabled -eq $true
}
# Make the PSCustomObject available to the next stage
$_
} | & { # Save each PSObject in its proper location in $Results
process {
$Results[$_.Index] = $_.Result
}
}
}
}
)
And this section outputs the results:
# Write $Results to file in CSV format
if ($OutputPath) {
$Results | Export-Csv -Path $OutputPath -NoTypeInformation
}
# Convert the captured [TimeSpan] to human readable time span
Get-ReadableTimeSpan $MeasuredTimeSpan