Search code examples
powershellexchange-serverexchange-online

How to speed up PowerShell script to pull Get-EXOMailbox and Get-EXOMailboxStatistics info with formating on 18k mailboxes


I'm a beginner PowerShell user and I've written a script to pull specific information about each user's mailbox and archive in my organization, about 18,000 users. Currently it's taking around 24 hours to run. I know it's not efficiently coded and I'm hoping someone can help me find a better way to code it to bring the run time down? Please let me know what techniques could be used to help speed this up.

Param (
    [Parameter(Mandatory=$false)] 
    [string]$OutputPath = 'C:\Temp\Users-ArchiveSize.csv'
)

Connect-ExchangeOnline -UserPrincipalName [email protected]  -ConnectionUri "https://outlook.office365.com/powershell?email=@yourmailaddress"

$result = @()

$Users = Get-Content -Path "C:\Temp\users-Testing.csv"

Measure-Command { foreach($user in $users) {

    $u = Get-EXOMailbox -Identity $user

    If ($u.AutoExpandingArchiveEnabled -eq $true) {
        $autoArchiveEnabled = "True"
    }
    Else {
        $autoArchiveEnabled = "False"
    }

    $displayName = $u.DisplayName
    $primarySmtpAddress = $u.PrimarySmtpAddress

    $data = Get-EXOMailboxStatistics $u.UserPrincipalName -Archive | select TotalItemSize,ItemCount
    $totalSize = $data.TotalItemSize
    $itemCount = $data.ItemCount

    $result += New-Object psobject -Property @{
        'DisplayName' = $displayName;
        'PrimarySMTP' = $primarySmtpAddress;
        'TotalSize' = $totalSize;
        'ItemCount' = $itemCount;
        'AutoExpandEnabled' = $autoArchiveEnabled;
    }
}
}

if ($OutputPath) {
    $result | Export-Csv -Path $OutputPath -NoTypeInformation
}

So far I've tried to isolate the queries against the Exchange database into an array separately, hoping it would be faster than pulling and formatting at the same time, then pulled what I needed out of that array to format and export. I thought it would help but it actually took longer.

I was exploring the start-job with a script block but I wasn't able to get it to work properly. I don't think I understand the concept enough, the csv export was empty. I was able to see that it was correctly performing "$u = Get-EXOMailbox -Identity $user" but creating the variable "$displayName = $u.DisplayName" didn't work, that variable as well as the other variables were all empty.

I was researching and thought running the queries in parallel might be another good option but I'm still trying to see if I can figure out how to do that or even if it's possible.

I'm still just a beginner in PS and the methods/concepts are very hard to wrap my head around and just dive into, I don't have a lot of programming experience. Any help, even just a direction to go in would really help me out!

Testing:

I've tried the following code:

Param (
   [Parameter(Mandatory = $false)]
   [string]$OutputPath = 'C:\Temp\Users-ArchiveSize.csv'
)

#  Open a connection
Connect-ExchangeOnline -UserPrincipalName [email protected]  -ConnectionUri "https://outlook.office365.com/powershell?email=@yourmailaddress"
#  Get a full list of users and then allocate the same number of cells in $Results
$Users = Get-Content -Path "C:\TEMP\EE\ca-users-Testing-10.csv" | Where-Object {$_ -match "@domain.ca"}
$Results = [PSObject[]]::new($Users.Count)


      
         $Users | & {   #  This stage packages each user with an index indicating their storage location in $Results.
            begin {
               $Index = 0
            }
            process {
               [PSCustomObject]@{
                  Index  = $Index
                  User   = $Users[$Index]
                  Result = $null
               }
               $Index++
            }
         } | Foreach-Object -ThrottleLimit 5 -Parallel { #  This stage builds and saves each PSObject per each user, but doing multiple in parallel
            #  Collect need information on the user
            $u = Get-EXOMailbox -Identity $_.User | Select-Object UserPrincipalName,DisplayName,PrimarySMTPAddress
            $data = Get-EXOMailboxStatistics $u.UserPrincipalName -Archive | Select-Object TotalItemSize
            $archiveStatus = Get-EXOMailbox -Identity $u.UserPrincipalName -PropertySets Archive | Select-Object AutoExpandingArchiveEnabled
            #  Build and save the PSObject containing the needed information on the user
            $_.Result = [PSCustomObject]@{
               'DisplayName'       = $u.DisplayName;
               'PrimarySMTP'       = $u.PrimarySmtpAddress;
               'TotalSize'         = $data.TotalItemSize;
               'AutoExpandEnabled' = $archiveStatus.AutoExpandingArchiveEnabled
            }
            #  Make the PSCustomObject available to the next stage
            $_
         } | & {  #  Save each PSObject in its proper location in $Results
            process {
               $Results[$_.Index] = $_.Result
            }
         }
      


#  Write $Results to file in CSV format
if ($OutputPath) {
   $Results | Export-Csv -Path $OutputPath -NoTypeInformation
}

I received the following errors:

ForEach-Object : Parameter set cannot be resolved using the specified named parameters.
At C:\Users\ce-eveinarsson\Documents\Get_User_Archive_Size_Parallel_v6.ps1:29 char:14
+ ...         } | Foreach-Object -ThrottleLimit 5 -Parallel { #  This stage ...
+                 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : MetadataError: (:) [ForEach-Object], ParameterBindingException
    + FullyQualifiedErrorId : AmbiguousParameterSet,Microsoft.PowerShell.Commands.ForEachObjectCommand


Export-Csv : Cannot bind argument to parameter 'InputObject' because it is null.
At C:\Users\ce-eveinarsson\Documents\Get_User_Archive_Size_Parallel_v6.ps1:53 char:15
+    $Results | Export-Csv -Path $OutputPath -NoTypeInformation
+               ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidData: (:) [Export-Csv], ParameterBindingValidationException
    + FullyQualifiedErrorId : ParameterArgumentValidationErrorNullNotAllowed,Microsoft.PowerShell.Commands.ExportCsvCommand

From what I can determine, it pulls the information and I can pull values from $u but as soon as the script gets to $_ it doesn't seem to do anything. The values $.Index and $.Result are null, so when it tries to export-csv $Results, it's a null value and produces the above error?


Solution

  • I've converted your code 2 times now, both resulting in the code below. I've also tested parts of the code, such as adding items to the $results array from within the for loop, everything looks good. But I don't have your exact situation, so I can't test the full code for errors or problems. So, I guess you need to give it a try.

    If this throws an error, let me know what you experience and maybe we can figure it out.

    Param (
       [Parameter(Mandatory = $false)]
       [string]$OutputPath = 'C:\Temp\Users-ArchiveSize.csv'
    )
    
    Connect-ExchangeOnline -UserPrincipalName [email protected]  -ConnectionUri "https://outlook.office365.com/powershell?email=@yourmailaddress"
    
    $Users = Get-Content -Path "C:\Temp\users-Testing.csv"
    $result = [PSObject[]]::new($Users.Count)
    Measure-Command -Expression {
       & {
          for ($i = 0; $i -lt $result.Count; $i++) {
             $u = Get-EXOMailbox -Identity $Users[$i]
             $data = Get-EXOMailboxStatistics $u.UserPrincipalName -Archive | Select-Object TotalItemSize, ItemCount
             $result[$i] = [PSObject]@{
                DisplayName       = $u.DisplayName
                PrimarySMTP       = $u.PrimarySmtpAddress
                TotalSize         = $data.TotalItemSize
                ItemCount         = $data.ItemCount
                AutoExpandEnabled = $u.AutoExpandingArchiveEnabled -eq $true
             }
          }
       }
    }
    
    if ($OutputPath) {
       $result | Export-Csv -Path $OutputPath -NoTypeInformation
    }
    

    Some key points:

    1. You already have an array the size you will be needing - $Users, so the code builds an array of type [PSObject] that is the same size of the array of users.
      • This removes the need for using $result += which is constantly building new arrays and copying the content of the old.
      • It replaces it with $result[$i] = which saves the new PSObjects directly into the already existing cells in the array.
    2. I had help from Santiago when working on this answer, and he warned me about the Measure-Command using Dot Sourcing . to execute the script block
      • This, to my surprise, will reduce the performance of some PowerShell commands and operators.
      • His warning was directly related to the Call operator &.
      • But I accidently discovered it also badly degrades the performance of foreach loops.
        • My version of your code does NOT use the foreach, but it does use a for loop which may also be subject to the same problem.
      • Santiago's solution was to use the Call operator from within the script block called by Measure-Command to call another script block containing the code you want to measure.
        • Measure-Command
          • Script block {<code>}
            • Call operator &
              • Another script block {<code>}
                • Code you want to measure
    3. Your code checks if $u.AutoExpandingArchiveEnabled -eq $true and in a very indirect way assigned $true if $true and $false if $false to AutoExpandEnabled, I've shortened that considerably, and I question if it needs the -eq $true part.

    Possible Changes to Get-EXOMailboxStatistics:

    It would be interesting to see what happens if this:

    $data = Get-EXOMailboxStatistics $u.UserPrincipalName -Archive | Select-Object TotalItemSize, ItemCount
    

    Was changed to the following. This would help Get-EXOMailboxStatistics know what is being handed to it.

    $data = Get-EXOMailboxStatistics -UserPrincipalName $u.UserPrincipalName -Archive | Select-Object TotalItemSize, ItemCount
    

    Or, what if it was changed to the following. I'm not a pipeline expert, but I believe Select-Object is repackaging 2 properties in a new object that has only those 2 properties. This seems like a waste of processing time and I don't see the value - but maybe I don't understand the need for it.

    $data = Get-EXOMailboxStatistics $u.UserPrincipalName -Archive
    

    Parallel Processing:

    The following is a possible design for using PowerShell 7's Foreach-Object -Parallel:

    $OutputPath = "$PSScriptRoot\Results.csv"
    $Users = 'User1', 'User2', 'User3', 'User4', 'User5', 'User6', 'User7', 'User8', 'User9'
    $Results = [PSObject[]]::new($Users.Count)
    Measure-Command -Expression {
       & {
          $Users | & {
             begin {
                $Index = 0
             }
             process {
                [PSCustomObject]@{
                   Index  = $Index
                   User   = $Users[$Index]
                   Result = $null
                }
                $Index++
             }
          } | Foreach-Object -ThrottleLimit 5 -Parallel {
             $i = $_.Index
             $User = $_.User
             $_.Result = [PSObject]@{
                User = $User
                Text = "$i=$User"
             }
             $_
          } | & {
             process {
                $Results[$_.Index] = $_.Result
             }
          }
       }
    }
    $Results | Export-Csv -Path $OutputPath -NoTypeInformation
    

    In PowerShell 7.3.9 it produced a file containing the following:

    "User","Text"
    "User1","0=User1"
    "User2","1=User2"
    "User3","2=User3"
    "User4","3=User4"
    "User5","4=User5"
    "User6","5=User6"
    "User7","6=User7"
    "User8","7=User8"
    "User9","8=User9"
    

    Possible Missing Disconnect

    Looking at Connect to Exchange Online PowerShell, I noticed it has a section Step 3: Disconnect when you're finished.

    Should the script end with this line?:

    Disconnect-ExchangeOnline -Confirm:$false
    

    Improving Get-EXOMailbox

    The docs for Get-EXOMailbox, in the -Identity section, says:

    For the best performance, we recommend using the following values:

    . User ID or user principal name (UPN)
    . GUID

    The question is this, what is the content of "C:\Temp\users-Testing.csv"? Is it UPNs? GUIDs? Other?

    If it turns out that it is Other, we could save either the UPN or GUID for each Other in a hash table, save the hash table to a file, load the hash table when the script starts, and do the conversion from Other to either UPN or GUID prior to calling Get-EXOMailbox, and passing Other to Get-EXOMailbox only when it is missing from the hash table. I'm skeptical of how much of a performance increase this would give, but if we get desperate, it is an option to consider.

    Conclusion:

    Contents of the conclusion will likely change as I receive feedback from esquared.

    Barebones Parallel Version: This version is probably fairly close to as good as you will get for performance without rewriting it in C#. The -ThrottleLimit still needs to be adjusted to find that fine point between performance and overloading the server with request. All extras related to processing users is removed, and Measure-Command is replaced with a function.

    NOTE: Even though I call this "Barebones", there are 2 parts that probably aren't needed. The first is | Select-Object TotalItemSize, ItemCount, which the docs say "When you select properties, Select-Object returns new objects that have only the specified properties." And the second is -eq $true at the end of AutoExpandEnabled = $u.AutoExpandingArchiveEnabled. This second can be proven removable by executing ($u.AutoExpandingArchiveEnabled).GetType() after $u is created. If that command returns "Boolean", then it isn't needed.

    The first half of the script defines the $OutputPath parameter, defines function Get-ReadableTimeSince, and connects to server.

    Param (
       [Parameter(Mandatory = $false)]
       [string]$OutputPath = 'C:\Temp\Users-ArchiveSize.csv'
    )
    function Get-ReadableTimeSince {
       [CmdletBinding()]
       param (
          [Parameter(Mandatory = $true, Position = 0)]
          [DateTime]$DateTime
       )
       $TimeSpan = [DateTime]::now - $DateTime
       [string[]]$TimeList =
       @( $TimeSpan.Days, 'day' ),
       @( $TimeSpan.Hours, 'hour' ),
       @( $TimeSpan.Minutes, 'minute' ),
       @( [float]($TimeSpan.Seconds + $TimeSpan.Milliseconds / 1000), 'second' ) |
       & { process {
             if ($_[0] -ne 1 -and $_[0] -ne 1.0) { $_[1] += 's' }
             if (($_[0])) { '{0} {1}' -f $_[0], $_[1] }
          } }
       if ($TimeList.Count -eq 0) { $TimeList += '0 seconds' }
       $LastIndex = $TimeList.Count - 1
       if ($LastIndex -ge 2) { $TimeList[$LastIndex] = 'and ' + $TimeList[$LastIndex]; $TimeList -join ', ' }
       else { $TimeList -join ' and ' }
    }
    Connect-ExchangeOnline -UserPrincipalName [email protected]  -ConnectionUri "https://outlook.office365.com/powershell?email=@yourmailaddress"
    

    This second part of the script capture the time, does the work and calls Get-ReadableTimeSince with the time afterwards. The time will show in only those days, hours, minutes, and seconds that have a value. So it might say something like "5 hours and 23.173 seconds" if its been 0 minutes since the top of the hour.

    $ProcessingStarted = [DateTime]::now
    if ($OutputPath) {
       Get-Content -Path "C:\Temp\users-Testing.csv" | Foreach-Object -ThrottleLimit 5 -Parallel {
          $u = Get-EXOMailbox -Identity $_
          $data = Get-EXOMailboxStatistics $u.UserPrincipalName -Archive | Select-Object TotalItemSize, ItemCount
          [PSObject]@{
             DisplayName       = $u.DisplayName
             PrimarySMTP       = $u.PrimarySmtpAddress
             TotalSize         = $data.TotalItemSize
             AutoExpandEnabled = $u.AutoExpandingArchiveEnabled -eq $true
          }
       } | Export-Csv -Path $OutputPath -NoTypeInformation
    }
    Get-ReadableTimeSince $ProcessingStarted
    

    Everything Parallel Version: The following code will, in PowerShell 7, theoretically do 5 parallel request for information on users from the server. It is theoretical due to the fact that I currently cannot recreate esquared's working environment, which prevents me properly testing it.

    The code is limited to 5 parallel request, but this can be changed by changing the -ThrottleLimit argument of the Foreach-Object cmdlet to the desired number. NOTE: The server's load should be monitored to verify that it isn't being overloaded with request.

    This first section is the setup of everything we need:

    #  Either receive or define path where output should be saved.
    Param (
       [Parameter(Mandatory = $false)]
       [string]$OutputPath = 'C:\Temp\Users-ArchiveSize.csv'
    )
    #Function for converting a [TimeSpan] into human readable time span
    function Get-ReadableTimeSpan {
       [CmdletBinding()]
       param (
          [Parameter(Mandatory = $true, Position = 0)]
          [TimeSpan]$TimeSpan
       )
       [string[]]$TimeList =
       @( $TimeSpan.Days, 'day' ),
       @( $TimeSpan.Hours, 'hour' ),
       @( $TimeSpan.Minutes, 'minute' ),
       @( [float]($TimeSpan.Seconds + $TimeSpan.Milliseconds / 1000), 'second' ) |
       .{ process {
             if ($_[0] -ne 1 -and $_[0] -ne 1.0) { $_[1] += 's' }
             if (($_[0])) { '{0} {1}' -f $_[0], $_[1] }
          } }
       if ($TimeList.Count -eq 0) { $TimeList += '0 seconds' }
       $LastIndex = $TimeList.Count - 1
       if ($LastIndex -ge 2) { $TimeList[$LastIndex] = 'and ' + $TimeList[$LastIndex]; $TimeList -join ', ' }
       else { $TimeList -join ' and ' }
    }
    #  Open a connection
    Connect-ExchangeOnline -UserPrincipalName [email protected]  -ConnectionUri "https://outlook.office365.com/powershell?email=@yourmailaddress"
    #  Get a full list of users and then allocate the same number of cells in $Results
    $Users = Get-Content -Path "C:\Temp\users-Testing.csv"
    $Results = [PSObject[]]::new($Users.Count)
    

    This section does the actual work:

    #  Capture the amount of time this code runs
    $MeasuredTimeSpan = (
       Measure-Command -Expression {
          #  Isolate code from the negative performance effect caused by Measure-Command's Dot Sourcing call of script block
          & {
             $Users | & {   #  This stage packages each user with an index indicating their storage location in $Results.
                begin {
                   $Index = 0
                }
                process {
                   [PSCustomObject]@{
                      Index  = $Index
                      User   = $Users[$Index]
                      Result = $null
                   }
                   $Index++
                }
             } | Foreach-Object -ThrottleLimit 5 -Parallel { #  This stage builds and saves each PSObject per each user, but doing multiple in parallel
                #  Collect need information on the user
                $u = Get-EXOMailbox -Identity $_.User
                $data = Get-EXOMailboxStatistics $u.UserPrincipalName -Archive | Select-Object TotalItemSize, ItemCount
                #  Build and save the PSObject containing the needed information on the user
                $_.Result = [PSObject]@{
                   DisplayName       = $u.DisplayName
                   PrimarySMTP       = $u.PrimarySmtpAddress
                   TotalSize         = $data.TotalItemSize
                   ItemCount         = $data.ItemCount
                   AutoExpandEnabled = $u.AutoExpandingArchiveEnabled -eq $true
                }
                #  Make the PSCustomObject available to the next stage
                $_
             } | & {  #  Save each PSObject in its proper location in $Results
                process {
                   $Results[$_.Index] = $_.Result
                }
             }
          }
       }
    )
    

    And this section outputs the results:

    #  Write $Results to file in CSV format
    if ($OutputPath) {
       $Results | Export-Csv -Path $OutputPath -NoTypeInformation
    }
    #  Convert the captured [TimeSpan] to human readable time span
    Get-ReadableTimeSpan $MeasuredTimeSpan