powershell teamcity smb

Close-SmbOpenFile throws an error that isn't caught by try-catch


We run a TeamCity PowerShell build step in script execution mode as part of a pipeline with snapshot and artifact dependencies. The system is fairly robust and we have used this particular process for a couple of years, so sadly this isn't brand-new code that I'm debugging for the first time. It normally works, until it randomly doesn't. The failure isn't tied to a single TeamCity agent; the agent is different each time the error occurs.

This part of our process does some code deployment and some log backups. To complete the backup, we have to make sure the files aren't held open by QA or devs who are looking at the logs from their desks, possibly in read-write mode. Because they open the files from their laptops or desktops, the files are naturally accessed over SMB shares. The function below is supposed to close the files that are open on the given server. I say "supposed to" because every once in a while it throws the error shown further down, and I can't catch it (even locally) or suppress it, so it breaks the TeamCity run. (I've anonymized with ...SNIP anywhere the code contains proprietary names or proprietary output.)

You can test this on your own machine by opening \\yourhostname\c$\somefilepath\somefile; Get-SmbOpenFile will then show the files as open. As you'll see once you've read through the code, it shouldn't fail on your machine as written, but if you take out all of the "precautions" you can potentially reproduce the error locally.

function Close-SMBApplicationLocks {
<#
.SYNOPSIS
    Closes Active SMB Sessions for Default or User Supplied Paths

.DESCRIPTION
    This function is used to prevent interruption to deployments by closing any SMB locks
    in application paths.  Defaults to closing sessions in folders matching regex
    ...SNIP

.PARAMETER Paths
    [string[]] A string array of paths or path segments to match sessions against.

.EXAMPLE
    Close-SMBApplicationLocks

...SNIP

.EXAMPLE
    Close-SMBApplicationLocks -Paths @("TEMP")

...SNIP
#>
    [CmdletBinding()]
    param(
        [Alias("SharePaths")]
        [Parameter(Mandatory=$false)]
        [string[]]$Paths
    )

    $pathsToUse = Test-IsNull ($Paths -join "|") "...SNIP"
    Write-Verbose ("Looking for SMB Sessions Matching Path: {0}" -f $pathsToUse)

    $smbSessions = @(Get-SmbOpenFile | Where-Object {$_.Path -match $pathsToUse})

    if ((Test-IsCollectionNullOrEmpty $smbSessions)) {
        Write-Host ("No Matching SMB Sessions Found")
        return
    }

    Write-Verbose "Found $($smbSessions.Count) Matching SMB Sessions"

    $uniqueFileIds = ($smbSessions).FileId | Sort-Object -Unique

    foreach ($fileId in $uniqueFileIds) {
        $session = @($smbSessions | Where-Object { $_.FileId -eq $fileId })[0]

        $sessionId = $session.SessionId
        $username = $session.ClientUserName
        $path = $session.Path

        Write-Verbose "Closing FileId $fileId on SMB Session $sessionId for user $username in path $path"

        try {
            if ($null -ne (Get-SmbOpenFile -FileId $fileId)) {
                ## Yes this is FOUR ways to suppress output. 
                ## Microsoft has proven remarkably resilient at showing an error here.
                ## the ErrorAction Continue still throws an error in TeamCity but not locally
                ## The try catch doesn't catch
                ## The Out-Null is because on the off chance the redirect works on the output, it shouldn't show the faux-error
                ## The output redirection is because this error isn't written to "standard error"
                ## TeamCity seems to be not honoring this output redirection in the shell it's running under to execute this block
                (Close-SmbOpenFile -FileId $fileId -Force -ErrorAction Continue *>&1) | Out-Null
                ## Run this line instead of the above to actually see the error pretty frequently, by my testing
                ## Close-SmbOpenFile -FileId $fileId -Force
            }
        } catch {
            $errorMessage = $_.Exception.Message
            Write-Warning "An Error Occurred While Trying to Close Session $sessionId : $errorMessage"
        }
    }
}

We were originally passing the session, but I changed to this $fileId version, with the unique FileId list and so on, to see whether that would clean things up. It doesn't seem to have improved anything.

We could very well just do Get-SmbOpenFile | Where-Object <pathmatch> | Close-SmbOpenFile (see, for example, https://serverfault.com/questions/718875/close-locked-file-in-windows-share-using-powershell-and-openfiles and https://community.spiceworks.com/topic/2218597-issue-with-close-smbopenfile ), but as you can see we want to log what we are closing, so that if something goes wrong the log helps us understand what happened.
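For comparison, the pipeline form can still log each close. The following is only a sketch: the function name Close-MatchingSmbOpenFile and its -PathPattern parameter are made up for illustration, and -ErrorAction SilentlyContinue is an assumption about how a handle that has already gone away should be treated.

```powershell
# Hypothetical helper (not part of our module): pipeline version that still logs.
function Close-MatchingSmbOpenFile {
    [CmdletBinding()]
    param(
        # Regex matched against the Path property of each open file.
        [Parameter(Mandatory = $true)]
        [string]$PathPattern
    )

    Get-SmbOpenFile |
        Where-Object { $_.Path -match $PathPattern } |
        ForEach-Object {
            # Log first, so the record exists even if the close does nothing.
            Write-Verbose ("Closing FileId {0} on SMB Session {1} for user {2} in path {3}" -f `
                $_.FileId, $_.SessionId, $_.ClientUserName, $_.Path)

            # Pipe the object itself to the close cmdlet instead of looking it up again by FileId.
            $_ | Close-SmbOpenFile -Force -ErrorAction SilentlyContinue
        }
}
```

Calling it as Close-MatchingSmbOpenFile -PathPattern 'TEMP' -Verbose would print one line per handle before closing it.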

Here's the error I have to fight:

[Clearing File Locks] No MSFT_SMBOpenFile objects found with property 'FileId' equal to '825975900669'.  Verify the value of the property 
[Clearing File Locks] and retry.
[Clearing File Locks] At C:\Program Files\WindowsPowerShell\Modules\...SNIP.psm1:2566 char:34
[Clearing File Locks] +         $jobs | ForEach-Object { Receive-Job -Job $_ }
[Clearing File Locks] +                                  ~~~~~~~~~~~~~~~~~~~
[Clearing File Locks]     + CategoryInfo          : ObjectNotFound: (825975900669:UInt64) [Get-SmbOpenFile], CimJobException
[Clearing File Locks]     + FullyQualifiedErrorId : CmdletizationQuery_NotFound_FileId,Get-SmbOpenFile
[Clearing File Locks]     + PSComputerName        : localhost
[Clearing File Locks]  
[Clearing File Locks] Process exited with code 1

But the thing is, just before I do the close, I check once more that the file is still open, right? So I say "does this exist? Yes? Close it", and yet I get this error, which makes no sense to me.

I have tried to find some other property on the returned object that tells me whether the file really needs to be closed, or that says "this should be skipped", but I can't figure anything out there.

Since I seem to be out of options here, is there an alternative method I've not considered? Some sort of CIMInstance command? I've obviously gone snow-blind if there is. This does run locally on the machine, not across a session.

Someone in my org finally noticed that the error names Get-SmbOpenFile with the FileId parameter as the failing command, not Close-SmbOpenFile. So the failure is in my existence check, which has none of the error suppression that I piled onto the close call. At this point it looks like I may have an answer.

Snowblindness sucks
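A minimal sketch of what a guarded existence check might look like, assuming the goal is simply to skip FileIds that have disappeared since the list was built. The wrapper name Close-SmbFileIfOpen is hypothetical; Get-SmbOpenFile, Close-SmbOpenFile, -ErrorAction and -ErrorVariable are the standard cmdlets and common parameters.

```powershell
# Hypothetical wrapper (illustration only) around the check-then-close step.
function Close-SmbFileIfOpen {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory = $true)]
        [UInt64]$FileId
    )

    # SilentlyContinue keeps a "not found" lookup out of the error stream;
    # ErrorVariable still captures the record so it can be logged.
    $openFile = Get-SmbOpenFile -FileId $FileId -ErrorAction SilentlyContinue -ErrorVariable lookupError

    if ($null -eq $openFile) {
        Write-Verbose "FileId $FileId is no longer open, skipping. $lookupError"
        return
    }

    Close-SmbOpenFile -FileId $FileId -Force -ErrorAction SilentlyContinue
}
```

Whether silently skipping is acceptable depends on how much of the failure you want to see in the build log; the Write-Verbose line is there so the skip is at least traceable.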

Pertinent machine details of note:

PS Z:\git\...SNIP> $PSVersionTable

Name                           Value
----                           -----
PSVersion                      5.1.17763.1007
PSEdition                      Desktop
PSCompatibleVersions           {1.0, 2.0, 3.0, 4.0...}
BuildVersion                   10.0.17763.1007
CLRVersion                     4.0.30319.42000
WSManStackVersion              3.0
PSRemotingProtocolVersion      2.3
SerializationVersion           1.1.0.1

PS Z:\git\...SNIP> Get-CimInstance Win32_OperatingSystem | Select-Object Caption, Version, ServicePackMajorVersion, OSArchitecture, CSName, WindowsDirectory


Caption                 : Microsoft Windows 10 Enterprise LTSC
Version                 : 10.0.17763
ServicePackMajorVersion : 0
OSArchitecture          : 64-bit
CSName                  : ...SNIP
WindowsDirectory        : C:\Windows

But this also runs on Windows Server environments, with the same version of PowerShell and the latest Windows patches on all servers. I know we haven't yet moved the fleet over to 2019 Datacenter, but we have some 800-odd servers in production and testing that I know of, and these things take time. If 2016 is the problem, then that's the problem.

PS Z:\git\...SNIP> Get-CimInstance Win32_OperatingSystem -ComputerName ...SNIP | Select-Object Caption, Version, ServicePackMajorVersion, OSArchitecture, CSName, WindowsDirectory


Caption                 : Microsoft Windows Server 2016 Datacenter
Version                 : 10.0.14393
ServicePackMajorVersion : 0
OSArchitecture          : 64-bit
CSName                  : ...SNIP
WindowsDirectory        : C:\Windows

Maybe my solution is to get TeamCity to honor the output redirection? Is it Server 2016 that isn't honoring the output redirection? Is closing these connections reliably just a pipe dream? Is there a filesystem version I'm not thinking to check?

When I try to create a file at \\mymachine\c$\temp\temp.txt and open it, this is what I get (note that I'm only using notepad to open the file, so there's no lock ongoing)

PS Z:\git\devops_powershell> Get-SMBOpenFile

FileId        SessionId     Path    ShareRelativePath ClientComputerName   ClientUserName
------        ---------     ----    ----------------- ------------------   --------------
1065151889485 1065151889409 C:\                       ...SNIP              ...SNIP
1065151889489 1065151889409 C:\                       ...SNIP              ...SNIP
1065151889613 1065151889409 C:\temp temp              ...SNIP              ...SNIP
1065151889617 1065151889409 C:\temp temp              ...SNIP              ...SNIP
1065151889833 1065151889409 C:\temp temp              ...SNIP              ...SNIP


PS Z:\git\...SNIP> Get-SmbOpenFile -FileId 1065151889833 | Select-Object -Property *


SmbInstance           : Default
ClientComputerName    : ...SNIP
ClientUserName        : ...SNIP
ClusterNodeName       :
ContinuouslyAvailable : False
Encrypted             : False
FileId                : 1065151889833
Locks                 : 0
Path                  : C:\temp
Permissions           : 1048736
ScopeName             : *
SessionId             : 1065151889409
ShareRelativePath     : temp
Signed                : True
PSComputerName        :
CimClass              : ROOT/Microsoft/Windows/SMB:MSFT_SmbOpenFile
CimInstanceProperties : {ClientComputerName, ClientUserName, ClusterNodeName, ContinuouslyAvailable...}
CimSystemProperties   : Microsoft.Management.Infrastructure.CimSystemProperties



PS Z:\git\...SNIP> Get-SmbOpenFile -FileId 1065151889617 | Select-Object -Property *


SmbInstance           : Default
ClientComputerName    : ...SNIP
ClientUserName        : ...SNIP
ClusterNodeName       :
ContinuouslyAvailable : False
Encrypted             : False
FileId                : 1065151889617
Locks                 : 0
Path                  : C:\temp
Permissions           : 1048705
ScopeName             : *
SessionId             : 1065151889409
ShareRelativePath     : temp
Signed                : True
PSComputerName        :
CimClass              : ROOT/Microsoft/Windows/SMB:MSFT_SmbOpenFile
CimInstanceProperties : {ClientComputerName, ClientUserName, ClusterNodeName, ContinuouslyAvailable...}
CimSystemProperties   : Microsoft.Management.Infrastructure.CimSystemProperties

Should I be focused only on the case where Locks -gt 0?


Solution

  • It looks like we may have narrowed down the root cause: the Get-SmbOpenFile -FileId $fileId check itself is what fails. This is probably related to the concurrent file listings whose FileIds are 4 apart. In the last example from the question (repeated here), closing 1065151889485 seems to "close" 1065151889489 as well, so when the loop reaches that second value, the lookup can't find it and errors out.

    PS Z:\git\devops_powershell> Get-SMBOpenFile
    
    FileId        SessionId     Path    ShareRelativePath ClientComputerName   ClientUserName
    ------        ---------     ----    ----------------- ------------------   --------------
    1065151889485 1065151889409 C:\                       ...SNIP              ...SNIP
    1065151889489 1065151889409 C:\                       ...SNIP              ...SNIP
    1065151889613 1065151889409 C:\temp temp              ...SNIP              ...SNIP
    1065151889617 1065151889409 C:\temp temp              ...SNIP              ...SNIP
    1065151889833 1065151889409 C:\temp temp              ...SNIP              ...SNIP
    
    

    I'm going to change that Get-SmbOpenFile -FileId $fileId line in the morning, test it with the same "error bypass" nonsense, and see what happens. Or I'll just take that check out and try again.

    I'm still very, very confused about why the try-catch doesn't catch the error as thrown. If it did, I would just get a Write-Warning instead of the terminated process I have now.
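On the try-catch confusion: in PowerShell, catch only runs for terminating errors. The "not found" result from the lookup appears to be a non-terminating error (it is written to the error stream and the script carries on), which would explain why the catch block never fires. The sketch below shows the difference with a stand-in function; Get-FakeOpenFile is hypothetical and only mimics that behavior with Write-Error so the example runs without any SMB shares.

```powershell
# Make the demo independent of whatever preference the host has set.
$ErrorActionPreference = 'Continue'

# Hypothetical stand-in for a lookup that reports "not found" as a
# non-terminating error, the way the FileId lookup seems to.
function Get-FakeOpenFile {
    [CmdletBinding()]
    param([UInt64]$FileId)
    Write-Error "No objects found with property 'FileId' equal to '$FileId'."
}

# Case 1: default error action. The error is written to the error stream
# (discarded here with 2>$null) and the catch block is never entered.
$caughtDefault = $false
try {
    Get-FakeOpenFile -FileId 825975900669 2>$null
} catch {
    $caughtDefault = $true
}

# Case 2: -ErrorAction Stop promotes the same error to a terminating one,
# so the catch block runs and can turn it into a warning.
$caughtStop = $false
try {
    Get-FakeOpenFile -FileId 825975900669 -ErrorAction Stop
} catch {
    $caughtStop = $true
    Write-Warning "Lookup failed: $($_.Exception.Message)"
}

"Caught with default action: $caughtDefault"
"Caught with -ErrorAction Stop: $caughtStop"
```

Applied to the function in the question, that would mean adding -ErrorAction Stop to the calls inside the try block if the Write-Warning path is what you want, or -ErrorAction SilentlyContinue on the lookup if a vanished FileId should just be skipped. This is untested against TeamCity itself.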