powershell compare comparison file-search

PowerShell - how to return filenames that don't have a "partner"

I'm trying to find files in a folder whose filenames look like the following:

C:\XMLFiles\
in.blahblah.xml
out.blahblah.xml
in.blah.xml
out.blah.xml

I need to return only the files that do not have their "counterpart". This folder contains thousands of files with randomized "center" portions in the file names. The commonality is the in/out prefix and the ".xml" extension.

Is there a way to do this in PowerShell? It's an odd ask.

Thanks.

Solution

Your question is a little vague. I hope I got it right. Here is how I would do it.

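$dir = 'my_dir'

# Files whose counterpart has not been seen yet. The comparer ignores case
# because -match and -eq below are case-insensitive too.
$singleFiles = [System.Collections.Generic.HashSet[string]]::new([System.StringComparer]::OrdinalIgnoreCase)
Get-ChildItem $dir -Filter '*.xml' -File | ForEach-Object {
    # BaseName has no extension, e.g. 'in.blah' for 'in.blah.xml'.
    if ($_.BaseName -match '^(?<prefix>in|out)(?<rest>\..+)') {
        # Swap the prefix to get the counterpart's name.
        $oppositeFileName = if ($Matches.prefix -eq 'in') {
            'out'
        }
        else {
            'in'
        }

        $oppositeFileName += $Matches.rest + $_.Extension
        $oppositeFileFullName = Join-Path $_.DirectoryName -ChildPath $oppositeFileName
        if ($singleFiles.Contains($oppositeFileFullName)) {
            # The counterpart was seen earlier, so the two files form a pair.
            $singleFiles.Remove($oppositeFileFullName) | Out-Null
        }
        else {
            $singleFiles.Add($_.FullName) | Out-Null
        }
    }
}

# What remains are the files without a counterpart.
$singleFiles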

I'm getting all the XML files from the directory and iterating over the results. For each file I check whether its base name (the file name without the directory path and without the extension) matches a regex. The regex says: match if the name starts with in or out, followed by a literal dot and at least one more character.
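To see what the regex accepts and rejects, here is a minimal sketch that runs it against a few made-up base names. The names in $sampleNames are assumptions for illustration only, not files from your folder.

```powershell
# Hypothetical sample base names (not taken from a real folder).
$sampleNames = 'in.blah', 'out.blahblah', 'input', 'other.blah'

$regexResults = foreach ($name in $sampleNames) {
    if ($name -match '^(?<prefix>in|out)(?<rest>\..+)') {
        # $Matches holds the named groups from the last successful -match.
        [pscustomobject]@{
            Name   = $name
            Prefix = $Matches.prefix
            Rest   = $Matches.rest
        }
    }
}

# Only 'in.blah' and 'out.blahblah' are emitted; 'input' has no dot after
# the prefix and 'other.blah' does not start with in or out.
$regexResults
```

Requiring the dot right after the prefix is what keeps names like input.xml from being treated as an "in" file.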

The $Matches automatic variable contains the matched groups. Based on these groups I build the name of the counterpart file: for example, if I'm currently on in.abc, I build out.abc.
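The name swap can be tried in isolation. This is only a sketch: Get-CounterpartName is a hypothetical helper I made up for the example, it is not part of the script above and not a built-in cmdlet.

```powershell
# Hypothetical helper, not part of the script above.
function Get-CounterpartName {
    param([Parameter(Mandatory)][string]$FileName)

    $baseName  = [System.IO.Path]::GetFileNameWithoutExtension($FileName)
    $extension = [System.IO.Path]::GetExtension($FileName)

    if ($baseName -match '^(?<prefix>in|out)(?<rest>\..+)') {
        # Flip the prefix and keep the rest of the name unchanged.
        $opposite = if ($Matches.prefix -eq 'in') { 'out' } else { 'in' }
        return $opposite + $Matches.rest + $extension
    }
    # Nothing is returned when the name does not follow the in./out. pattern.
}

Get-CounterpartName 'in.abc.xml'        # out.abc.xml
Get-CounterpartName 'out.blahblah.xml'  # in.blahblah.xml
```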

After that, I build the absolute path of the counterpart file and check whether it exists in the HashSet. If it does, I remove it, because that means I already iterated over the counterpart earlier and the two files form a pair. Otherwise, I add the current file.

The resulting HashSet will contain the files that do not have a counterpart.
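The add/remove bookkeeping can be followed on a plain list of names, without touching the disk. A minimal sketch, assuming the made-up names in $names; it also relies on HashSet.Remove() returning $true only when the item was present, which is a small variation on the Contains-then-Remove form described above.

```powershell
# Hypothetical file names; 'in.solo.xml' and 'out.orphan.xml' have no counterpart.
$names = 'in.blah.xml', 'out.blah.xml', 'in.solo.xml',
         'out.orphan.xml', 'out.blahblah.xml', 'in.blahblah.xml'

$unpaired = [System.Collections.Generic.HashSet[string]]::new()
foreach ($name in $names) {
    if ($name -match '^(?<prefix>in|out)(?<rest>\..+)$') {
        $opposite    = if ($Matches.prefix -eq 'in') { 'out' } else { 'in' }
        $counterpart = $opposite + $Matches.rest

        # Remove() returns $true only when the counterpart was waiting in the set,
        # so one call covers both the lookup and the removal.
        if (-not $unpaired.Remove($counterpart)) {
            $null = $unpaired.Add($name)
        }
    }
}

$unpaired | Sort-Object   # in.solo.xml, out.orphan.xml
```

The order of the files does not matter: whichever half of a pair comes first waits in the set until the other half removes it.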

Tell me if you need a more detailed explanation and I will go line by line. It could be refactored a bit, but it does the job.
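As one possible refactoring, here is a sketch that swaps the HashSet for Group-Object: files are grouped by their name with the in/out prefix stripped, and groups with a single member are the unpaired ones. This is a different technique from the script above, not a restatement of it. The temporary folder and the three file names are assumptions so the example can run on its own.

```powershell
# Build a throwaway folder with hypothetical files so the sketch is self-contained.
$demoDir = Join-Path ([System.IO.Path]::GetTempPath()) ([System.Guid]::NewGuid().ToString())
$null = New-Item -ItemType Directory -Path $demoDir
'in.blah.xml', 'out.blah.xml', 'in.solo.xml' | ForEach-Object {
    $null = New-Item -ItemType File -Path (Join-Path $demoDir $_)
}

$lonelyFiles = Get-ChildItem $demoDir -Filter '*.xml' -File |
    Where-Object { $_.BaseName -match '^(in|out)\..+' } |
    # Group by the name without its prefix, e.g. '.blah' for both in.blah and out.blah.
    Group-Object { $_.BaseName -replace '^(in|out)(?=\.)', '' } |
    Where-Object Count -eq 1 |
    ForEach-Object { $_.Group.FullName }

$lonelyFiles   # full path of in.solo.xml

# Clean up the demo folder.
Remove-Item $demoDir -Recurse -Force
```

This version is shorter, but Group-Object collects every file before emitting anything, so for a very large folder the streaming HashSet approach may be the lighter option.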