Tags: powershell, compare, comparison, file-search

PowerShell - how to return filenames that don't have a "partner"


I'm trying to find files in a folder whose names look like the following:

C:\XMLFiles\
in.blahblah.xml
out.blahblah.xml
in.blah.xml
out.blah.xml

I need to return only the files that do not have a "counterpart". This folder contains thousands of files with randomized "center" portions in their names; the only things they have in common are the in/out prefix and the ".xml" extension.

Is there a way to do this in PowerShell? It's an odd ask.

Thanks.


Solution

  • Your question is a little vague. I hope I got it right. Here is how I would do it.

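    $dir = 'my_dir'

    # Full paths of files whose counterpart has not been seen (yet).
    # The comparer ignores case, because -match is case-insensitive and so are Windows file names.
    $singleFiles = [System.Collections.Generic.HashSet[string]]::new([System.StringComparer]::OrdinalIgnoreCase)
    Get-ChildItem $dir -Filter '*.xml' | ForEach-Object {
        # BaseName is the file name without the extension, e.g. 'in.blahblah'
        if ($_.BaseName -match '^(?<prefix>in|out)(?<rest>\..+)') {
            # Swap the prefix to get the counterpart's name
            $oppositeFileName = if ($Matches.prefix -eq 'in') {
                'out'
            }
            else {
                'in'
            }

            $oppositeFileName += $Matches.rest + $_.Extension
            $oppositeFileFullName = Join-Path $_.DirectoryName -ChildPath $oppositeFileName
            if ($singleFiles.Contains($oppositeFileFullName)) {
                # The counterpart was seen earlier: the pair is complete
                $singleFiles.Remove($oppositeFileFullName) | Out-Null
            }
            else {
                # No counterpart so far: remember the current file
                $singleFiles.Add($_.FullName) | Out-Null
            }
        }
    }

    # Whatever is left has no counterpart
    $singleFiles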
    

    I get all the XML files from the directory and iterate over them. For each file, I check whether its base name (the file name without the directory path and the extension) matches a regex. The regex says: match if the name starts with in or out, followed by a dot and at least one more character.

    The $Matches automatic variable contains the matched groups. Based on these groups I build the name of the counterpart file: e.g. if I'm currently on in.abc, I build out.abc.
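
    To see the regex and $Matches step in isolation, here is a minimal sketch that works on plain file names instead of files on disk. The function name Get-CounterpartName is made up for this illustration and is not part of the script above.

    ```powershell
    # Hypothetical helper for illustration only: returns the counterpart's
    # file name, or nothing if the name does not follow the in./out. pattern.
    function Get-CounterpartName {
        param([Parameter(Mandatory)][string]$FileName)

        # Split 'in.abc.xml' into base name 'in.abc' and extension '.xml'
        $baseName  = [System.IO.Path]::GetFileNameWithoutExtension($FileName)
        $extension = [System.IO.Path]::GetExtension($FileName)

        # Same pattern as in the script: prefix in/out, then a dot and the rest
        if ($baseName -match '^(?<prefix>in|out)(?<rest>\..+)') {
            $opposite = if ($Matches.prefix -eq 'in') { 'out' } else { 'in' }
            return $opposite + $Matches.rest + $extension
        }
    }

    Get-CounterpartName 'in.abc.xml'    # out.abc.xml
    Get-CounterpartName 'out.blah.xml'  # in.blah.xml
    ```

    A name such as readme.xml or input.xml does not match the pattern (there is no dot right after in/out), so the function returns nothing for it.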

    After that, I build the absolute path of the counterpart file and check whether it exists in the HashSet. If it does, I remove it, because that means I already iterated over the counterpart and the pair is complete. Otherwise, I add the current file.

    The resulting HashSet will contain the files that do not have a counterpart.
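
    The add/remove logic can be tried without touching the file system. The sketch below runs the same idea over a hard-coded list of names; the sample names (including out.other.xml) are invented for the demonstration, and it uses the return value of Remove instead of a separate Contains check.

    ```powershell
    # Sample names, made up for this demonstration
    $names = 'in.blahblah.xml', 'out.blahblah.xml', 'in.blah.xml', 'out.other.xml'

    $single = [System.Collections.Generic.HashSet[string]]::new()
    foreach ($name in $names) {
        if ($name -match '^(?<prefix>in|out)(?<rest>\..+)$') {
            $opposite     = if ($Matches.prefix -eq 'in') { 'out' } else { 'in' }
            $oppositeName = $opposite + $Matches.rest

            # Remove returns $true only if the counterpart was already in the set.
            # If it was not, the current name is (so far) unpaired, so keep it.
            if (-not $single.Remove($oppositeName)) {
                [void]$single.Add($name)
            }
        }
    }

    $single
    ```

    in.blahblah.xml is added first and removed again when out.blahblah.xml comes along, so only in.blah.xml and out.other.xml remain in the set.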

    Tell me if you need a more detailed explanation and I will go line by line. It could be refactored a bit, but it does the job.
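
    As one possible refactoring, here is a sketch of a different technique: grouping with Group-Object instead of the HashSet. It strips the in./out. prefix to get a shared key and keeps the groups that contain a single name. The sample names are again invented, and this is an alternative to try, not a tested drop-in replacement for the script above.

    ```powershell
    # Sample names, made up for this demonstration
    $names = 'in.blahblah.xml', 'out.blahblah.xml', 'in.blah.xml', 'out.orphan.xml'

    $unpaired = $names |
        # Keep only names of the form in.<something>.xml or out.<something>.xml
        Where-Object { $_ -match '^(in|out)\..+\.xml$' } |
        # Group by the name with its in./out. prefix removed
        Group-Object { $_ -replace '^(in|out)\.', '' } |
        # A group with a single member has no counterpart
        Where-Object Count -eq 1 |
        ForEach-Object { $_.Group }

    $unpaired
    ```

    For a real folder, the same pipeline should work by starting from Get-ChildItem $dir -Filter '*.xml' and using $_.Name inside the two script blocks.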