powershell performance automation scripting powershell-7.0

Is there an equivalent for the MATLAB function "ismember" in PowerShell?


Example: I have a script that compares file information to a previously saved database file, which holds the same file information plus the calculated hash of each file. If the current file properties match what is already in the database file (modified date, size, etc.), I reuse the stored hash instead of recalculating it.

However, this is done for thousands of files, and the "foreach" loop that searches the database for the matching file entry to get the corresponding hash ends up taking a while (about 60 ms per file).

I assume there is a way to compare the array of file information to the saved database as a whole (instead of file by file in a loop) and assign the corresponding hash from the database in a single command, but it is not clear to me how to do that.

Example code is below (I've stripped out some of the progress update callbacks for clarity, and I've confirmed those are not adding substantial delay). The loop slows down when the corresponding hash is pulled from "$AllOldSrcProps", which is the database I mentioned earlier.

I just really want to compare the "$AllFiles" to "$AllOldSrcProps", and copy the "Hash" property from "$AllOldSrcProps" to "$AllFiles" when the other properties match.

    foreach ($file in $AllFiles) {
        if ($file.FullName.StartsWith($SrcPath)) {
            $file.FullPotLength = $file.FullName.Length - $SrcLen + $ModLen
            $file.LocKey = $SrcKey
            # If we're not rebuilding the hash table, reuse the stored hash instead of recalculating
            if (-not($RebuildSrcHashTblFlag) -and $AllOldSrcProps) {
                $MatchingFile = @($AllOldSrcProps | ?{( $_.FullName -eq $file.FullName) -and ( $_.Length -eq $file.Length) -and ($_.LastWriteTime -eq $file.LastWriteTime.ToString())})
                if ($MatchingFile.Count -eq 1) {
                    $file.Hash = $MatchingFile.Hash
                    $MatchedHash[0] = $MatchedHash[0] +1
                }
            }
        }
    }

Solution

  • The performance issue in your code most likely comes from the linear search here, which scans the whole database once for every file:

    $MatchingFile = @($AllOldSrcProps | Where-Object { ($_.FullName -eq $file.FullName) -and .... })
    

    You can improve performance by using a dictionary type, such as a hashtable. The complication is that you are comparing 3 properties, so the key needs to be a structure with value-based equality, such as one that implements IEquatable<T>. One way to handle this is to use a ValueTuple as the hash key, since tuples are inherently comparable and equatable. So before your loop, you could build the map once:

    $map = @{}
    foreach ($src in $AllOldSrcProps) {
        $key = [System.ValueTuple[string, long, datetime]]::new(
            $src.FullName, $src.Length, $src.LastWriteTime)
    
        $map[$key] = $src
    }
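
As a quick check of why a ValueTuple can serve as a hashtable key, here is a minimal sketch that builds two separate tuples from the same components and looks one up using the other. The path, size, timestamp, and placeholder hash are made-up values, and `$demo`, `$k1`, `$k2`, and `$k3` are names introduced only for this illustration. It also suggests one thing to watch for: the string component of the tuple is compared case-sensitively, unlike `-eq`, so paths that differ only in casing would not match.

```powershell
# Made-up sample values, for illustration only.
$stamp = [datetime]::new(2024, 1, 15, 10, 30, 0)

# Two distinct tuple instances built from equal components.
$k1 = [System.ValueTuple[string, long, datetime]]::new('C:\data\a.txt', 1024, $stamp)
$k2 = [System.ValueTuple[string, long, datetime]]::new('C:\data\a.txt', 1024, $stamp)

# Same components except for the casing of the path.
$k3 = [System.ValueTuple[string, long, datetime]]::new('C:\DATA\a.txt', 1024, $stamp)

$demo = @{}
$demo[$k1] = 'stored-hash-placeholder'

# Equal components give an equal key, so the lookup succeeds.
$sameKeyFound = $demo.ContainsKey($k2)
$found = $demo[$k2]

# The tuple compares its string component case-sensitively, so this one misses.
$otherCaseFound = $demo.ContainsKey($k3)
```

If path casing can differ between the database and the file system, normalizing the path (for example with `ToLowerInvariant()`) on both sides before building the key is one possible workaround.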
    

    Then, in your actual code, you would replace the Where-Object filter with a lookup in $map, like so:

    foreach ($file in $AllFiles) {
        if ($file.FullName.StartsWith($SrcPath)) {
            $file.FullPotLength = $file.FullName.Length - $SrcLen + $ModLen
            $file.LocKey = $SrcKey
    
            # If we're not rebuilding the hash table, reuse the stored hash
            if (-not $RebuildSrcHashTblFlag -and $AllOldSrcProps) {
                $key = [System.ValueTuple[string, long, datetime]]::new(
                    $file.FullName, $file.Length, $file.LastWriteTime)
    
                $MatchingFile = $map[$key]
    
                if ($MatchingFile) {
                    $file.Hash = $MatchingFile.Hash
                    $MatchedHash[0]++
                }
            }
        }
    }
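
One caveat worth checking: the comparison in the question uses `$file.LastWriteTime.ToString()`, which suggests the database may store the timestamp as text. If that is the case, converting the stored text back to `datetime` would not restore the sub-second part that the default `ToString()` format drops, so a tuple key with a `datetime` component might never match. Below is a minimal sketch of an alternative under that assumption, using a composite string key built the same way on both sides. `Get-FileKey`, `$oldProps`, `$files`, and `$lookup` are hypothetical names, and the sample records are made up.

```powershell
# Hypothetical helper: joins the three properties into one string key.
# '|' is used as a separator because it cannot appear in a Windows path.
function Get-FileKey {
    param([string] $FullName, [long] $Length, [string] $LastWriteTime)
    '{0}|{1}|{2}' -f $FullName, $Length, $LastWriteTime
}

# Toy stand-ins for $AllOldSrcProps and $AllFiles (made-up values).
# The "database" side holds text values, as if it had been imported from a file.
$stamp = [datetime]::new(2024, 1, 15, 10, 30, 0)
$oldProps = @(
    [pscustomobject]@{ FullName = 'C:\data\a.txt'; Length = '1024'; LastWriteTime = $stamp.ToString(); Hash = 'AAA111' }
    [pscustomobject]@{ FullName = 'C:\data\b.txt'; Length = '2048'; LastWriteTime = $stamp.ToString(); Hash = 'BBB222' }
)
$files = @(
    [pscustomobject]@{ FullName = 'C:\Data\a.txt'; Length = 1024; LastWriteTime = $stamp; Hash = $null }
    [pscustomobject]@{ FullName = 'C:\data\b.txt'; Length = 4096; LastWriteTime = $stamp; Hash = $null }
)

# Build the lookup once. A PowerShell @{} hashtable compares string keys
# case-insensitively, which mirrors the behavior of -eq on the path.
$lookup = @{}
foreach ($src in $oldProps) {
    $key = Get-FileKey -FullName $src.FullName -Length $src.Length -LastWriteTime $src.LastWriteTime
    $lookup[$key] = $src
}

# One hashtable lookup per file instead of a scan of the whole database.
foreach ($file in $files) {
    $stampText = $file.LastWriteTime.ToString()
    $key = Get-FileKey -FullName $file.FullName -Length $file.Length -LastWriteTime $stampText
    $match = $lookup[$key]
    if ($match) { $file.Hash = $match.Hash }
}
```

In this toy data the first file picks up the stored hash even though its path casing differs, while the second keeps an empty hash because its length changed. Note that, unlike the original `Count -eq 1` check, a lookup table keeps only the last entry when the database contains duplicate keys.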