Search code examples
linqpowershelldatatable

How use linq explict in powershell or analogue of "NOT IN" in SQL


I have a question about using Linq in PowerShell. I can not figure out how to correctly use the Except method

Example tables:

$Arr = 1..1000
$Props = ("employeeID","FindName1","FindName2")
$Table1 = New-Object System.Data.DataTable "Table1"
$Props | ForEach-Object { $Table1.Columns.Add( $_ , [String]) | Out-Null }

ForEach ($Record in $Arr ) {
    $Row = $Table1.NewRow()
    $Row.employeeID = $Record.ToString("00000")
    $Row.FindName1 = "UserName_" + $Record.ToString()
    $Row.FindName2 = "String_" + $Record.ToString("00000000")
    $Table1.Rows.Add($Row)
}

$Arr2 = 980..1111
$Props = ("employeeID","FindName1")
$Table2 = New-Object System.Data.DataTable "Table2"
$Props | ForEach-Object { $Table2.Columns.Add( $_ , [String]) | Out-Null }

ForEach ($Record in $Arr2 ) {
    $Row = $Table2.NewRow()
    $Row.employeeID = $Record.ToString("00000")
    $Row.FindName1 = "UserName_" + $Record.ToString()
    $Table2.Rows.Add($Row)
}

As a result of the work, I want to get records from the $table1 where FindName1 not in $Table2.FindName1, preserving all the headers

An attempt to perform does not produce the expected result.

$ExceptOut = [System.Linq.Enumerable]::Except($Table1.FindName1, $Table2.FindName1)

As I understood from the article , i need to create my own class with methods that allow me to use a LINQ in the tables. But I am extremely far from programming. Or maybe there is some other fast analogue of "NOT IN" in SQL. I hope for help. Thanks.


Solution

  • For the (generic) set-difference .Except() LINQ method to work, the two enumerables (IEnumerable<T>) passed as arguments must:

    • enumerate instances of the same type T

      • However, it is possible to use Object for T, thereby effectively supporting PowerShell's potentially mixed-type regular Object[] arrays ([object[]] in PowerShell's type-literal notation).
    • and, if that type is a reference type whose instance should compare meaningfully based on the content of instances (rather than by mere reference equality, i.e. identity), must implement the IEquatable<T> interface and/or override the .Equals() method (and therefore also the .GetHashCode() method).

    PowerShell is seemingly not able to find the right overload for .Except() with the [object[]] arrays returned by $Table1.FindName1 and $Table2.FindName1 (also see the note re v7.3+ below), though these arrays technically fulfill the above requirements - I don't know why.

    However, simply casting one of these arrays to what it already is - [object[]] - solves the problem:

    [Linq.Enumerable]::Except([object[]] $Table1.FindName1, $Table2.FindName1)
    

    Note:

    • As shown above, it is sufficient to cast one of the enumerables to [object[]] in order for PowerShell to infer the correct generic type argument for the method.

      • In PowerShell (Core) 7.3+ it is now possible to call generic methods with explicit type arguments (see about_Calling_Generic_Methods), which allows simplifying the solution to:

        # Note the '[object]' right after 'Except',
        # specifying the generic type argument.
        [Linq.Enumerable]::Except[object]($Table1.FindName1, $Table2.FindName1)
        
    • Given that the .FindName1 column ultimately contains strings, you can also cast - both - enumerables to [string[]], though this implicitly creates a copy of each array, which is unnecessary here.


    Now if you want to return whole rows while using the .FindName1 column only for comparison, things get much more complex:

    • You must implement a custom comparer class that implements the IEqualityComparer[T]interface.

    • You must cast the .Rows collection of the data tables to IEnumerable[DataRow], which requires calling the System.Linq.Enumerable.Cast() method via reflection (again, see further below for a simpler v7.3+ solution).

      • Note: While you could directly cast to [DataRow[]], this would involve inefficient conversion of the rows collection to an array.

    Here's a PSv5+ solution that implements the custom comparer class as a PowerShell class:

    # A custom comparer class that compares two DataRow instances by their
    # .FindName1 column.
    class CustomTableComparer : Collections.Generic.IEqualityComparer[Data.DataRow] {
      [bool] Equals([Data.DataRow] $x, [Data.DataRow] $y) {
        return [string]::Equals($x.FindName1, $y.FindName1, 'Ordinal')
      }
      [int] GetHashCode([Data.DataRow] $row) {
        # Note: Any two rows for which Equals() returns $true must return the same
        #       hash code. Because *ordinal, case-sensitive* string comparison is
        #       used above, it's sufficient to simply call .GetHashCode() on
        #       the .FindName1 property value, but that would have to be tweaked
        #       for other types of string comparisons.
        return $row.FindName1.GetHashCode();
      }
    }
    
    
    # Use reflection to get a reference to a .Cast() method instantiation 
    # that casts to IEnumerable<DataRow>.
    $toIEnumerable = [Linq.Enumerable].GetMethod('Cast').MakeGenericMethod([Data.DataRow])
    
    # Call .Except() with the casts and the custom comparer.
    # Note the need to wrap the .Rows value in an aux. single-element
    # array - (, ...) - for it to be treated as a single argument.
    [Linq.Enumerable]::Except(
        $toIEnumerable.Invoke($null, (, $Table1.Rows)), 
        $toIEnumerable.Invoke($null, (, $Table2.Rows)), 
        [CustomTableComparer]::new()
    )
    

    PowerShell 7.3+ simplification:

    The ability to specify generic method type arguments directly makes the reflection-based approach unnecessary and simplifies the approach:

    [Linq.Enumerable]::Except[Data.DataRow](
      [Linq.Enumerable]::Cast[Data.DataRow]($Table1.Rows), 
      [Linq.Enumerable]::Cast[Data.DataRow]($Table2.Rows), 
      [CustomTableComparer]::new()
    )
    

    Note: The [Linq.Enumerable]::Cast[Data.DataRow]() calls are still needed, because the .Rows property of System.Data.DataTable instances implements only IEnumerable, not also IEnumerable[System.Data.DataRow].


    GitHub issue #2226 proposes making LINQ a first-class PowerShell citizen.