Search code examples
c#linq

How to use Except inside Where Exists in LINQ


I have the following SQL Server query I am trying to rewrite as LINQ:

-- find people not already in People table based on name match but 
-- other non-id elements not matching. Person_id will not match 
-- between tables. Need person_id at the end of it all.
select p1.person_id, p1.name, p1.birthdate, p1.address
from Persons1 p1
join Persons2 p2
  on p1.name = p2.name
where exists
(
  select p1.name, p1.birthdate, p1.address
  except
  select p2.name, p2.birthdate, p2.address
);

I want to use the Except set operator because it easily helps me identify unmatched data. However, I am falling a little short on how to convert this to a C# LINQ method syntax.

Here's an example:

public class Person
{
    public int Id { get; set; }
    public string Name { get; set; }
    public string Address { get; set; }
    public DateTime Birthdate { get; set; }
}

var sourcePeople = new List<Person>
{
    new Person { Id = 1, Name = "John Doe", Address = "Test Address1", Birthdate = DateTime.Today },
    new Person { Id = 2, Name = "Jane Doe", Address = "Test Address2", Birthdate = DateTime.Today },
    new Person { Id = 3, Name = "Doug Fir", Address = "Test Address3", Birthdate = DateTime.Today }
};

var destinationPeople = new List<Person>
{
    new Person { Id = 4, Name = "John Doe", Address = "Test Address1", Birthdate = DateTime.Today },
    new Person { Id = 5, Name = "Jane Doe", Address = "Test Address5", Birthdate = DateTime.Today },
    new Person { Id = 6, Name = "Doug Fir", Address = "Test Address3", Birthdate = DateTime.Today }
};

This gives me differences but without their ids. I can't include id because then it would return every row since ids don't match

var result = sourcePeople
    .Select(i => new
    {
        i.Name,
        i.Birthdate,
        i.Address
    })
    .Except(destinationPeople
        .Select(p => new
        {
            p.Name,
            p.Birthdate,
            p.Address
        }))
    .ToList();

How can I return all sourcePeople that aren't in destination people? But only for the difference in the subset of properties like I used above. For this I would expect to see only Id 3.

var result2 = sourcePeople
    .Where(s => destinationPeople.???)
    .Select(s => s.Id);

And yes I know about Any() but I haven't a clue of how it would write up in this case.


Solution

  • Except does not work with your example because you are only comparing objects, not collections.

    Below is an example which implements your SQL statement:

    public class Person
    {
        public int Id { get; set; }
        public string Name { get; set; }
        public string Address { get; set; }
        public DateTime Birthdate { get; set; }
    }
    
    private static List<Person> TestLinq()
    {
        var sourcePeople = new List<Person>
        {
            new Person { Id = 1, Name = "John Doe", Address = "Test Address1", Birthdate = DateTime.Today },
            new Person { Id = 2, Name = "Jane Doe", Address = "Test Address2", Birthdate = DateTime.Today },
            new Person { Id = 3, Name = "Doug Fir", Address = "Test Address3", Birthdate = DateTime.Today }
        };
    
        var destinationPeople = new List<Person>
        {
            new Person { Id = 4, Name = "John Doe", Address = "Test Address1", Birthdate = DateTime.Today },
            new Person { Id = 5, Name = "Jane Doe", Address = "Test Address5", Birthdate = DateTime.Today },
            new Person { Id = 6, Name = "Doug Fir", Address = "Test Address3", Birthdate = DateTime.Today }
        };
    
        var desiredSourcePeople = sourcePeople
            .Join(destinationPeople,
                sourcePerson => sourcePerson.Name,
                destinationPerson => destinationPerson.Name,
                (sourcePerson, destinationPerson) => new
                {
                    SpId = sourcePerson.Id,
                    SpName = sourcePerson.Name,
                    SpBirthdate = sourcePerson.Birthdate,
                    SpAddress = sourcePerson.Address,
                    DpName = destinationPerson.Name,
                    DpBirthdate = destinationPerson.Birthdate,
                    DpAddress = destinationPerson.Address
                }
            )
            .Where(joinedPerson =>
                joinedPerson.SpBirthdate != joinedPerson.DpBirthdate
                || joinedPerson.SpAddress != joinedPerson.DpAddress)
            .Select(joinedPerson => new Person
            {
                Id = joinedPerson.SpId,
                Name = joinedPerson.SpName,
                Birthdate = joinedPerson.SpBirthdate,
                Address = joinedPerson.SpAddress
            })
            .ToList();
    
        return desiredSourcePeople;
    }
    

    If you are trying to avoid the complicated Where clause, you could write a method to do the comparison:

    public class Person
    {
        public int Id { get; set; }
        public string Name { get; set; }
        public string Address { get; set; }
        public DateTime Birthdate { get; set; }
    
        public bool IsSameExceptId(Person person)
        {
            return Name == person.Name && Address == person.Address && Birthdate == person.Birthdate;
        }
    }
    
    private static List<Person> TestLinq()
    {
        var sourcePeople = new List<Person>
        {
            new Person { Id = 1, Name = "John Doe", Address = "Test Address1", Birthdate = DateTime.Today },
            new Person { Id = 2, Name = "Jane Doe", Address = "Test Address2", Birthdate = DateTime.Today },
            new Person { Id = 3, Name = "Doug Fir", Address = "Test Address3", Birthdate = DateTime.Today }
        };
    
        var destinationPeople = new List<Person>
        {
            new Person { Id = 4, Name = "John Doe", Address = "Test Address1", Birthdate = DateTime.Today },
            new Person { Id = 5, Name = "Jane Doe", Address = "Test Address5", Birthdate = DateTime.Today },
            new Person { Id = 6, Name = "Doug Fir", Address = "Test Address3", Birthdate = DateTime.Today }
        };
    
        var desiredSourcePeople = sourcePeople
            .Join(destinationPeople,
                sourcePerson => sourcePerson.Name,
                destinationPerson => destinationPerson.Name,
                (sourcePerson, destinationPerson) => new
                {
                    SourcePerson = sourcePerson,
                    DestinationPerson = destinationPerson
                }
            )
            .Where(joinedPerson => !joinedPerson.SourcePerson.IsSameExceptId(joinedPerson.DestinationPerson))
            .Select(joinedPerson => joinedPerson.SourcePerson)
            .ToList();
    
        return desiredSourcePeople;
    }
    

    Also note that this second example stores the objects in joinedPerson rather than splitting out to just the desired values.