linq-to-sql design-patterns linq-to-entities pipes-filters

How do you implement Pipes and Filters pattern with LinqToSQL/Entity Framework/NHibernate?


While building my DAL repository, I stumbled upon a concept called Pipes and Filters. I read about it here, here and saw a screencast from here. I am still not sure how to go about implementing this pattern. In theory it all sounds good, but how do we really implement this in an enterprise scenario?

I would appreciate any resources, tips, examples or explanations of this pattern in the context of the data mappers/ORMs mentioned in the question.

Thanks in advance!!


Solution

  • Ultimately, LINQ on IEnumerable<T> is a pipes and filters implementation. IEnumerable<T> is a streaming API - meaning that data is returned lazily as you ask for it (via iterator blocks), rather than being loaded all at once and returned as one big buffer of records.

    This means that your query:

    var qry = from row in source // IEnumerable<T>
              where row.Foo == "abc"
              select new {row.ID, row.Name};
    

    is equivalent to the method-call form:

    var qry = source.Where(row => row.Foo == "abc")
                .Select(row => new {row.ID, row.Name});
    

    As you enumerate over this, it consumes the data lazily. You can see this graphically with Jon Skeet's Visual LINQ. The only things that break the pipe are operations that force buffering: OrderBy, GroupBy, etc. For high-volume work, Jon and I worked on Push LINQ for doing aggregates without buffering in such scenarios.
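
    The lazy behaviour described above can be made visible with a small sketch. It assumes nothing beyond LINQ to Objects; the names LazyPipeDemo, Source and Run are hypothetical and exist only for this illustration. The source logs every item it hands out, so the log shows when each stage of the pipe actually runs.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;

    static class LazyPipeDemo
    {
        // Hypothetical source: an iterator block that logs each item it yields.
        public static IEnumerable<int> Source(List<string> log)
        {
            for (int i = 1; i <= 5; i++)
            {
                log.Add("yield " + i);
                yield return i;
            }
        }

        public static List<string> Run()
        {
            var log = new List<string>();

            // Composing the pipe runs nothing: no item has been pulled yet.
            IEnumerable<int> pipe = Source(log)
                .Where(x => x % 2 == 1)   // filter: keep odd numbers
                .Select(x => x * 10);     // projection
            log.Add("pipe built");

            // Enumerating pulls items through the pipe one at a time,
            // and Take(2) stops pulling once two results have arrived.
            foreach (int item in pipe.Take(2))
            {
                log.Add("got " + item);
            }
            return log;
        }

        static void Main()
        {
            Console.WriteLine(string.Join(", ", Run()));
        }
    }
    ```

    The first log entry is "pipe built", and the source is never asked for items 4 and 5, because each item flows through the filter and the projection only when the consumer requests it.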

    IQueryable<T> (exposed by most ORM tools - LINQ-to-SQL, Entity Framework, LINQ-to-NHibernate) is a slightly different beast; because the database engine is going to do most of the heavy lifting, the chances are that most of the steps are already done - all that is left is to consume an IDataReader and project this to objects/values - but that is still typically a pipe (IQueryable<T> implements IEnumerable<T>) unless you call .ToArray(), .ToList() etc.
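
    As a rough illustration of how IQueryable<T> composes, here is a sketch that uses AsQueryable() over an in-memory list as a stand-in for an ORM table. That stand-in is an assumption made so the example runs without a database: a real provider would translate the expression tree to SQL instead of executing it in memory. The Row class and QueryableDemo are hypothetical names.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;

    // Hypothetical row type, mirroring the members used in the query above.
    class Row
    {
        public int ID { get; set; }
        public string Name { get; set; }
        public string Foo { get; set; }
    }

    static class QueryableDemo
    {
        public static List<string> Run()
        {
            var data = new List<Row>
            {
                new Row { ID = 1, Name = "b", Foo = "abc" },
                new Row { ID = 2, Name = "x", Foo = "zzz" },
            };

            // Each call only extends the expression tree; nothing executes yet.
            IQueryable<Row> qry = data.AsQueryable().Where(row => row.Foo == "abc");
            qry = qry.OrderBy(row => row.Name);

            // Added after the query was composed, but before it is consumed.
            data.Add(new Row { ID = 3, Name = "a", Foo = "abc" });

            // ToList() is the point where the query actually runs.
            return qry.Select(row => row.Name).ToList();
        }

        static void Main()
        {
            Console.WriteLine(string.Join(", ", Run()));
        }
    }
    ```

    The row added after composition still appears in the result, which shows that the query is only a description of the work until something such as ToList() consumes it.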

    With regard to use in enterprise... my view is that it is fine to use IQueryable<T> to write composable queries inside the repository, but they shouldn't leave the repository - as that would make the internal operation of the repository subject to the caller, so you would be unable to properly unit test / profile / optimize / etc. I've taken to doing clever things in the repository, but return lists/arrays. This also means my repository stays unaware of the implementation.

    This is a shame - as the temptation to "return" IQueryable<T> from a repository method is quite large; for example, this would allow the caller to add paging/filters/etc - but remember that they haven't actually consumed the data yet. This makes resource management a pain. Also, in MVC etc you'd need to ensure that the controller calls .ToList() or similar, so that it isn't the view that is controlling data access (otherwise, again, you can't unit test the controller properly).

    A safe (IMO) use of filters in the DAL would be things like:

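    public Customer[] List(string name, string countryCode) {
        using(var ctx = new CustomerDataContext()) {
            // Start with the base filter; nothing is sent to the database yet.
            IQueryable<Customer> qry = ctx.Customers.Where(x => x.IsOpen);
            // Add each optional filter only when the caller supplied a value.
            if(!string.IsNullOrEmpty(name)) {
                qry = qry.Where(cust => cust.Name.Contains(name));
            }
            if(!string.IsNullOrEmpty(countryCode)) {
                qry = qry.Where(cust => cust.CountryCode == countryCode);
            }
            // ToArray() runs the composed query once, before ctx is disposed.
            return qry.ToArray();
        }
    }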
    

    Here we've added filters on-the-fly, but nothing happens until we call ToArray. At this point, the data is obtained and returned (disposing the data-context in the process). This can be fully unit tested. If we did something similar but just returned IQueryable<T>, the caller might do something like:

     var custs = customerRepository.GetCustomers()
           .Where(x=>SomeUnmappedFunction(x));
    

    And all of a sudden our DAL starts failing (cannot translate SomeUnmappedFunction to TSQL, etc). You can still do a lot of interesting things in the repository, though.

    The only pain point here is that it might push you to have a few overloads to support different calling patterns (with/without paging, etc). Until optional/named parameters arrive, I find the best answer here is to use extension methods on the interface; that way, I only need one concrete repository implementation:

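    // The single concrete implementation; it must implement the interface
    // for the extension method below to apply to it.
    class CustomerRepository : ICustomerRepository {
        public Customer[] List(
            string name, string countryCode,
            int? pageSize, int? pageNumber) {...}
    }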
    interface ICustomerRepository {
        Customer[] List(
            string name, string countryCode,
            int? pageSize, int? pageNumber);
    }
    static class CustomerRepositoryExtensions {
        public static Customer[] List(
              this ICustomerRepository repo,
              string name, string countryCode) {
           return repo.List(name, countryCode, null, null); 
        }
    }
    

    Now we have virtual overloads (as extension methods) on ICustomerRepository - so our caller can use repo.List("abc","def") without having to specify the paging.
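
    To show how paging can become just one more optional filter, here is a sketch of one possible body for the four-argument List method. Several things are assumptions for illustration only: the in-memory list standing in for the data context, the InMemoryCustomerRepository and PagingDemo names, the shape of Customer, the 1-based pageNumber, and ordering by Name to keep pages stable.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;

    // Hypothetical entity with only the members used by the filters.
    class Customer
    {
        public string Name { get; set; }
        public string CountryCode { get; set; }
        public bool IsOpen { get; set; }
    }

    interface ICustomerRepository
    {
        Customer[] List(string name, string countryCode,
                        int? pageSize, int? pageNumber);
    }

    // Hypothetical stand-in: a real repository would query a data context.
    class InMemoryCustomerRepository : ICustomerRepository
    {
        private readonly List<Customer> _customers;

        public InMemoryCustomerRepository(List<Customer> customers)
        {
            _customers = customers;
        }

        public Customer[] List(string name, string countryCode,
                               int? pageSize, int? pageNumber)
        {
            IQueryable<Customer> qry = _customers.AsQueryable().Where(x => x.IsOpen);
            if (!string.IsNullOrEmpty(name))
                qry = qry.Where(cust => cust.Name.Contains(name));
            if (!string.IsNullOrEmpty(countryCode))
                qry = qry.Where(cust => cust.CountryCode == countryCode);
            if (pageSize.HasValue)
            {
                // Assumption: pages are 1-based and ordered by Name.
                int page = pageNumber ?? 1;
                qry = qry.OrderBy(cust => cust.Name)
                         .Skip((page - 1) * pageSize.Value)
                         .Take(pageSize.Value);
            }
            return qry.ToArray(); // the only point where the query runs
        }
    }

    static class CustomerRepositoryExtensions
    {
        // "Virtual overload" without paging, as described above.
        public static Customer[] List(this ICustomerRepository repo,
                                      string name, string countryCode)
        {
            return repo.List(name, countryCode, null, null);
        }
    }

    static class PagingDemo
    {
        public static ICustomerRepository BuildRepo()
        {
            return new InMemoryCustomerRepository(new List<Customer>
            {
                new Customer { Name = "Ann", CountryCode = "GB", IsOpen = true },
                new Customer { Name = "Bob", CountryCode = "GB", IsOpen = true },
                new Customer { Name = "Cat", CountryCode = "GB", IsOpen = false },
                new Customer { Name = "Dan", CountryCode = "US", IsOpen = true },
                new Customer { Name = "Eve", CountryCode = "GB", IsOpen = true },
            });
        }

        static void Main()
        {
            ICustomerRepository repo = BuildRepo();
            Console.WriteLine(repo.List(null, "GB").Length);         // unpaged
            Console.WriteLine(repo.List(null, "GB", 2, 2)[0].Name);  // page 2 of size 2
        }
    }
    ```

    The caller chooses between paged and unpaged calls, yet every filter is still applied inside the repository and the result leaves it as a plain array.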


    Finally - without LINQ, using pipes and filters becomes a lot more painful. You'll be writing some kind of text-based query (TSQL, ESQL, HQL). You can obviously append strings, but it isn't very "pipe/filter"-ish. The "Criteria API" is a bit better - but not as elegant as LINQ.