Search code examples
c#mysql.netsqlitedatareader

while on IDataReader.Read doesn't work with yield return but foreach on reader does


This is a commonly seen ADO.NET pattern to retrieve data from database using a data reader, but strangely doesn't work.

Doesn't work:

public static IEnumerable<IDataRecord> SelectDataRecord<T>(string query, string connString)
                                                          where T : IDbConnection, new()
{
    using (var conn = new T())
    {
        using (var cmd = conn.CreateCommand())
        {
            cmd.CommandText = query;
            cmd.Connection.ConnectionString = connString;

            cmd.Connection.Open();
            using (var reader = (DbDataReader)cmd.ExecuteReader())
            {
                // the main part
                while (reader.Read())
                {
                    yield return (IDataRecord)reader;
                }
            }
        }
    }

This does work:

public static IEnumerable<IDataRecord> SelectDataRecord<T>(string query, string connString)
                                                          where T : IDbConnection, new()
{
    using (var conn = new T())
    {
        using (var cmd = conn.CreateCommand())
        {
            cmd.CommandText = query;
            cmd.Connection.ConnectionString = connString;

            cmd.Connection.Open();
            using (var reader = (DbDataReader)cmd.ExecuteReader())
            {
                // the main part
                foreach (var item in reader.Cast<IDataRecord>())
                {
                    yield return item;
                }
            }
        }
    }

The only relevant change I see is that in the first code the iterator is returned from while loop while in the second it is returned from a foreach loop.

I call it like:

// I have to buffer for some reason
var result = SelectDataRecord<SQLiteConnection>(query, connString).ToList(); 

foreach(var item in result)
{
    item.GetValue(0); // explosion
}

I tried with SQLite .NET connector as well as MySQL connector. The result is the same, i.e. first approach fails, second one succeeds.

Exception

SQLite

An unhandled exception of type 'System.InvalidOperationException' occurred in System.Data.SQLite.dll. Additional information: No current row

MySQL

An unhandled exception of type 'System.Exception' occurred in MySql.Data.dll. Additional information: No current query in data reader

Is it because of the implementation differences between reader.Read and reader.GetEnumerator in the specific ADO.NET connectors? I couldn't see any noticeable difference when I checked the source of System.Data.SQLite project, GetEnumerator calls Read internally. I am assuming ideally in both cases the yield keyword prevents eager execution of the method and the loops have to be executed only once the enumerator is enumerated externally.


Update:

I use this pattern to be on the safe side (essentially the same as the second approach but a little less verbose),

using (var reader = cmd.ExecuteReader())
    foreach (IDataRecord record in reader as IEnumerable)
        yield return record;

Solution

  • The difference between the two examples is because foreach has different semantics from while which is a plain loop. The underlying GetEnumerator of foreach makes the difference here.

    As Joel says, in the first example, the same reader object is yielded on each iteration of the while loop. This is because the both the IDataReader as well as IDataRecord are the same here, which is unfortunate. When a ToList is called on the resulting sequence, the yielding is complete upon which the using blocks closes the reader and connection objects, and you end up with a list of disposed reader objects of the same reference.

    In the second example, the foreach on data reader ensures a copy of IDataRecord is yielded. The GetEnumerator is implemented like this:

    public IEnumerator GetEnumerator()
    {
        return new DbEnumerator(this); // the same in MySQL as well as SQLite ADO.NET connectors
    }
    

    where MoveNext of System.Data.Common.DbEnumerator class is implemented like:

    IDataRecord _current;
    
    public bool MoveNext() // only the essentials
    {
        if (!this._reader.Read())
            return false;
    
        object[] objArray = new object[_schemaInfo.Length];
        this._reader.GetValues(objArray); // caching into obj array
        this._current = new DataRecordInternal(_schemaInfo, objArray); // a new copy made here
        return true;
    }
    

    The DataRecordInternal is the actual implementation of the IDataRecord which is yielded from the foreach which is not the same reference as the reader, but a cached copy of all the values of the row/record.

    The System.Linq.Cast in this case is a mere representation preserving cast which does nothing to the overall effect. Cast<T> will be implemented like this:

    public static IEnumerable<T> Cast<T>(this IEnumerable source)
    {
        foreach (var item in source)
            yield return (T)item; // representation preserving since IDataReader implements IDataRecord
    }
    

    An example without Cast<T> call can be shown to not exhibit this problem.

    using (var reader = cmd.ExecuteReader())
        foreach (var record in reader as IEnumerable)
            yield return record;
    

    The above example just works fine.


    An important distinction to make is that the first example is problematic only if you are not making use of the values read from database in its first enumeration itself. It is only the subsequent enumerations which throw since reader will be disposed by then. An eg,

    using (var reader = cmd.ExecuteReader())
        while (reader.Read())
            yield return reader;
    
    ...
    foreach(var item in ReaderMethod())
    {
        item.GetValue(0); // runs fine
    } 
    
    ...
    foreach(var item in ReaderMethod().ToList())
    {
        item.GetValue(0); // explosion
    }