Search code examples
c#linqdatareader

Find longest string from DataReader column


Is It possible to create a Linq for retrieving longest string values of DataReader columns ? Column data should be converted to string and then return longest string.

What I know so far is how to get same thing with DataTable which is Enumerable (as I asked here), example:

string maxString = dt
  .AsEnumerable()
  .Select(row => row[mycolumn].ToString())
  .OrderByDescending(st => st.Length)
  .FirstOrDefault();

I've tried combining upper solution for DataReader like this:

var enumerable = reader
  .Cast<IDataRecord>();

string max_string = enumerable
  .Select(record => record.GetName(col).ToString())
  .Aggregate((s, a) => a.Length > s.Length ? a : s);

Unfortunally this doesn't work, I get

Sequence contains no elements in at System.Linq.Enumerable.Aggregate

error. Thanks for help in advance :)

EDIT: I'm looking for a solution without loading data into Datatable etc., just directly from DataReader object. I'm trying to avoid "Out of memory exception", because data is large.

Latest attempt, suggested by Power Mouse (It returns correct value, but from column 1 only):

     for (int col = 0; col < reader.FieldCount; col++)
     {
         string col_name = reader.GetName(col).ToString();
         var enumerable = reader.Cast<IDataRecord>();

         string max_string = enumerable.Where(x => enumerable.Max(y => y[col_name].ToString()
                             .Length) == x[col_name].ToString().Length)
                             .FirstOrDefault()?[col_name].ToString();

         Console.WriteLine("max string of column is : " + max_string);
      }

Solution

  • I solved my problem, unfortunally without LINQ. Problem is that with DataReader you cannot just simply loop through rows & columns as you can with DataTable once stored in memory, but you must perfom somekind of same logic while reader.Read() method is running.

    So, best thing I could came up with is to store column indexes and their string values into Dictionary while .Read() method is running.

    Doing that, you must be careful about string blank spaces & null values. Here is my solution, which runs good for me:

     Dictionary<int, string> col_values = new Dictionary<int, string>();
    
     using (OracleDataReader reader = cmd.ExecuteReader())
     {
          for (int i = 0; i < reader.FieldCount; i++)
          {
              //First add indexes to Dictionary 
              // I add column names here - didn't test for string.Empty !!
              col_values.Add(i, string.Empty);
          }
    
          //Then read row by row and modify Dictionary - If text is larger than string.Empty
          //Dictionary must be .ToArray(), or else you'll have an error for modifying collection
          while (reader.Read())
          {
              foreach (var item in col_values.ToArray())
              {
                 string rdr_text = reader[item.Key].ToString().Trim()??string.Empty;
    
                 if (item.Value.Length<rdr_text.Length)
                 {
                     col_values[item.Key] = rdr_text;
                 }
              }
          }
    
          foreach (var item in col_values)
          {
             //...And here we have all longest strings stored, for each column...Job done
          }
     }
    

    For my purpuses this Iteration reads around 2.3 mio rows with 12 columns in 4 minutes. It's not fast, but at least It works. If anyone has better/faster idea please provide answer.