Search code examples
c#datatabledatagriddataview

Validation in DataTable C#


I have a DataTable with 20 columns and 25000 Rows. There is a column called URL and a column Language.

I need to ensure that all same URLs have the same Language.

Presently I have achieved this by following steps

  1. Get all distinct (unique) URLs

  2. Created a foreach loop on URLs and create a DataView (filtered on the URL)

  3. Now in the dataview I can check if all values in the Language columns are the same.

     List<string> all_Distinct_Urls = helperFunction.DataTableToList(master_table, "URL");
    
     foreach (var url in all_Distinct_Urls)
     {
         if (!string.IsNullOrEmpty(url))
         {
             DataView dv = new DataView(master_table);
             dv.RowFilter = "[URL] = '" + url + "'";
             DataTable temp_MasterTable = dv.ToTable();
    
             List<string> all_languages = helperFunction.DataTableToList(temp_MasterTable, "Language");
             if (all_languages.Count > 1)
             {
               Assert.Fail();
             }
      }
    
    
     public List<string> DataTableToList(DataTable masterDataTable, string columnName, bool isDistinct = true) 
     { 
         List<string> list = new List<string>(); 
         foreach (DataRow dataRow in masterDataTable.Rows) 
         { 
             string ID = dataRow[columnName].ToString().Trim(); 
             list.Add(ID); 
         } 
         if (isDistinct) 
         { 
             list = list.Distinct().ToList(); 
         } 
         return list; 
    }
    

But the problem is that this is consuming a lot of time, given the number of rows and column. Is there any faster way to achieve this?


Solution

  • I would use LINQ. I'm sure this approach will be a lot faster:

    var invalidUrlLanguageGroups = master_table.AsEnumerable()
        .GroupBy(r => r.Field<string>("Url"))
        .Where(g => g.Select(r => r.Field<string>("Language")).Distinct().Skip(1).Any())
        .ToList();
    

    I groups by the url and then selects all distinct languages and echecks if theres more than one.

    Testcase:

    var master_table = new DataTable();
    master_table.Columns.Add("Url");
    master_table.Columns.Add("Language");
    master_table.Rows.Add("/en-us/sample-page1", "english");
    master_table.Rows.Add("/en-us/sample-page1", "german"); // fail
    master_table.Rows.Add("/de-de/sample-page2", "german");
    master_table.Rows.Add("/en-de/sample-page2", "english");
    

    Note that the query collects all invalid urls and their DataRows. If you want an even more efficient query that only determines if there's at least one(to make the test fail), use:

    bool anyInvalidUrlLanguageGroups = master_table.AsEnumerable()
        .GroupBy(r => r.Field<string>("Url"))
        .Any(g => g.Select(r => r.Field<string>("Language")).Distinct().Skip(1).Any());
    

    how about if I want to validate that all columns are the same, not just the Language column? So if the URL is the same then all column values should be the same

    Well, then this method would be helpful to check if all columns(for each url-group) are equal. You can use it in many other cases too, so would be a good candidate for an extension:

    public static bool AllItemsEqual<T>(IEnumerable<IEnumerable<T>> allSequences, IEqualityComparer<T> comparer = null)
    {
        if (comparer == null) comparer = EqualityComparer<T>.Default;
        IEnumerable<T> first = null;
        foreach(IEnumerable<T> items in allSequences)
        {
            if (first == null) 
                first = items;
            else
            {
                if (!items.SequenceEqual(first, comparer))
                    return false;
            }
        }
    
        return true;
    }
    

    You will use it then in this way:

    List<string> columnsExceptUrl = master_table.Columns.Cast<DataColumn>()
        .Select(c => c.ColumnName)
        .Where(n => n != "Url")
        .ToList();
    
    var urlRowsWithDifferentColumns = master_table.AsEnumerable()
       .GroupBy(r => r.Field<string>("Url"))
       .Where(g => !AllItemsEqual(g.Select(r => columnsExceptUrl.Select(c => r[c].ToString()))))
       .ToList();
    

    again, if you just want to know if it fails, you can make it more efficient:

    bool anyUrlRowsWithDifferentColumns = master_table.AsEnumerable()
       .GroupBy(r => r.Field<string>("Url"))
       .Any(g => !AllItemsEqual(g.Select(r => columnsExceptUrl.Select(c => r[c].ToString()))));