Currently the application I'm working with uses strongly typed DataSet
s to work with data from the DB. We have a table called COM_ControlIn
that represents a "file" and several other tables have a relationship with the control table. The one I need to stream from is called COM_GenericTransactionItems
. There is a column in this table called COMControlIn_UID
which links it up to the control table as the name suggests.
We have several methods to fetch data from this table, such as one that finds all records for a given COMControlIn_UID
, but the problem with all of these is that they fetch all records at once, which is becoming a problem now that the sheer amount of data is causing us to hit .NET's memory limit. All of our existing code uses strongly typed datasets built from XSDs generated by Visual Studio from the database schema.
My idea was to use IEnumerable
to "stream" batches of records from the database instead of fetching everything at once, while still keeping the strongly typed datasets we've used previously to keep it compatible without major changes. The code I've written looks more or less like this:
COM_GenericTransactionItemsDS com_GenericTransactionItemsDS = new COM_GenericTransactionItemsDS();
long lastUID = 0;
using (SqlConnection sqlConnection = new SqlConnection("...")
{
sqlConnection.Open();
SqlCommand sqlCommand = new SqlCommand("SELECT MAX(UID) FROM COM_GenericTransactionItems WHERE COMControlIn_UID = " + p_COMControlIn_UID, sqlConnection);
//because apparently I'm not allowed to straight cast...
long maxUID = Convert.ToInt64(sqlCommand.ExecuteScalar());
while (lastUID < maxUID)
{
com_GenericTransactionItemsDS.Clear();
using (SqlDataAdapter sqlDataAdapter = new SqlDataAdapter())
{
//Build Select
string strSQL = "SELECT TOP(" + fetchAmount + ") " + SQL_Columns + " FROM COM_GenericTransactionItems " +
"WHERE COMControlIn_UID = " + p_COMControlIn_UID.ToString() + " AND UID > " + lastUID + " ORDER BY UID";
//Get Data
sqlDataAdapter.SelectCommand = new SqlCommand(strSQL, sqlConnection);
sqlDataAdapter.SelectCommand.CommandTimeout = Convert.ToInt32(context.strContext[(int)eCCE_Context._COMMAND_TIMEOUT]);
sqlDataAdapter.Fill(com_GenericTransactionItemsDS, "COM_GenericTransactionItems");
lastUID = com_GenericTransactionItemsDS.COM_GenericTransactionItems.Max(r => r.UID);
}
yield return com_GenericTransactionItemsDS;
}
}
It works extremely well for fetching data and has dropped our memory usage significantly, but I have run into a problem a little further down the line.
I need to group items within this table by a specific column (a date), but the notion of this conflicts with the whole batching approach, because you need to know what your entire dataset looks like to do the grouping.
I can't do the grouping in SQL because I need the data in a sort of key-value pair like Linq used to give me before I switched to using this method (unless there is a way for me to do this in SQL).
When I try using SelectMany
to flatten all of my rows into one enumerable I get RowNotInTableException
whenever I try to access any of them. I don't really know what else to try.
For reference, this is the Linq query I use to do the grouping:
var dateGroups = from row in p_COM_GenericTransactionItemsDS.SelectMany(c => c.COM_GenericTransactionItems) group row by (DateTime)row[tableDefinitions.CaptureDate] into groups select groups;
I think the problem lies with the way I'm returning data from my streaming method, but I don't know how else to do it. Ideally I'd like to extract all the rows out of our data tables into an IEnumerable
and just iterate through that, but DataRows
don't keep the table's schema (I've read the schema is kept in the DataTable
they're related to), so once you remove them from the dataset they are essentially useless.
I've solved my problem. I changed my streaming method to loop through the items it receives in a batch, make a copy of them and return them one by one, like so:
foreach (COM_GenericTransactionItemsDS.COM_GenericTransactionItemsRow row in com_GenericTransactionItemsDS.COM_GenericTransactionItems.Rows)
{
lastUID = row.UID;
COM_GenericTransactionItemsDS.COM_GenericTransactionItemsRow newRow = com_GenericTransactionItemsDS.COM_GenericTransactionItems.NewCOM_GenericTransactionItemsRow();
newRow.ItemArray = row.ItemArray;
yield return newRow;
}