Search code examples
c#sql-server-2014sqlconnection

c# Better way to retrieve large bulks of data from SQL Server 2014?


Context: I retrieve bulks of data to index them in an ElasticSearch cluster (it has no-sql format).

I do this by determining how many delegates do I need to create with a for loop depending of how many rows does the current analyzed table has compared to my bulk_size, and execute them to do the code below.

I use offset-fetch. tb represents the current table, and my current bulk_size is 5000.

My code looks like this:

using (SqlConnection db = new SqlConnection(tb.Conn))
{
    int offset = i * BULK_SIZE;
    int nextRows = BULK_SIZE;

    string queryParams = string.Format(" ORDER BY {0} OFFSET {1} ROWS FETCH NEXT {2} ROWS ONLY", tb.FirstPK(), offset, nextRows);

    string dataQuery = string.Format("SELECT * FROM dbo.{0} {1}", tb.TableName, queryParams);

    try
    {
        db.Open();
        object[,] bulk = new object[BULK_SIZE, 2];//[data[], id]
        int n = 0;
        object[] values;

        using (SqlDataReader reader = new SqlCommand(dataQuery, db).ExecuteReader())
        {
            while (reader.Read())
            {
                string docId = string.Empty;

                foreach (string pk in tb.PKeys)
                {
                    docId += reader[pk].ToString() + '_';
                }

                docId = docId.Substring(0, docId.Length - 1);//delete last '_'
                values = new object[reader.FieldCount];

                reader.GetValues(values);

                bulk[n, 0] = values;
                bulk[n, 1] = docId;

                n++;
            }

            reader.Close();
        }

        db.Close();

        if (IsMemoryAvailable())//Waits for other delegates to finish
        {
            DataUpload(bulk, tb, mapper);
        }
    }
    catch (Exception e)
    {
        Console.WriteLine(e.Message); Console.WriteLine(e.StackTrace); Console.ReadLine();
        //throw e;
    }
}

Is there a better way to do this?

I get then bulk, insert it into an array and handle that bulk in another task.

The problem is that the memory of SQL Server keeps going up (cache goes too big), and in late-bulk-fetching I get timeout exceptions and such; it gets to slow.


Solution

  • The solution is to create a temp table with only the primary keys, and query back and forth from the table and the temp table with joins.

    This is way more memory efficient than offset-fetch, and also faster.