c#sql-server ado.net query-optimization data-access

How can I reduce the amount of SQL-querying in my code?

I'm writing a big C# application that communicates with a MS-SQL Server database.

As the app grows bigger, I find myself writing more and more "boilerplate" code containing various SQL queries in various classes and forms like this:

using System;
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;
using System.Windows.Forms;

public class SomeForm : Form
{
    public void LoadData(int ticketId)
    {
        // some multi-table SQL select and join query
        string sqlQuery = @"
            SELECT TOP(1) [Foo].Id AS FooId, [Foo].Name AS FooName, [Foo].Address AS FooAddress
                        [Bar].Name AS BarName, [Bar].UnitPrice AS BarPrice,
                        [Bif].Plop
            FROM        [dbo].[Foo]
            INNER JOIN  [dbo].[Bar]
            ON          [Bar].Id = [Foo].BarId
            INNER JOIN  [dbo].[Bif]
            ON          [Bar].BifId = [Bif].Id
            WHERE       [Foo].TicketId = @ticketId";

        SqlCommand sqlCmd = new SqlCommand();
        sqlCmd.CommandText = sqlQuery;
        sqlCmd.Parameters.AddWithValue("@ticketId", ticketId);

        // connection string params etc and connection open/close handled by this call below
        DataTable resultsDataTable = SqlQueryHelper.ExecuteSqlReadCommand(sqlCmd);

        if (resultsDataTable.Rows.Count > 0)
        {
            var row = resultsDataTable.Rows[0];

            // read-out the fields
            int fooId = 0;
            if (!row.IsNull("FooId"))
                fooId = row.Field<int>("FooId");

            string fooName = "";
            if (!row.IsNull("FooName"))
                fooName = row.Field<string>("FooName");

            // read out further fields...

            // display in form
            this.fooNameTextBox.Text = fooName;
            // etc.
        }
    }
}

There are dozens of forms in this project all doing conceptually the same thing, just with different SQL queries (different columns selected, etc.) And each time the forms are opened, the database is being continually queried.

For a local DB server the speed is OK but using the app over a slow VPN is painful.

Are there better ways of cutting down the amount of querying the database? Some sort of caching the database in memory and performing the queries on the in-memory data?

I've added some data tables to a data source in my project but can't understand how I can do complex queries like the one stated above.

Is there a better way of doing this?

Thanks for all your suggestions folks!

Solution

I think non of the recommendations like using a framework to access a datase will solve your problem. The problem is not writing SQL or LINQ queries. You always have to dispatch a query to the database or set of data at some point.

The problem is from where you query the database.
Your statement about writing "code containing various SQL queries in various classes and forms" gives me chills. I recently worked for a company and they did exactly the same. As a result they can't maintain their database anymore. Maintenance is still possible, but only rudimentary and very expensive/time consuming and very frustrating - so nobody likes to do it and therefore nobody does it and as a result it gets worse and worse. Queries are getting slower and slower and the only possible quick fix is to buy more bandwidth for the database server.

The actual queries are scattered all over the projects making it impossible to improve/refactor queries (and identify them) or improve the table design. But the worst is, they are not able to switch to a faster or cheaper database model e.g. a graph structured database, although there is a burning desire and an urgent need to do so. It makes you look sloppy in front of customers. Absolutely no fun at all to work in such an environment (so I left). Sounds bad?

You really should decouple the database and the SQL from your business code. All this should be hidden behind an interface:

IRepository repository = Factory.GetRepository();

// IRepository exposes query methods, which hide 
// the actual query and the database related code details from the repository client or business logic
var customers = repository.GetCustomers();

Spreading this code through your project doesn't hurt. It will improve maintainability and readability. It separates the data persistence from the actual business logic, since you are hiding/encapsulating all the details like the actual database, the queries and the query language. If you want to switch to another database you just have to implement a new IRepository to modify the existing queries. Changing the database won't break the application.

And all queries are implemented in one location/layer. And when talking about queries, this includes LINQ queries as well (that's why using a framework like Entity Framework doesn't solve your problem). You can use dependency injection or Factory pattern to distribute the implementation of IRepository. This even allows to switch between different databases during runtime without recompiling the application.

Using this pattern (Repository Pattern) also allows to decouple frameworks like Entitiy Framework from your business logic.

Take a look at the Repository Pattern. If it's not too late you should start to refactor your existing code. The price is too high if keep it like it is.

Regarding caching I know that the database or the DBMS already handles the caching of data very very efficiently. What you can do is to

cache data locally e.g. if the data won't change remotely (because some other client will modify it) e.g. local client settings. This way you can reduce network traffic significantly.
cache data locally e.g. if you access this data set frequently and remote changes are not likely to occur or will have impact. You can update the local data store periodically or invalidate it e.g. after a period of time to force the client to dispatch a new query in order to update the cache.
You also can filter data locally using LINQ. This may also reduce network traffic although you may end up reading more data the necessary. Also filtering is generally done more efficient by the DBMS.
You can consider to upgrade the database server or VPN to increase bandwidth.
Indexing will also improve lookup times significantly.
You should consider to refactor all your SQL queries. There are many articles on how to improve performance of SQL queries. The way you build up a query will have significant impact on performance or execution times especially with big data.
You can use data virtualization. It doesn't make sense to pull thousands of records from the database when you can only show 20 of them to the user. Pull more data when the user scrolls the view. Or even better, display a pre-selected list e.g. of most recent items and the allow the user to search for the data of interest. This way you read only the data that the user explicitly asked for. This will improve overall performance drastically, since usually the user is only interested in very few records.

Before introducing an interface (Dependency Inversion)

The following examples are meant to show, that this is a question of architecture or design, instead a question of frameworks or libraries. Libraries or frameworks can help on a different level, but won't solve the problem, which is introduced by spreading environment specific queries all over the business code. Queries should always be neutral. When looking at the business code, you shouldn't be able to tell if the data is fetched from a file or a database. This details must be hidden or encapsulated.

When you spread the actual database access code (whether directly using plain SQL or with help of a framework) throughout your business code, you are not able to write unit tests without a database attached. This is not desired. It makes testing too complicated and the tests will execute unnecessarily slow. You want to test your business logic and not the database. This are separate tests. You usually want to mock the database away.

The problem:
you need data from the database in multiple places across the application's model or business logic.

The most intuitive approach is to dispatch a database query whenever and where ever you need the data. This means, when the database is a Postgre database, all the code would of course use PostgreSQL or some framework like Entity Framework or ORM in general. If you decide to change the database or DBMS e.g. to some Oracle or want to use a different framework to manage your entities, you would be forced to touch and rewrite every code that uses PostgreSQL or Entity Framework.

In a big business application, this will be the reason that forces your company to stay with what you have and leaves your team dreaming of a better world. Frustration level will rise. Maintaining database related code is nearly impossible, error prone and time consuming. Since the actual database access is not centralized, rewriting the database related code means to crawl through the complete application. Worst case is the spreading of meaningless SQL query strings, nobody understands or remembers. Impossible to move to a new database or to refactor queries to improve performance, without scarifying valuable and expensive time and team resources.

Imaging the following simplified symbolic method is repeated in some form across the application's business logic, maybe accessing different entities and using different filters, but using the same query language, framework or library. Let's say we find similar code a thousand times:

private IEnumerable GetCustomers()
{
  // Use Entity Framework to manage the database directly
  return DbContext.Customers;
}

We have introduced a tight coupling to the framework as it is woven deep into our business code. The code "knows" how the database is manged. It knows about Entity Framework as it has to use its classes or API everywhere.
The proof is, that if you would want to replace Entity Framework with some other framework or just want to drop it, you would have to refactor the code in thousand places - everywhere you used this framework in your application.

After introducing an interface (Dependency Inversion) and encapsulating all the database access

Dependency Inversion will help to remove a dependency on concrete classes by introducing interfaces. Since we prefer loose coupling between components and classes to enhance flexibility, testability and maintainability when using a helper framework or plain SQL dialect, we have to wrap this specific code and hide it behind an interface (Repository Pattern).

Instead of having a thousand places, which explicitly use the database framework or SQL or LINQ queries to read, write or filter data, we now introduce interface methods e.g GetHighPriorityCustomers and GetAllCustomers. How the data is supplied or from which kind of database it is fetched are details, that are only known to the implementation of this interface.

Now the application no longer uses any framework or database specific languages directly:

interface IRepository
{
  IEnumerable<Customer> GetHighPriorityCustomers();
  IEnumerable<Customer> GetAllCustomers();
}

The previous thousand places now look something like:

private IRepository Repository { get; } // Initialized e.g. from constructor
private IEnumerable GetCustomers()
{
  // Use a repository hidden behind an interface.
  // We don't know in this place (business logic) how the interface is implemented
  // and what classes it uses. When the implementation changes from Entity Framework to something else,, no changes have to be made here (loose coupling).    
  return this.Repository.GetAllCustomers();
}

The implementation of IRepository:

class EntityFrameworkRepository : IRepository
{

  IEnumerable<Customer> GetAllCustomers()
  {
    // If we want to drop Entity Framework, we just have to provide a new implementation of IRepository
    return DbContext.Customers;
  }

  ...
}

Now, you decide to use plain SQL. The only change to make is to implement a new IRepository, instead of changing thousand places to remove the specialized code:

class MySqlRepository : IRepository
{
  // The caller still accesses this method via the interface IRepository.GetAllCustomers()
  IEnumerable<Customer> GetAllCustomers()
  {
    return this.Connection.ExecuteQuery("SELECT * FROM ...");
  }

  ...
}

Now you decide to replace MySQL with Microsoft SQL. All you have to do is to implement a new IRepository.
You can swap in and out any database and change the query language or introduce helper frameworks without affecting your original business logic. Once written, never touched again (at least for the changes regarding the database).
If you move the implementation to a separate assembly, you can even swap them at runtime.