Tags: sql, ms-access

Slow MS Access Query (Using DSum & DCount Functions)


I'm having an issue in Microsoft Access where my query runs extremely slowly (it takes many hours). The query reads a table with 150,000 records, and each record belongs to one of 4,000 unique groups (identified by a field called API_10).

The goal of the query is to calculate a running cumulative production value (organized by API_10 and Date) such that the running total starts over at each new API_10 group. Each record in the table has a field called No, an AutoNumber that MS Access generates so the table has a primary key. An example of what I'm describing is shown below:

MyTable:

No     API_10       Date           Production
1      1            1/1/2010       1000
2      1            2/1/2010       500
3      2            7/1/2014       300
4      2            8/1/2014       400  

MyQuery:

No       API_10         Date              Production   Cumulative_Production
1        1              1/1/2010          1000         1000
2        1              2/1/2010          500          1500
3        2              7/1/2014          300          300
4        2              8/1/2014          400          700

Here is a sample of the code (typed in the Expression Builder in MS Access) used to create the Cumulative_Production column in MyQuery:

Cumulative_Production:

DSum("[Production]","[MyTable]","[API_10]='" & [API_10] & "' AND [No]<=" & [No])

Do note that this is a simplified version of the actual query/table. The real query also computes another field called Normalized_Prod_Month which counts the number of production dates (starting at 1) for each unique API_10 as shown below:

NORMALIZED_PROD_MONTH:

DCount("[Date]","[MyTable]","[API_10]='" & [API_10] & "' AND [No]<=" & [No])

Any tips for improving these types of calculations would greatly help!!


Solution

  • If you evaluate this expression for every record, you must access n * (n + 1) / 2 records per group of size n. If all 4,000 groups are roughly the same size of about 38 records (150,000 / 4,000), that comes to 4000 * 38 * (38 + 1) / 2 ≈ 3 million record accesses. And this is the best case, since larger groups have a disproportionately high cost because of the quadratic nature of n * (n + 1) / 2.

    You are better off creating the running sum in a loop in VBA, accessing each record only once.

    Dim db As DAO.Database, rs As DAO.Recordset
    Dim lastApi As Variant, runningSum As Long   ' Variant works whether API_10 is numeric or text
    
    Set db = CurrentDb
    Set rs = db.OpenRecordset("SELECT * FROM MyTable ORDER BY API_10, [Date]")
    Do Until rs.EOF
        If rs!API_10 <> lastApi Then
           ' New API_10 group: restart the running total
           runningSum = 0
           lastApi = rs!API_10
        End If
        runningSum = runningSum + rs!Production
    
        'TODO: insert the result into a temporary table
    
        rs.MoveNext
    Loop
    rs.Close: Set rs = Nothing
    Set db = Nothing
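
    The same loop can also produce the NORMALIZED_PROD_MONTH value from the question by keeping a second running counter, and the TODO above can be filled in by writing each result row to a results table. Below is a minimal sketch of that idea; the table name tblCumulative and its field names are placeholders that you would adapt to your own schema.

    ' Sketch only: assumes a results table named tblCumulative with fields
    ' [No], API_10, ProdDate, Production, Cumulative_Production and
    ' Normalized_Prod_Month. All of these names are placeholders.
    Public Sub BuildCumulative()
        Dim db As DAO.Database, rsIn As DAO.Recordset, rsOut As DAO.Recordset
        Dim lastApi As Variant, runningSum As Long, prodMonth As Long
    
        Set db = CurrentDb
        db.Execute "DELETE FROM tblCumulative", dbFailOnError   ' start from an empty results table
        Set rsIn = db.OpenRecordset("SELECT * FROM MyTable ORDER BY API_10, [Date]")
        Set rsOut = db.OpenRecordset("tblCumulative", dbOpenDynaset)
    
        Do Until rsIn.EOF
            If rsIn!API_10 <> lastApi Then
                ' New API_10 group: reset both running values
                runningSum = 0
                prodMonth = 0
                lastApi = rsIn!API_10
            End If
            runningSum = runningSum + rsIn!Production   ' replaces the DSum expression
            prodMonth = prodMonth + 1                   ' replaces the DCount expression
    
            rsOut.AddNew                                ' write one result row per source record
            rsOut![No] = rsIn![No]
            rsOut!API_10 = rsIn!API_10
            rsOut!ProdDate = rsIn![Date]
            rsOut!Production = rsIn!Production
            rsOut!Cumulative_Production = runningSum
            rsOut!Normalized_Prod_Month = prodMonth
            rsOut.Update
    
            rsIn.MoveNext
        Loop
    
        rsOut.Close: Set rsOut = Nothing
        rsIn.Close: Set rsIn = Nothing
        Set db = Nothing
    End Sub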
    

    Or use the following query. It still has a quadratic cost, but a single query always performs better than repeated calls to DCount, DSum or DLookup.

    SELECT
        A.API_10,
        A.[Date],
        A.Production,
        (Select Sum(B.Production)
         FROM MyTable B
         WHERE B.API_10 = A.API_10 And B.[No] <= A.[No]) AS Cumulative_Production
    FROM MyTable AS A
    ORDER BY A.API_10, A.[Date];
    

    This assumes that the No column is consistent with the date sequence. If the dates are unique within each API_10 group, you can also replace B.[No] <= A.[No] with B.[Date] <= A.[Date].
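
    If you go the query route instead, the same correlated-subquery pattern also replaces the DCount expression from the question: a Count(*) subquery with the identical WHERE clause yields NORMALIZED_PROD_MONTH. Below is a minimal VBA sketch that saves such a combined query; the name qryCumulative is a placeholder, and a query with that name must not already exist when CreateQueryDef runs.

    ' Sketch only: stores the extended query under a placeholder name so it can
    ' be opened like any other saved query. Delete or rename any existing query
    ' called qryCumulative first, or CreateQueryDef raises an error.
    Public Sub SaveCumulativeQuery()
        Dim db As DAO.Database
        Dim sql As String
    
        Set db = CurrentDb
        sql = "SELECT A.[No], A.API_10, A.[Date], A.Production, " & _
              "(SELECT Sum(B.Production) FROM MyTable AS B " & _
              " WHERE B.API_10 = A.API_10 AND B.[No] <= A.[No]) AS Cumulative_Production, " & _
              "(SELECT Count(*) FROM MyTable AS C " & _
              " WHERE C.API_10 = A.API_10 AND C.[No] <= A.[No]) AS Normalized_Prod_Month " & _
              "FROM MyTable AS A " & _
              "ORDER BY A.API_10, A.[Date];"
        db.CreateQueryDef "qryCumulative", sql
        Set db = Nothing
    End Sub

    Opening qryCumulative then returns the same columns as MyQuery in the question, still at quadratic cost but in a single statement.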