I'm having an issue in Microsoft Access where my query calculates extremely slow (it takes hours and hours). This query is reading a table that has 150,000 records and each record belongs to one of 4,000 unique groups (called API_10).
The goal of the query is to calculate a running cumulative production value (organized by API_10 and Date) such that the running cumulative production starts over at each new API_10 group. Each record in the table has a field called No which is an autonumber that MS Access calculates so that the table has a Primary Key. An example of what I'm describing is shown below:
MyTable:
No API_10 Date Production
1 1 1/1/2010 1000
2 1 2/1/2010 500
3 2 7/1/2014 300
4 2 8/1/2014 400
MyQuery:
No API_10 Date Production Cumulative_Production
1 1 11/1/2010 1000 1000
2 1 12/1/2010 500 1500
3 2 27/1/2014 300 300
4 2 28/1/2014 400 700
Here is a sample of the code (typed in the Expression Builder on MS Access) used to create the Cumulative_Production column in MyQuery:
Cumulative_Production:
DSum("[Production]","[MyTable]","[API_10]='" & [API_10] & "' AND [No]<=" & [No])
Do note that this is a simplified version of the actual query/table. The real query also computes another field called Normalized_Prod_Month which counts the number of production dates (starting at 1) for each unique API_10 as shown below:
NORMALIZED_PROD_MONTH:
DCount("[Date]","[MyTable]","[API_10]='" & [API_10] & "' AND [No]<=" & [No])
Any tips for improving these types of calculations would greatly help!!
If you apply this query to each record, then you must access n * (n + 1) / 2
records. If all 4000 groups have about the same size of 38 records, you get 4000 * 38 * (38 + 1) / 2 = ~ 3 Mio
accesses. But this is the best case, since larger groups have an over-proportional cost because of the quadratic nature of n * (n + 1) / 2
.
You are better off by creating the running sum in a loop in VBA, and accessing each record only once.
Dim db As DAO.Database, rs As DAO.Recordset
Dim lastNoApi As Long, runningSum As Long
Set db = CurrentDb
Set rs = db.OpenRecordset("SELECT * FROM MyTable ORDER BY NoAPI_10, Date")
Do Until rs.EOF
If rs!NoAPI_10 <> lastNoApi Then
runningSum = 0
lastNoApi = rs!NoAPI_10
End If
runningSum = runningSum + rs!Production
'TODO: insert the result into a temporary table
rs.MoveNext
Loop
rs.Close: Set rs = Nothing
db.Close: Set db = Nothing
Or use the following query. It still has a quadratic cost, but a single query is always more performing than multiple calls to DCount
, DSum
or DLookup
.
SELECT
A.API_10,
A.Date,
A.Production,
(Select Sum(B.Production)
FROM MyTable B
WHERE B.API_10 = A.API_10 And B.[No] <= A.[No]) AS Cumulative_Production
FROM MyTable AS A
ORDER BY A.API_10, A.Date;
Assuming that the No
column is consistent with the date sequence. If the dates are unique, you can also replace B.[No] <= A.[No]
with B.[Date] <= A.[Date]
.