Search code examples
sqlexcelodbcingresvba

Select a maximum value within a date range


Task:

  • Append/edit the currently working code below to return only one row per patient, the maximum value of d1_10.xtransfer (datatype int) with the restriction that this row's d1_10.dstartdate <= glob_End_Date.

Caveats:

  • There are similar questions on StackOverflow and its sister sites. None that I have found have successfully helped with a resolution to this issue.

  • This is a medical EHR database, I can share code, but any discussion of results has to be general and exclude patient information.

  • I am replacing the SQL query within a pre-existing Excel spreadsheet to do something different. Excel pulls information from our database with an ODBC connection. Our database is using Ingres SQL which accepts most, but not all, of your typical SQL code varieties. It's possible that a piece of code will generally work in other flavors of SQL but not with the combo of Ingres and Excel. I've got the spreadsheet working and returning results, now it's about making some fixes by writing SQL code that works in this software.

Thus far:

With the currently working code below (no maximum d1_10.xtransfer restrictions) we return all rows with d1_10.dstartdate in the user selected date range and with the user selected d1_10.xinstitute. We want just the latest one. That is, the patient's row with either the maximum d1_10.dstartdate within the date range, or the maximum d1_10.xtransfer (index that counts up as they are added) within the date range.

Currently working code:

    "SELECT " & _
    "d1.xpid ""XPID"", " & _
    "d0_v1.name_family ""NAME_FAMILY"", " & _
    "d0_v1.name_given1 ""NAME_GIVEN1"", " & _
    "d0_v1.name_given2 ""NAME_GIVEN2"", " & _
    "d1.sex ""SEX"", " & _
    "d1.birthdate ""DOB"", " & _
    "d0_v1.hsp_pid, " & _
    "c58.brief_name, " & _
    "c73.cname, " & _
    "date_trunc('day',d1_10.dstartdate) ""DSTARTDATE"", " & _
    "date_trunc('day',d1_17.ddeath) ""DDEATH"" " & _
    "FROM d1 " & _
    "JOIN d0_v1 ON d1.xpid = d0_v1.xpid " & _
    "JOIN d1_2 ON d1.xpid = d1_2.xpid  " & _
    "JOIN c58 ON d1_2.xmodality = c58.xcmodality " & _
    "JOIN d1_10 ON d1.xpid = d1_10.xpid " & _
    "JOIN c73 ON d1_10.xinstitute = c73.xcsite " & _
    "JOIN d1_17 ON d1.xpid = d1_17.xpid " & _
    "WHERE " & _
    "d1_10.xinstitute = " & institute_index & " AND " & _
    "d1_10.dstartdate >= '" & glob_Start_Date & " 00:00:00' and " & _
    "d1_10.dstartdate <= '" & glob_End_Date & " 23:59:59' "

The closest I have gotten with code that runs from the excel spreadsheet is with this additional line in the WHERE clause:

d1_10.xtransfer = (SELECT MAX(d1_10.xtransfer) FROM d1_10 GROUP BY xpid)

With this additional line we now return only one row from each patient that has a d1_10.xtransfer within the date range. But if they have a row where d1_10.xtransfer is more recent than the date range, then they don't show up in the results at all.

With this line the code is taking MAX(d1_10.xtransfer) for each xpid before it applies the date restriction. By my logic we want it to do so after instead, but I have been unable to come up with code that runs that gets it any closer than this.

Thanks in advance. I'll keep this question updated with additional info below this page break.


Additional Info:

  • Per PaulM:

Yes, xpid is a patient ID index number, unique to each patient.

Added/edited line in WHERE clause to: "d1_10.xtransfer = (SELECT MAX(xtransfer) FROM d1_10 d1_10_b WHERE d1_10.xpid = d1_10_b.xpid AND d1_10_b.dstartdate <= '" & glob_End_Date & " 23:59:59') "

Patient Bob has transfers on both the 14th and 17th of June that fit the rest of the criteria.

When inputting a date range with an end date of Jun 17+, the spreadsheet correctly returns a row for Bob with his Jun 17 transfer.

When inputting a date range with an end date of Jun 14,15 or 16, the spreadsheet incorrectly does not return a row for Bob.

It seems as though it still takes the maximum xtransfer before restricting by date.

  • Per PaulM's comment:

I ran the subselect for a specific patient as follows:

Input:

SELECT MAX(xtransfer) FROM d1_10 d1_10_b WHERE d1_10_b.xpid = '2258' AND d1_10_b.dstartdate <= '20-apr-2016 23:59:59'

It outputted a value of MAX(xtransfer) = '48233'. This is correct.

So, when run in Visual SQL as its own statement, setting d1_10_b.xpid equal to a specific patient, it correctly pulls the maximum xtransfer from the date range. (There was a more recent xtransfer outside of the date range, and it still correctly displayed the maximum xtransfer from within the date range.)

I then tried running this exact same subselect in the where close for the spreadsheet. That is, I manually selected the same date range (which is being passed through as a variable correctly and successfully) but I subbed out d1_10.xpid = d1_10_b.xpid for d1_10_b.xpid = '2258'. This did not work. The spreadsheet did not show a row for this patient, seemingly because it still applies the MAX() function before it restricts by the date range in the subquery. And yet, the subquery works when run by itself.

Much appreciation for any further suggestions.


Solution

  • You need to add the date restriction in the subselect as well as the main query. Also I suspect the group by is wrong. By adding a group by you're making the subselect a list of patients xtransfer values with the largest value for each xpid (identifies a patient?). However that means if the row you're interested in from the main query happens to have an xtransfer value that matches the largest one belonging to a different xpid you're getting a false match.

    What you really need is to add a join on xpid from the subselect back up to the main query. To do that you'll need a different correlation name e.g.

    d1_10.xtransfer = (SELECT MAX(xtransfer) 
                       FROM d1_10 d1_10_b 
                       WHERE d1_10.xpid = d1_10_b.xpid 
                       AND d1_10_b.dstartdate > = ... {as above} )