How to show gaps in a ZingChart line chart when dates are missing from dynamically loaded data?


I have been using ColdFusion 2016 and the bundled ZingChart to dynamically create charts from SQL Server data, with a time series on the X axis. When there are gaps in time I would like the line chart to show a gap as well, but instead the line is continuous and plots each datapoint consecutively.

Here is a pic of the chart the way it plots now. You can see there is no 'gap' between the Oct 29 and March dates; the data just run together: [image: NoDataGap]

My data are generally in 15-minute increments, but there are stretches of time (days or months) with gaps in the time series. I contacted ZingChart to ask whether there is some kind of style tag that controls whether dates are displayed consecutively or with gaps, and there is not; it has to be handled at the data level. If my data were hardcoded I would have to add null values so that the charts would plot with gaps in the time series, but my charts are dynamic (a user can choose any number of 7 parameters to add to the chart for a date range they choose). I have found information on how to solve this for hardcoded data, but I'm looking for ideas for dynamically loaded data/series. I have also found information on a deprecated ColdFusion attribute for the XML file, isInterpolated="false", but that's no longer an option.

My question is what is the best way to solve this? I found some information about creating a calendar table in SQL Server and unioning that with the table(s) providing the data so that all datetimes would be filled. I was wondering if there's another approach that I'm not thinking of? Thanks for any help, I'm very new at all of this.


Update: Here is the current query for the data, which is a bit complicated. It pulls "Nth" rows based on how many parameters (7 available) are selected and how many days are in the date range:

SELECT
distinct 
 datepart(year, t.sample_date) as [year]
,datepart(month, t.sample_date) as [month]
,datepart(day, t.sample_date) as [day]
,datepart(hour, t.sample_time) as [hr]
,datepart(minute, t.sample_time) as [min]  
,convert(varchar(10), t.sample_date, 1) + ' ' + 
  RIGHT('0' + CONVERT([varchar](2), DATEPART(HOUR, t.sample_time)), 2) + ':' +
  RIGHT('0' + CONVERT([varchar](2), DATEPART(MINUTE, t.sample_time)), 2) AS [datetime] 
,t.stationdesc
<cfif isDefined("form.parameter") and ListFindNoCase(form.parameter, "salinity")>,ROUND(t.salinity,1) as salinity</cfif>
<!---plus 6 more parameters--->
FROM (
SELECT    
    [sample_date]
    ,sample_time
    ,stationdesc
    <cfif isDefined("form.parameter") and ListFindNoCase(form.parameter, "salinity") >,salinity</cfif>
    <!---plus 6 more parameters--->
    , row_number() OVER (ORDER BY streamcode) AS rownum
    FROM MyUnionizedTables
    WHERE stationdesc = (<cfqueryparam value="#form.station#" cfsqltype="cf_sql_varchar">)
    AND [sample_date] BETWEEN (<cfqueryparam value='#Form.StartDate#' cfsqltype="cf_sql_date">) 
    AND (<cfqueryparam value='#Form.EndDate#' cfsqltype="cf_sql_date">)
    <cfif isDefined("form.parameter") and ListFindNoCase(form.parameter, "salinity")>and salinity > -25 and salinity <40 and salinity is not NULL  </cfif>
    <!---plus 6 more parameters--->                           
    GROUP BY sample_date, sample_time, stationdesc, streamcode 
    <cfif isDefined("form.parameter") and ListFindNoCase(form.parameter, "salinity")>,salinity</cfif>
    <!---plus 6 more parameters--->
    ) AS t
WHERE    <!---returning Nth row when record sets (count of days between dates selected) are long--->
    <cfif IsDefined("form.station") AND IsDefined("form.parameter") AND #ParamCount# LTE 3 AND form.station eq 'Coastal Bays - Public Landing' and #ctdays# gte 10> t.rownum % 64 = 0 
    <cfelseif IsDefined("form.parameter") AND #ParamCount# LTE 3 AND #ctDays# gte '5840'> t.rownum % 64 = 0 
        <!---plus lots more elseifs--->
    <cfelseif  IsDefined("form.parameter") AND #ParamCount# GTE 7  AND  #ctDays# gte '350'> t.rownum % 8 = 0
    <cfelse>t.rownum % 1 = 0</cfif>
ORDER BY 
     datepart(year, t.sample_date) 
    ,datepart(month, t.sample_date) 
    ,datepart(day, t.sample_date) 
    ,datepart(hour, t.sample_time) 
    ,datepart(minute, t.sample_time) 

SECOND UPDATE (after Leigh's link to query on GitHub):

So I'd actually been working on a query similar to the one Leigh posted, based on the "CTE Expression" section here. I switched to trying to work with her version, which is below. I don't have write permissions, so I'm working with an existing table. MyDataTable has ~21 million rows, with separate sample_date (datetime) and sample_time (datetime) columns. [The dates and times are a PITA: because of the instruments and the way these data are remotely telemetered, we get a datetime column called 'sample_date' with a good date but a bogus time value, and a separate datetime column called 'sample_time' with a bogus date and a good time.] There are 125 stations, each with data (for example, temperature) covering different start and end dates/times, from 2001 through the present. So I need to fill date/time gaps for 125 different stations, each with different gaps, in data that are normally in 15-minute increments.

--- simulate main table(s)
--CREATE TABLE MyDataTable ( sample_date datetime, sample_time datetime, stationdesc nvarchar, wtemp float)

--- generate all dates within this range
DECLARE @startDate datetime
DECLARE @maxDate datetime
SET @startDate = '2015-01-01'
SET @maxDate = '2016-12-31'

--- get MISSING dates
;WITH missingDates AS
(  
    SELECT DATEADD(day,1,@startDate) AS TheDate
    UNION ALL  
    SELECT  DATEADD(day,1, TheDate) 
    FROM    missingDates  
    WHERE   TheDate < @maxDate  
)
SELECT *
      --[wtemp]
   --  ,[stationdesc]
   --  ,[TIMEVALUE]
FROM   missingDates mi LEFT JOIN MyDataTable t ON t.sample_date = mi.TheDate
WHERE  t.sample_date IS NULL
--and stationdesc = 'Back River - Lynch Point'
--ORDER BY timevalue
OPTION  (MAXRECURSION 0)

When I run this query as-is I get only 17 rows of data. TheDate column lists datetimes with dates 12/15-12/31/16 and all times are 00:00:00.000. Query takes 49s.
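As written, the CTE steps one day at a time and the join is not limited to one station, so it can only return days on which no station has a row. A minimal sketch of the same idea at 15-minute resolution for a single station follows. It assumes readings fall exactly on 15-minute marks and that the good date and good time can be recombined as shown; the @station variable and its length are introduced here only for illustration. The station filter sits in the ON clause rather than the WHERE clause, because a WHERE filter on t.stationdesc would discard the unmatched (missing) slots.

```sql
-- Sketch only: @station and the slot arithmetic are illustrative assumptions.
DECLARE @startDate datetime = '20150101';
DECLARE @maxDate   datetime = '20161231';
DECLARE @station   nvarchar(100) = N'Back River - Lynch Point';

;WITH slots AS
(
    -- one row per 15-minute slot from @startDate up to midnight of @maxDate
    SELECT @startDate AS TheSlot
    UNION ALL
    SELECT DATEADD(minute, 15, TheSlot)
    FROM   slots
    WHERE  TheSlot < @maxDate
)
SELECT s.TheSlot
FROM   slots s
LEFT JOIN MyDataTable t
       ON  t.stationdesc = @station
       -- rebuild one datetime from the good date and the good time
       AND DATEADD(minute,
                   DATEPART(hour, t.sample_time) * 60 + DATEPART(minute, t.sample_time),
                   CAST(CAST(t.sample_date AS date) AS datetime)) = s.TheSlot
WHERE  t.sample_date IS NULL      -- keep only slots with no reading
ORDER BY s.TheSlot
OPTION (MAXRECURSION 0);
```

On a table of ~21 million rows the computed join expression cannot use an index, so this is likely to be slow; a precomputed datetime column would probably help.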


Meanwhile, my coworker and I have been working on alternate methods.

--Putting data from only 1 station from our big datatable into the new testtable called '_testdatatable'

SELECT        station, sample_date, sample_time, wtemp, streamcode, stationdesc, TIMEVALUE
INTO              _testdatatable
FROM            MyBigDataTable
WHERE        (stationdesc = 'Back River')
order by [sample_date],[sample_time]

--Next, make a new table [_testdatatableGap] with all time values in 15min increments from a datetime table we made
SELECT [wtemp]=null
      ,[streamcode]='ABC1234'
      ,[stationdesc]= 'Back River'
      ,[TIMEVALUE]
      into [tide].[dbo].[_testdatatableGap]
  FROM DateTimeTable
  WHERE  (TIMEVALUE BETWEEN '4/19/2014' AND getdate())

--Then, get the missing dates from the Gap table and put into the testdatatable
INSERT into [_testdatatable]
      (  [wtemp]
        ,[streamcode]
        ,[stationdesc]
        ,[TIMEVALUE] 
)
    (SELECT 
       [wtemp]=null -- needs this for except to work
      ,
      [streamcode]
      ,[stationdesc]
      ,
      [TIMEVALUE] 
  FROM [_testdatatableGap]   
EXCEPT   
SELECT 
       [wtemp]=null -- needs this for except to work
      ,
    [streamcode]
      ,[stationdesc]
      ,
      [TIMEVALUE] 
  FROM [_testdatatable])

This method worked: it created a table with every 15-minute date/time increment, which resulted in a correctly drawn chart (below). However, we don't know how to scale this up to the full data table with all 125 stations without making multiple tables.

[image: CorrectDataGaps]
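One way to avoid per-station tables might be to join each station to the datetime table only across that station's own first and last reading, then LEFT JOIN the data back in. This is a sketch under assumptions: it assumes MyBigDataTable exposes the TIMEVALUE column used above, that DateTimeTable holds one row per 15-minute slot, and that each station has at most one row per TIMEVALUE. The aliases and the first_reading/last_reading names are made up for illustration.

```sql
-- Sketch only: one row per station per 15-minute slot,
-- with wtemp NULL wherever no reading exists.
SELECT  st.stationdesc,
        c.TIMEVALUE,
        d.wtemp
FROM   (
         -- each station's own first and last reading
         SELECT stationdesc,
                MIN(TIMEVALUE) AS first_reading,
                MAX(TIMEVALUE) AS last_reading
         FROM   MyBigDataTable
         GROUP BY stationdesc
       ) AS st
JOIN   DateTimeTable AS c
       ON c.TIMEVALUE BETWEEN st.first_reading AND st.last_reading
LEFT JOIN MyBigDataTable AS d
       ON  d.stationdesc = st.stationdesc
       AND d.TIMEVALUE   = c.TIMEVALUE
ORDER BY st.stationdesc, c.TIMEVALUE;
```

Run across all 125 stations this returns a very large result, so in practice it would probably be filtered to the station and date range the user selected.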


Solution

  • After working through several suggestions and a lot of research and trial and error, I think I’ve solved my problem. I still need to work on the additional complication of sometimes needing to reduce the volume of data returned and graphed, but that part is somewhat outside the scope of my original question.

    The short version of my answer is:

    1. Made a table view of MyBigDataTable with an additional column which is a datetime column called “TIMEVALUE”.

    2. Made a big permanent datetime calendar table with the datetime column called the same: “TIMEVALUE”.

    3. I then developed a set of SQL queries that

    (a) gather data from MyBigDataTable and put it into a #temptable, and

    (b) gather datetimes from the calendar table and put them into the same #temptable.

    Then, (c) because there will now sometimes be 2 rows for the same datetime, one with data and one with nulls, I run a query that keeps only the row with data when there are 2 rows with matching datetime and station. These data can then be charted.

    4. This is all now written dynamically in my .cfm page: the station, date range and parameters are chosen by the user, and a chart is drawn with correct ‘gaps’ in the datetimes where data are missing.
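    Steps 1 and 2 might look roughly like the following sketch. The view name MyDataView, the column list, and the calendar end date are assumptions for illustration, not the actual objects. The view assumes a base table with the separate sample_date and sample_time columns described in the question.

```sql
-- Step 1 (sketch): a view that adds a combined TIMEVALUE column.
-- MyDataView is a hypothetical name; the column list is assumed.
CREATE VIEW dbo.MyDataView
AS
SELECT  d.stationdesc,
        d.wtemp,
        DATEADD(minute,
                DATEPART(hour, d.sample_time) * 60 + DATEPART(minute, d.sample_time),
                CAST(CAST(d.sample_date AS date) AS datetime)) AS TIMEVALUE
FROM    dbo.MyDataTable AS d;
GO  -- batch separator used by SSMS/sqlcmd

-- Step 2 (sketch): a permanent calendar table with one row per 15-minute slot.
CREATE TABLE dbo.MyDatetimeCalendarTable
(
    TIMEVALUE datetime NOT NULL PRIMARY KEY
);
GO

;WITH slots AS
(
    SELECT CAST('20010101' AS datetime) AS TIMEVALUE   -- data begin in 2001
    UNION ALL
    SELECT DATEADD(minute, 15, TIMEVALUE)
    FROM   slots
    WHERE  TIMEVALUE < '20300101'                      -- assumed end date
)
INSERT INTO dbo.MyDatetimeCalendarTable (TIMEVALUE)
SELECT TIMEVALUE
FROM   slots
OPTION (MAXRECURSION 0);
```

    The TIMEVALUE expression drops seconds, which is fine only if readings sit on whole minutes.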

    Here’s SQL (here, limited to only 1 parameter, but I have 8):

    --Step 1. Check if the temptable exists, if it does then delete it
    IF OBJECT_ID('tempdb..#TempTable') IS NOT NULL
    BEGIN
    DROP TABLE #TempTable
    END
    ;
    --Step 2. Create the temptable with data from the parameters, station and dates selected on the .cfm 
    SET NOCOUNT ON
    
    SELECT 
         timevalue
        ,stationdesc
        ,wtemp
    INTO #TempTable
    
    FROM MyBigDataTable
    WHERE 
        stationdesc = 'Station01'
        and [timevalue] BETWEEN '5/29/2014' AND '10/01/2016'
    GROUP BY 
        TIMEVALUE
        ,stationdesc
        ,wtemp
    ;
    --Step 3. Now select datetimes from a big calendar table, and set stationdesc to the selected station, 
    --and rest of parameters to null. And do this for the same selected date range
    INSERT INTO #TempTable
    SELECT 
    [TIMEVALUE] 
    ,[stationdesc]= 'Station01' 
    ,wtemp=null
    FROM MyDatetimeCalendarTable
    WHERE  [timevalue] BETWEEN '5/29/2014' AND '10/01/2016'
    ;
    --Step 4. Run query on the temptable to gather data for chart, but b/c sometimes there will be 2 rows with the same datetime and station but one with data and one with nulls, this query only gathers the row with data if there are 2 rows with matching datetime and station
    SELECT DISTINCT *
    FROM #TempTable a
    WHERE 
        a.wtemp IS NOT NULL
        OR (
            a.wtemp IS NULL
            AND NOT EXISTS (
                SELECT * FROM #TempTable b
                WHERE a.timevalue = b.timevalue 
                  AND a.stationdesc = b.stationdesc
                  AND b.wtemp IS NOT NULL)
        )
    ORDER BY timevalue
    ;
    

    I need to fully test it and make some amendments, but I think this satisfies the requirements of an answer, because so far it's doing what I need it to do. Thank you to @Leigh and @Dan Bracuk for their wisdom (and patience!)