Search code examples
sql-serverlarge-data-volumes

SQL table with many specific datetimes. How do I make this more scalable?


I have a rather troublesome table that looks like this:

EventTimeLog

Id (bigint) | Time (datetime) | LogId (FK to Log tables Id column)

This table outlines the times a Log event occurred. This way to only write one unique log event and times those logs occur can be derived from this table.

Problem is the dates are so specific and these dates can have duplicate values. Ex:

2015-08-03 23:54:58.000 | 1983
2015-08-03 23:54:58.000 | 1934
2015-08-03 23:54:56.000 | 1647

After some time it becomes very difficult to query. Usually around 500k rows or so it's starts to chug, even if I put an index on LogId and Time. By the time I hit the 1mill range and up, queries slow to a crawl...

I do need these specific times so aggregating by the start of the hour or day is not an option. I also doubt implementing a Count columns for duplicate time values will help much as these dates are so granular. The index will still have to jump through all those specific dates which is very slow.

I'm unsure how to make this table more scale-able. Maybe break this out into monthly tables?

As requested, here is the query used that starts chugging

SELECT b.User, b.Ip, b.AccountId, a.Time FROM 
EventTimeLog a 
inner join [Log] b on a.LogId = b.Id
WHERE 
b.UserId = '<param>' AND
a.Time >= '<param>' AND
a.Time <= '<pamam>' 

If the time difference is > 2 days it chugs like crazy. And yes, I have indexes on Log for UserId.


Solution

  • There are many times in your table that differ only slightly. This would make an index huge and not really helpful.

    So consider a computed column instead giving you a less precise time. Then create an index for logid + this computed column.

    I don't know what time spans you usually query. Let's use hours for an example. You'd truncate your date to the hour (e.g. dateadd(hour, datediff(hour, 0, time), 0) or as a string: substring(convert(varchar(25), time, 120), 1, 13)):

    The new column (I use a string here):

    alter table mytable add comp_hour as substring(convert(varchar(25), time, 120), 1, 13);
    

    So you'd get for instance:

    time                            comp_hour
    2015-09-03 14:12:10.2158145     '2015-09-03 14'
    2015-09-03 14:45:27.4457813     '2015-09-03 14'
    

    The index:

    create index index_comp_hour on mytable(logid, comp_hour);
    

    The query:

    select l.user, l.ip, l.accountid, e.time 
    from log l
    join eventtimelog e on e.logid = l.id and e.comp_hour in ('2015-09-03 13', '2015-09-03 14')
    where l.userid = 123;
    

    (I am not sure though, whether the index mytable(logid, comp_hour) or mytable(comp_hour, logid) would be better, or if it even matters. You can just create both and then look at the execution plan and remove the one that is not used.)