Search code examples
databasedatabase-designvisitor-statistic

Suggested table scheme for saving visitors by week days and hours statistics


I want to save in DB the number of times each of out clients enter the website by week day and hour. Meaning that for each one of the clients I will have 24 * 7 values that will be constantly updating to reflect the peak hour with the most visits for the client. I've seen the obvious suggestion Database structure for holding statistics by day, week, month, year to create a new line for each entrance and than use the data, it won't work, we have millions of lines and I need the peak hour for each client to be available. Also, creating 168 columns for each clients looks a little extreme. Any suggestions?


Solution

  • Here's a table structure (similar to one which I've seen implemented) which separates out the summary stats into week, date (or day), and hour tables with conjoining primary key to foreign key relationships. Instead of storing the different hours of the day as columns (which is not recommended in rdbms) it stores them in rows. Millions of visits per day (or hour) could be handled with appropriate indexes and partitioning as needed.

    Something like this

    DDL

    create table dbo.visitor_events(
        v_id       int identity(1,1) primary key not null,
        client_id   int not null references clients(client_id),
        visit_dt    datetime2(7) not null default sysutcdatetime());
    
    create table dbo.visitor_event_weeks(
        vsw_id       int identity(1,1) primary key not null,
        client_id   int not null references clients(client_id),
        visit_wk    int not null,
        visits      int not null);
    
    create table dbo.visitor_event_dates(
        vsd_id      int identity(1,1) primary key not null,
        client_id   int not null references clients(client_id),
        vsw_id      int not null references visitor_event_weeks(vsw_id),
        visit_wk    int not null,
        visit_dt    datetime not null,
        visits      int not null);
    
    create table dbo.visitor_event_hours(
        vsh_id       int identity(1,1) primary key not null,
        client_id   int not null references clients(client_id),
        vsd_id      int not null references visitor_event_dates(vsd_id),
        visit_hr    datetime not null,
        visits      int not null);
    

    var

    Variables and insert/update statement (would vary depending on what's best for the OP)

    declare 
      @client_id          int=123,
      @visit_dt           datetime2(7)=sysutcdatetime();
    declare
      @v_id               int;
    declare   
      @vsw                table(vsw_id      int unique not null);
    declare   
      @vsd                table(vse_id      int unique not null);
    
    /* Insert a visit */
    insert dbo.visitor_events(client_id, visit_dt) values
    (@client_id, @visit_dt);
    select @v_id=scope_identity();
    
    /* Insert/update a visit week */
    update dbo.visitor_event_weeks
    set visits=visits+1
    output inserted.vsw_id into @vsw
    where client_id=@client_id
          and visit_wk=datediff(wk, 0, @visit_dt);
    if @@rowcount>0
        begin
            insert dbo.visitor_event_weeks(client_id, visit_wk, visits) 
            output inserted.vsw_id into @vsw
            values (@client_id, datediff(wk, 0, @visit_dt), 1);
        end
    
    /* Insert a visit date */
    update dbo.visitor_event_dates
    set visits=visits+1
    output inserted.vsd_id into @vsd
    where client_id=@client_id
          and vsw_id=(select top 1 vsw_id from @vsw)
          and visit_dt=cast(@visit_dt as date);
    if @@rowcount>0
        begin
            insert dbo.visitor_event_dates(client_id, vsw_id, visit_dt, visits) 
            output inserted.vsd_id into @vsd
            values (@client_id, (select top 1 vsw_id from @vsw), cast(@visit_dt as date), 1);
        end
    
    /* Insert a visit date hour */
    update dbo.visitor_event_dates
    set visits=visits+1
    output inserted.vsd_id into @vsd
    where client_id=@client_id
          and visit_dt=cast(@visit_dt as date);
    if @@rowcount>0
        begin
            insert dbo.visitor_event_hours(client_id, vsd_id, visit_dt, visits) 
            output inserted.vsd_id into @vsd
            values (@client_id, (select top 1 vsw_id from @vsw), cast(@visit_dt as date), 1);
        end