Search code examples
sqlsql-serversql-server-2000

Extracting data based on data in multiple columns and rows


I have the following data which represents a call flow coming into our centre and being transferred to a consultant (CSO).

CallID   Sequence    Action          Location         Input    NextLocation

1135     0           CallStart       NULL             NULL     NULL    
1135     1           MenuStart       EFTPosHelpDesk   NULL     NULL    
1135     2           KeyPress        EFTPosHelpDesk   3        TransferCSO
1135     3           TransferEntry   EFTPosHelpDesk   NULL     NULL
1135     4           TransferFlag    NULL             NULL     NULL
1135     5           AccessNum       NULL             NULL     NULL
1135     6           Transfer        NULL             NULL     NULL
1135     7           Hangup          NULL             NULL     NULL

I'm using MS SQL Management Studio 2005 to extract the data, however the data itself is stored on a MS SQL 2000 server.

There are literally millions of calls recorded in the database and I need to extract the CallID's where a customer was transferred to a CSO. The target data that tells me a call was transferred to a CSO is exactly as follows:

@Sequence 2 -> NextLocation = 'TansferCSO'
@Sequence 3 -> Action = 'TransferEntry'
@Sequence 4 -> Action = 'TransferFlag'
@Sequence 5 -> Action = 'AccessNum'
@Sequence 6 -> Action = 'Transfer'
@Sequence 7 -> Action = 'Hangup'

... and it's always in that sequence, but the "Sequence" numbering and "Location" will vary as some calls can run for 50 - 70 steps on average and we have 100's of IVRs (Locations).

I'm fairly new to SQL and I've tried using FETCH and IF / ELSE but without success. ROW_NUMBER() wont work due to the data being on a MS SQL2000 server.

Any examples or advice would be greatly appreciated.


Solution

  • This will give you all the calls (CallID's) that have been transferred to CSO based on all 6 of the entries being present (in any sequence).

      select CallID
        from yourTable
       where Action in ('TransferEntry', 'TransferFlag',
                        'AccessNum', 'Transfer', 'Hangup')
          or NextLocation = 'TransferCSO'
    group by CallID
      having count(distinct Action) = 6;
    

    Note: The assumption is that the row with NextLocation='TransferCSO' has a different but specific action, e.g. "KeyPress"


    If you absolutely need to ensure that the 6 events are consecutive, you can use the below. A minor variation to the JOIN conditions at the end can also be used for in-order without being necessarily consecutive.

    create table #tmpCalls (CallID int, Sequence int, Action varchar(20));
    
    select c.CallID, c.Sequence, c.Action
      from
    (
          select CallID
            from yourTable
           where Action in ('TransferEntry', 'TransferFlag',
                            'AccessNum', 'Transfer', 'Hangup')
              or (Action = 'KeyPress' and NextLocation = 'TransferCSO')
        group by CallID
          having count(distinct Action) = 6
    ) a
      join yourTable c on c.CallID = a.CallID
     where c.Action in ('TransferEntry', 'TransferFlag',
                        'AccessNum', 'Transfer', 'Hangup')
        or (c.Action = 'KeyPress' and c.NextLocation = 'TransferCSO');
    
    create clustered index #ix_tmpCalls on #tmpCalls(CallID, Sequence, Action);
    
    select distinct a.CallID
      from #tmpCalls a                 -- or perhaps just:  b.Sequence > a.Sequence
      join #tmpCalls b on b.Action = 'TransferEntry' and b.Sequence = a.Sequence + 1
      join #tmpCalls c on c.Action = 'TransferFlag' and c.Sequence = b.Sequence + 1
      join #tmpCalls d on d.Action = 'AccessNum' and d.Sequence = c.Sequence + 1
      join #tmpCalls e on e.Action = 'Transfer' and e.Sequence = d.Sequence + 1
      join #tmpCalls f on f.Action = 'Hangup' and f.Sequence = e.Sequence + 1
     where a.Action = 'KeyPress' and a.NextLocation = 'TransferCSO';
    

    Note that the subquery is the original query to narrow down the candidates. The temporary table is used to make it perform quickly since we can cluster it around the 3 columns.