Search code examples
sqlpostgresqlscalasqueryl

Finding the current status of each phone number in PostgreSQL


Terminology: msisdn = phone number

First I'd like to apologize for the names. This Database schema was created using the squeryl ORM and it has some interesting table definition choices. I've included the two relevant tables below.

Basically, Records contains provisioning requests. Provisioning is each attempt at that Record. Some Records are attempted multiple times.

create table "Provisioning" (
"record_id" bigint not null,
"responseCode" varchar(128),
"response" varchar(128),
"id" bigint primary key not null,
"status" integer not null,
"actionRequired" boolean not null
);
create sequence "s_Provisioning_id";
create table "Record" (
"source" varchar(128) not null,
"timestamp" varchar(128) not null,
"finalState" varchar(128) not null,
"fromMSISDN" varchar(128) not null,
"reason" varchar(128) not null,
"id" bigint primary key not null,
"toICCID" varchar(128) not null,
"fromICCID" varchar(128) not null,
"toMSISDN" varchar(128) not null,
"batch_id" bigint not null,
"action" varchar(128) not null,
"transactionId" varchar(128) not null
);

I realize the Provisioning has no timestamp. Yes the latest id is the latest request. The developer working on it forgot to put in timestamps, the project manager convinced them it wasn't a part of the original requirements and then the client didn't want to pay to add it in later. No I'm not happy about it. No there's nothing I can do about it. Yes I hate working for a consulting company. Moving on.

The Problem: I need a report that gives me the latest state of each phone number (msisdn). There can be multiple records for each phone number. In the case of from/toMSISDN, the toMSISDN should always be used unless it is empty, in which case the from is used. The following query gets all the unique phone numbers that are in Record:

SELECT 
CASE
  WHEN "toMSISDN" = '' OR "toMSISDN" IS NULL THEN "fromMSISDN"
  ELSE "toMSISDN"
END AS msisdn
FROM "Record"
GROUP BY msisdn

So that gives me a subset of all the numbers I need to report on. Now I need the very latest Record and Provisioing. I can get the latest Provisiong with the following:

 SELECT
 max(p.id) latest_provision_id,
 p.record_id
 FROM "Provisioning" p
 LEFT JOIN "Record" r ON r.id = p.record_id
 group by p.record_id

So this gives me a 1-to-1 table of each record and what it's latest provisioning is. And this is where I start to get stuck. From the Provisioning table I need the response and responseCode for the latest Provisioing. I thought about just adding max(p."responseCode") to the query, but then I realized it would most likely do a alphabetic comparison and not get the correct responseCode/response for the appropriate Provisioning.id. I tried adding those fields to the Group By, but then I started getting extra records in my queries and I wasn't quite sure what was going on.

This (very ugly subquery join) seems to give me the correct record row and provisioning row information, but it's for ever record and I need to get the latest (max provisioning id) for each msisdn/phone number (computed field). I'm not sure what to group by and what aggregate functions to use.

SELECT *,
 CASE
   WHEN "toMSISDN" = '' OR "toMSISDN" IS NULL THEN "fromMSISDN"
   ELSE "toMSISDN"
 END AS msisdn 
FROM (
 SELECT
 max(p.id) latest_provision_id,
 p.record_id
 FROM "Provisioning" p
 LEFT JOIN "Record" r ON r.id = p.record_id
 group by p.record_id
) latest_prov
 LEFT JOIN "Provisioning" p2 ON p2.id=latest_prov.latest_provision_id
 LEFT JOIN "Record" r2 ON r2.id=latest_prov.record_id

I can't seem to think of a clean way of doing this without running multiple queries and dealing with the results in the application layer.

I was also originally going to do this as a Scala app using the same squeryl ORM, but the query just got considerably more complicated and I stopped around the following statement, opting instead to do the reports as a Python application:

def getSimSnapshot()  =  {
  join(record,provisioning.leftOuter)((r,p) =>
    groupBy(r.fromMSISDN)
    compute(max(r.id),r.finalState,r.fromMSISDN,r.reason,r.action)
    on(r.id === p.map(_.record_id))
  )
}

If there's an easier way to do this with the ORM I'm all for it.


Solution

  • Check out Window Functions: http://www.postgresql.org/docs/8.4/static/tutorial-window.html

    Without joins you can get latest data for record_id:

    select * from ( select p.record_id, p.responseCode, p.id, max(p.id) over (partition by p.record_id) max_id from "Provisioning" p ) where id = max_id

    It can be a problem only if "Provisioning" contains also record_ids of different tables then "Records"