Terminology: msisdn = phone number
First I'd like to apologize for the names. This Database schema was created using the squeryl ORM and it has some interesting table definition choices. I've included the two relevant tables below.
Basically, Records contains provisioning requests. Provisioning is each attempt at that Record. Some Records are attempted multiple times.
create table "Provisioning" (
"record_id" bigint not null,
"responseCode" varchar(128),
"response" varchar(128),
"id" bigint primary key not null,
"status" integer not null,
"actionRequired" boolean not null
);
create sequence "s_Provisioning_id";
create table "Record" (
"source" varchar(128) not null,
"timestamp" varchar(128) not null,
"finalState" varchar(128) not null,
"fromMSISDN" varchar(128) not null,
"reason" varchar(128) not null,
"id" bigint primary key not null,
"toICCID" varchar(128) not null,
"fromICCID" varchar(128) not null,
"toMSISDN" varchar(128) not null,
"batch_id" bigint not null,
"action" varchar(128) not null,
"transactionId" varchar(128) not null
);
I realize the Provisioning has no timestamp. Yes the latest id is the latest request. The developer working on it forgot to put in timestamps, the project manager convinced them it wasn't a part of the original requirements and then the client didn't want to pay to add it in later. No I'm not happy about it. No there's nothing I can do about it. Yes I hate working for a consulting company. Moving on.
The Problem: I need a report that gives me the latest state of each phone number (msisdn). There can be multiple records for each phone number. In the case of from/toMSISDN, the toMSISDN should always be used unless it is empty, in which case the from is used. The following query gets all the unique phone numbers that are in Record:
SELECT
CASE
WHEN "toMSISDN" = '' OR "toMSISDN" IS NULL THEN "fromMSISDN"
ELSE "toMSISDN"
END AS msisdn
FROM "Record"
GROUP BY msisdn
So that gives me a subset of all the numbers I need to report on. Now I need the very latest Record and Provisioing. I can get the latest Provisiong with the following:
SELECT
max(p.id) latest_provision_id,
p.record_id
FROM "Provisioning" p
LEFT JOIN "Record" r ON r.id = p.record_id
group by p.record_id
So this gives me a 1-to-1 table of each record and what it's latest provisioning is. And this is where I start to get stuck. From the Provisioning table I need the response and responseCode for the latest Provisioing. I thought about just adding max(p."responseCode") to the query, but then I realized it would most likely do a alphabetic comparison and not get the correct responseCode/response for the appropriate Provisioning.id. I tried adding those fields to the Group By, but then I started getting extra records in my queries and I wasn't quite sure what was going on.
This (very ugly subquery join) seems to give me the correct record row and provisioning row information, but it's for ever record and I need to get the latest (max provisioning id) for each msisdn/phone number (computed field). I'm not sure what to group by and what aggregate functions to use.
SELECT *,
CASE
WHEN "toMSISDN" = '' OR "toMSISDN" IS NULL THEN "fromMSISDN"
ELSE "toMSISDN"
END AS msisdn
FROM (
SELECT
max(p.id) latest_provision_id,
p.record_id
FROM "Provisioning" p
LEFT JOIN "Record" r ON r.id = p.record_id
group by p.record_id
) latest_prov
LEFT JOIN "Provisioning" p2 ON p2.id=latest_prov.latest_provision_id
LEFT JOIN "Record" r2 ON r2.id=latest_prov.record_id
I can't seem to think of a clean way of doing this without running multiple queries and dealing with the results in the application layer.
I was also originally going to do this as a Scala app using the same squeryl ORM, but the query just got considerably more complicated and I stopped around the following statement, opting instead to do the reports as a Python application:
def getSimSnapshot() = {
join(record,provisioning.leftOuter)((r,p) =>
groupBy(r.fromMSISDN)
compute(max(r.id),r.finalState,r.fromMSISDN,r.reason,r.action)
on(r.id === p.map(_.record_id))
)
}
If there's an easier way to do this with the ORM I'm all for it.
Check out Window Functions: http://www.postgresql.org/docs/8.4/static/tutorial-window.html
Without joins you can get latest data for record_id:
select *
from
(
select p.record_id, p.responseCode, p.id, max(p.id) over (partition by p.record_id) max_id
from "Provisioning" p
)
where id = max_id
It can be a problem only if "Provisioning" contains also record_ids of different tables then "Records"