Can someone please tell me a 'general use', step-by-step method to find out what are the non-aggregability fields in a datamart. Here an example I found:
Notes: italic means 'key', bold identifies 'shortenings', 'column' is an alias for 'referencing'
Relational schema:
CALL(COD,DATE,FROM:S,TO:S,LEN)
SIM(SIM, USER:USER, TRIFF:T, BONUS)
TARIFF(TARIFF, CARRIER:CAR)
USER(USER, TOWN:TOW, LAST_TARIFF:TAR)
ROAMING_CALL(COD:CAL, FOREIGN_CARRIER:CAR)
PROMO_CALL(COD:CAL, PROMO_TARIFF:P_TA)
PROMO_TARIFF(TARIFF:TAR)
TOWN(TOWN,NATION)
CARRIER(CARRIER, NATION)
REQUESTS: build a fact-schema for 'CALL' with following
dimensions: DATE, SIM_FROM, CALLED_CARRIER, FOREIGN_CARRIER, PROMO_TARIFF and
measures: AVG_CALL_LENGTH, NUM_OUTGOING_SIM (as count distinct FROM), NUM_INCOMING_SIM (as count distinct TO)
Now I can draw the fact-schema, but I'm in trouble finding what measures are aggregable along which dimensions
EDIT: this is the pdf for the fact schema I have (sorry for not using a strict sintax but reading notes are included)
Measures:
Standard [obtained by the operational schema]:
NUM_INCOMING_CALLS = COUNT DISTINCT (TO)
UN-AGGREGABILITIES ->*THIS IS MY ISSUE*
Calculated [obtained by the operational schema, need partial data to add properly]:
AVG_CALL_LENGTH = CL_SUM/CL_COUNT
where
CL_SUM = SUM (LENGTH), CL_COUNT = COUNT(LENGTH)
UN-AGGREGABILITIES ->*THIS IS MY ISSUE*
Derived [can be found as a dimension]:
NUM_OUTGOING_CALLS = COUNT DISTINCT ( FROM )
UN-AGGREGABILITIES ->*THIS IS MY ISSUE*
Ok, I went and asked my teacher: he gave me a easy algorithm:
Given a schema D{D1, D2, D3, ... Dn}, for a Mesaure M= count distinct A n
if A U X -> Di is not trivial, X subset of D
X U A -> D1 (True)
X U A -> D2 (False)
X U A -> D3 (True)
...
X U A -> Dn-1 (False)I have that NA = {D2, Dn-1}
NA: set of non-aggregabilities