Search code examples
data-warehouseerpscd

Datawarehouse Number of records timeline


I don't have historical information on sales order in my ERP database. I have a table containing sales order. Once a sales order is approved, it is transformed in an invoice and the record sales order is deleted.

I can code in the ERP a historical table with only information I need. I thought that is something that should be solved by a data warehouse (also due to personal interest).

I want to see how much sales order I have by date. For example, I want to see that I have an average of X orders of a total value of XXX by month.

My thought where to just add columns "Valid from" and "Until to". From there I'm blocked, my initial though where to use SCD but as I understand it is not possible to do that on the primary key.

Shall I need to create an new fact table (like I would do in the ERP database) ? I don't want to program something also on the insert trigger or delete trigger from the sales order table. We clean the sales order table at the end of the day, it needs to be updated when we update the data warehouse in the night. For this case, I don't need to take into consideration that the order would not be approved and that I would store non-relevant records.


Solution

  • For what I can gather, you suffer from a bad data model.

    Case

    There is a sales order table. Records in this table are not preserved (with a changed status for example or a simple copy into a table "ordered") but deleted.

    This is the first mistake.

    Secondly, you say you don't want to listen on the delete trigger. Unfortunately, as far as I understand it, a delete can happen every time of the day. So a job collecting data from the sales order table would have to run every minute at least if the approval can happen very fast. This is also a questionable design decision.

    Approach

    Personally, I would not start with a DWH. I would start with a simple job which would fill new sales data when the insert trigger comes up. This because the delta check will become costly over time.

    And then this table, if it holds all information, can be used for data analysis.

    Questions

    But also you need to identify the process when orders are cancelled. How would you know that those orders are deleted because they are cancelled and not because they were approved?

    When you do have a more clear concept in the steps of the process, it will become easier to use the data in e.g. a DWH. But even the DWH won't help if the processes beforehand are not well-defined for at least the status changes.