Search code examples
sql-serverpowerbidata-warehousebusiness-intelligence

Strategy to design specification table for measurements data for analytical database project


I need advice on the following topic:

I am developing a DW/BI solution in SQL Server and reports are published in Power BI.

Main part of my question starts here: I have a large table which collects measurement data on product measurement for multiple attributes. Product can be of multiple type, that can be recognised by item number in this table, measurements can be done multiple times and can be identified by measurement date. Usually, we refer latest dates. If it makes things complicated, I can filter data for latest dates only. This is dense row table (multi million). Number of attribute counts about 200.

I want to include specifications for these attributes most likely in a dimension table, and there may be tens of such specifications. Intention is that user shall select in the report any one specification name and he would like to see each product with attributes passing/failing as well as the products passing if all all of specification attributes are passed.

I currently have this measurement table and a dim table with test names, I can add a table for specification if needed. Specification can define few or all test names with lower/upper spec limits:

Sample measurement table:

enter image description here

Sample dim table for test names:

enter image description here

I can add a table for specification as below and user will select any of one:

enter image description here

e.g. Use select ID_spec = 1 then measurement table may look like:

enter image description here

Some spec may contain all and some few attributes.

Please suggest strategy to design a spec table to be efficient for such large size tables. Please let me know if any further details needed.

Later, I will have to further work to calculate % of pass product if they have been tested for all needed tests in a specification selected.


Solution

  • For large tables, the best thing to do is choose the right key. That means dumping the "Id" column (nothing more than a row identifier) and replacing it with something that:

    • Guarantees uniqueness
    • Facilitates searches

    That often means composite keys, which are fine.

    It's also means dumping the whole "fact/dimension" mindset and just focusing on the relations. This is also fine.

    Based on your description, this is the first draft of a data model for your warehouse. If you are unfamiliar with IDEF1X diagrams, please read this.

    enter image description here

    I've added a unique constraint to SpecCd so you could specify the value directly instead of having to check both the ProductId and SpecCd to return a result.

    ProductTest exists so you can provide integrity for ProductTestCriteria and ensure tests are limited to only those products that can be measured by them. If all products are subject to all tests, this can be removed and Test can relate directly to ProductMeasurement and ProductTestCriteria.

    If you want to subject the latest test of "Product A" to "Spec S" your query would look like:

    SELECT
      Measurement.ProductId
     ,Measurement.TestCd
     ,Measurement.TestDt
     ,Criteria.SpecCd
     ,Measurement.Value
     ,CASE
        WHEN Measurement.Value BETWEEN Criteria.LowerValue AND Criteria.UpperValue THEN 'Pass'
        ELSE 'Fail'
      END AS Result
    FROM
      ProductMeasurement Measurement
    INNER JOIN
      ProductTestCriteria Criteria
        ON Critera.ProductId = Measurement.ProductId
             AND Criteria.TestCd = Measurement.TestCd
    WHERE
      Measurement.ProductId = 'A'
        AND Criteria.SpecCd = 'S'
        AND Measurement.TestDt =
          (
            SELECT
              MAX(TestDt)
            FROM
              ProductMeasurement
            WHERE
              ProductId = Measurement.ProductId
          )
    

    You could remove the filters for ProductId and SpecCd and roll that into a view - users could later specify for the Products and specifications they want later.

    If you want result as of a given date, the query is easily modified to this or incorporated into a TVF:

    SELECT
      Measurement.ProductId
     ,Measurement.TestCd
     ,Measurement.TestDt
     ,Criteria.SpecCd
     ,Measurement.Value
     ,CASE
        WHEN Measurement.Value BETWEEN Criteria.LowerValue AND Criteria.UpperValue THEN 'Pass'
        ELSE 'Fail'
      END AS Result
    FROM
      ProductMeasurement Measurement
    INNER JOIN
      ProductTestCriteria Criteria
        ON Critera.ProductId = Measurement.ProductId
             AND Criteria.TestCd = Measurement.TestCd
    WHERE
      Measurement.ProductId = 'A'
        AND Criteria.SpecCd = 'S'
        AND Measurement.TestDt =
          (
            SELECT
              MAX(TestDt)
            FROM
              ProductMeasurement
            WHERE
              ProductId = Measurement.ProductId
                AND TestDt <= <Your Date>
          )