Search code examples
pentahokettlestar-schema

Hierarchies and levels (Pentaho schema workbench)?


I'm new in BI world and I have a lot of questions. I have to do a BI home work project, so I decided to use:

  • MYSQL (database)
  • Pentaho Kettle (ETL)
  • Pentaho schema workbench (star schema)
  • QlikView (reporting)

I have a dimension table which is SUPERMARKET and it's edited from a connection between schema workbench and MySQL database:

  • Table SUPERMARKET (id_supermarket, name_supermarket, number_of_boxes, active (YES or NOT), date_of_update)
  • the SUPERMARKET table is attached directly to Fact table SALES by a foreign key.

So my question is how I can establish hierarchies and levels in SUPERMARKET dimension?

What I know is all members of a dimension table must have a relation between them like time dimension (year contains quarter, quarter contains month, month contains week, week contains day).

I have another question: Pentaho workbench export the star schema as a XML file, so how I can call or use this schema in Pentaho Kettle for ETL?


Solution

  • First advise: Use Saiku CE instead of QlikView for reporting. It is best choice for querying Mondrian OLAP, creating simple reports and quick testing.

    • Reporting tool: Saiku 2.6
    • OLAP schema: Mondrian 3.8
    • OLAP schema tool: Pentaho Schema Workbench
    • Data warehouse (or DB): MySQL
    • ETL tool: Pentaho Data Integration (Kettle)

    Lets describe the whole concept of solution:

    1. [SOURCE DB]: You have some data sources (databases, other systems), which data you want to use for reporting. Lets say you have those data stored in MySQL DB.
    2. [DWH]: To use the data for reporting, you need to create a data warehouse (in star schema), where you load data from source db.
    3. [ETL]: To extract the data from source db and load them to DWH the ETL is used.
    4. [OLAP schema]: Build the OLAP schema on the top of the DWH. It is a XML file used by reporting applications to understand how to query the DWH data. It contains a definition of OLAP cubes, dimensions and measures.
    5. [Reporting application]: Is a tool which knows how to query the DWH data in the OLAP way (it uses MDX queries on the top of cubes, dimension and measures defined in OLAP schema). You only need to define a OLAP schema location and connection to the DWH to make it work.

    Note that you can omit the source db and ETL if you don't have any source db and the aim is just to prove some functional concept.

    For the DWH (star schema DB) you need 3 tables with those columns to start:

    1. Standard dimension table [dim_supermarket]: supermarket_key, supermarket_name, number_of_boxes, active, date_of_update
    2. Time dimension table [dim_date]: date_key, day_of_month, week_of_year, month, year
    3. Fact table [fact_sales]: date_key (INT), supermarket_key (INT), sales (DECIMAL)

    ..you can fill the date dimension using ETL (Kettle) on the base of this post

    Create the OLAP schema in Pentaho Schema Wrokbench:

    1. Set up database connection to your DWH
    2. Create cube "Sales" on the base of fact_sales
      • Create measure called "Sales" (pointing to sales column in fact_sales)
      • Create standard dimension "Supermarket" (add table = dim_supermarket; fill primary key in hierarchy = supermarket_key; add level "Supermarket name" with column supermarket_name)
      • Create time dimension "Date" (set dimension type to TimeDimension; add table dim_date; fill primary key in hierarchy = date_key; add level "Year" - column year, levelType = TimeYears; add level "Month" - column month, levelType = `TimeMonths; ...)

    .. now you have created OLAP schema with one cube, one measure and two dimensions

    Set up Saiku to use your OLAP schema and DWH:

    1. Add Mondrian OLAP schema file: Copy the OLAP schema XML file to saiku/tomcat/wabapps/saiku/WEB-INF/classes/
    2. Set up OLAP schema location and MySQL connection to thw DWH: Edit saiku/tomcat/wabapps/saiku/WEB-INF/classes/saiku-datasources/test, set location=jdbc:mondrian:Jdbc=jdbc:mysql://localhost:3306/dwh;Catalog=res:test.xml;JdbcDrivers=com.mysql.jdbc.Driver;
    3. Add MySQL connector: E.g. mysql-connector-java-5.1.17 to WEB-INF/lib
    4. Run Saiku. Play with your cube.