django postgresql data-warehouse

Data Warehouse and Django


This is more of an architectural question than a technological one per se.

I am currently building a business website/social network that needs to store large volumes of data and use that data for analytics (consumer behavior).

I am using Django and a PostgreSQL database.

My question: I want to extend this architecture with a data warehouse. Ideally, the current Django PostgreSQL database would remain the operational DB, and the data warehouse would be a separate store, preferably using a multidimensional model.

We are still in a very early phase and are going to test with 50 users, so something primitive, such as a one-column table, would be enough for starters.

I would like to know whether anybody has experience with this situation and could recommend a framework for creating a data warehouse, while maintaining the operational DB with the Django models for ease of use (if possible).

Thank you in advance!


Solution

  • Here are some cool Open Source tools I used recently:

    • Kettle (Pentaho Data Integration) - a great ETL tool. You can use it to extract the data from your operational database and load it into your warehouse. It supports any database with a JDBC driver and makes it very easy to build, e.g., a star schema.
    • Saiku - a nice Web 2.0 frontend built on Pentaho Mondrian (an OLAP engine that implements MDX). It lets your users easily build complex aggregation queries (think pivot table in Excel), and the Mondrian layer provides caching etc. to make things go fast. Try the demo here.
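To make the "extract into a star schema" idea concrete, here is a minimal sketch of what such an ETL step does, written by hand in Python rather than with Kettle. Everything in it is an assumption for illustration: the table names (`dim_user`, `dim_date`, `fact_page_view`), the shape of the input rows, and the use of the standard-library `sqlite3` module as a stand-in for a separate PostgreSQL warehouse database. It is not how Kettle works internally, just the same pattern in miniature.

```python
import sqlite3
from collections import Counter


def build_warehouse(conn):
    """Create a tiny star schema: two dimension tables and one fact table.

    All table and column names here are hypothetical.
    """
    conn.executescript(
        """
        CREATE TABLE dim_user (
            user_key INTEGER PRIMARY KEY,
            username TEXT UNIQUE
        );
        CREATE TABLE dim_date (
            date_key INTEGER PRIMARY KEY,
            iso_date TEXT UNIQUE
        );
        CREATE TABLE fact_page_view (
            user_key   INTEGER REFERENCES dim_user(user_key),
            date_key   INTEGER REFERENCES dim_date(date_key),
            view_count INTEGER NOT NULL
        );
        """
    )


def dimension_key(conn, table, key_col, value_col, value):
    """Return the surrogate key for a dimension value, inserting it if new."""
    # Table and column names come from our own code, never from user input.
    conn.execute(
        f"INSERT OR IGNORE INTO {table} ({value_col}) VALUES (?)", (value,)
    )
    row = conn.execute(
        f"SELECT {key_col} FROM {table} WHERE {value_col} = ?", (value,)
    ).fetchone()
    return row[0]


def load_events(conn, events):
    """Aggregate (username, iso_date) rows and load them as fact rows.

    In a real setup `events` would be read from the operational database.
    """
    for (username, iso_date), count in Counter(events).items():
        user_key = dimension_key(conn, "dim_user", "user_key", "username", username)
        date_key = dimension_key(conn, "dim_date", "date_key", "iso_date", iso_date)
        conn.execute(
            "INSERT INTO fact_page_view (user_key, date_key, view_count) "
            "VALUES (?, ?, ?)",
            (user_key, date_key, count),
        )
    conn.commit()


def views_per_user(conn):
    """Example analytical query: total page views per user."""
    return conn.execute(
        """
        SELECT u.username, SUM(f.view_count)
        FROM fact_page_view AS f
        JOIN dim_user AS u ON u.user_key = f.user_key
        GROUP BY u.username
        ORDER BY u.username
        """
    ).fetchall()
```

A possible way to try it out: open an in-memory database with `sqlite3.connect(":memory:")`, call `build_warehouse(conn)`, pass a list of `(username, iso_date)` tuples to `load_events`, and then query with `views_per_user(conn)`. With 50 test users, a periodic job along these lines may be enough before moving to a dedicated ETL tool.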