tableau-api, reporting, dashboard, apache-zeppelin

Is Apache Zeppelin suitable for presenting a dashboard to several users?


In other words, can Zeppelin be used as a Tableau replacement at small scale?

I have a new UI/UX design for a reporting dashboard. The data for the dashboard comes from a relational database (SQL Server). The dashboard is to be viewed by ~300 colleagues in my company, with perhaps up to ten of them viewing it at the same time.

Currently the dashboard is implemented in Kibana, with data imported into Elasticsearch from SQL Server on a regular basis. However, the new design requires certain widgets and data aggregations that go beyond the dashboarding capabilities of Kibana. Additionally, my organization wants to migrate this dashboard to a technology that is more familiar to the data scientists who work with us (Kibana isn't considered one).

This report and dashboard could be migrated to Tableau. Tableau is powerful enough to perform the desired data aggregations and present all the desired widgets. However, we can't afford the license costs, though we can invest as much developer time as needed.

I have evaluated a couple of open-source dashboarding tools (Metabase and Superset), and they lack the aggregations and widgets that we need. I won't go into details because the question is not about specifics. It is clear that Metabase and Superset are not powerful enough for our needs.

I have the impression that Apache Zeppelin is powerful enough, with its support for arbitrary Python code (I would use Pandas for data aggregations), graphs, and widgets. However, I am not sure whether a single Zeppelin instance can handle this number of concurrent viewers.

We'd like to build a set of notebooks and make them available to all colleagues in the organization (access control is not an issue, we trust each other). Notebooks will be interactive with data filters and date range pickers.

It looks like Zeppelin has switchable interpreter isolation modes, which we can use to isolate different users' sessions from each other. My question is whether a single t2.large AWS instance hosting Zeppelin can sustain up to ten users viewing a report aggregated over a 300k-row dataset. Also, are there any usability concerns that make multi-user viewing of a reporting dashboard impractical in Zeppelin?


Solution

  • I see a couple of questions you're asking:

    1. Can Zeppelin replace Tableau on a small scale? This depends on which Tableau features you are using. Every platform has its own set of features that the others do or don't have, and Tableau has a lot of customization options that you won't find elsewhere. Aim to convert as much of your dashboard 1:1 as you can, then warm everyone up to the idea that it will look and operate a little differently since it's on a different platform.

    2. Can a t2.large hosting Zeppelin sustain up to 10 concurrent users viewing a report aggregated on 300k rows? A t2.large should be more than big enough to run Zeppelin, Tableau, Superset, etc. with 10 concurrent users pulling a report with 300k rows. 300k isn't really that much.

    A good way to speed things up and fit more concurrent users onto your existing infrastructure is to speed up your data sources, since that is where a lot of the aggregation calculations happen. Reviewing your ETLs and aggregating ahead of time can help, as can making sure your data scientists aren't running massive queries that slow down your database server.
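
As a rough illustration of aggregating ahead of time, here is a minimal Pandas sketch. The table and column names (`event_date`, `department`, `amount`), the synthetic data, and the `report` helper are all hypothetical stand-ins, not anything from your actual schema. The idea is that the ETL job builds a small daily summary once, and each viewer's date-range filter then runs against that summary instead of the raw 300k rows.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the ~300k-row table pulled from SQL Server.
rng = np.random.default_rng(0)
n_rows = 300_000
raw = pd.DataFrame({
    "event_date": pd.Timestamp("2023-01-01")
    + pd.to_timedelta(rng.integers(0, 365, n_rows), unit="D"),
    "department": rng.choice(["sales", "support", "engineering"], n_rows),
    "amount": rng.random(n_rows) * 100,
})

# Run once per ETL cycle: collapse the raw rows into one row per
# (day, department). This summary is what the notebooks would read.
daily = raw.groupby(["event_date", "department"], as_index=False).agg(
    total_amount=("amount", "sum"),
    row_count=("amount", "size"),
)


def report(summary, start, end):
    """Per-department totals for an inclusive date range (assumed helper)."""
    in_range = summary["event_date"].between(pd.Timestamp(start), pd.Timestamp(end))
    return summary.loc[in_range].groupby("department")["total_amount"].sum()


# Each viewer's date-range picker only touches the small summary table.
january = report(daily, "2023-01-01", "2023-01-31")
print(len(raw), "raw rows ->", len(daily), "summary rows")
print(january)
```

This only works for aggregations that can be rebuilt from the summary (sums, counts, and averages derived from them); something like a median over an arbitrary date range would still need the raw rows.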