Tags: java, apache-spark, apache-zeppelin

How can I periodically refresh in Zeppelin with Spark? (Java)


I am trying to build a dashboard with Zeppelin and Spark using Java.
Let's say my data is saved in /tmp/mydir.
Since it is real-time data, new files keep arriving in /tmp/mydir even while Zeppelin is showing results computed from what was there earlier.
What I want is a real-time dashboard, meaning the dashboard should periodically recalculate its results from the data in /tmp/mydir, because the amount of data there keeps growing.
As a simple example, suppose I run count() on the data in /tmp/mydir.
How can I make Zeppelin run count() on the data in /tmp/mydir every 60 seconds?
All I really want to know is how to make Zeppelin run the same function on the same (growing) directory periodically.
Thanks!


Solution

  • Thanks for asking! There are several ways to do this; choose whichever suits your situation.

    1. Using the cron scheduler option: Zeppelin provides a built-in cron scheduler that runs a specific notebook at regular intervals. It takes a Quartz-style cron expression; for example, `0 0/1 * * * ?` fires once a minute, which matches the 60-second refresh you asked about. Details about enabling this option can be found here: https://zeppelin.apache.org/docs/0.8.0/usage/other_features/cron_scheduler.html

    2. Using the Zeppelin REST API: You can use an external scheduler to run all paragraphs of the notebook that contains your queries. If authentication is enabled, first log in through the Zeppelin API and take the JSESSIONID from the cookie in the response, then send that cookie with the API call that runs the notebook. More details can be found in these links: https://community.hortonworks.com/questions/52840/authentication-with-the-zeppelin-rest-api.html, https://zeppelin.apache.org/docs/0.8.0/usage/rest_api/notebook.html
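
    As a rough illustration of option 2, here is a minimal Java sketch of an external scheduler (it needs Java 11+ for `java.net.http`). It assumes Zeppelin's documented endpoints `POST /api/login` (form fields `userName` and `password`) and `POST /api/notebook/job/{noteId}`. The class name `ZeppelinNoteRunner`, its helper methods, and the command-line arguments are made up for this example, so adapt them to your setup. The `CookieManager` is there so that the JSESSIONID cookie returned by the login call is sent along with later requests.

```java
import java.net.CookieManager;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: class and method names are illustrative, not part of Zeppelin.
final class ZeppelinNoteRunner {

    // URL-encoded form body for POST /api/login.
    static String loginForm(String user, String password) {
        return "userName=" + URLEncoder.encode(user, StandardCharsets.UTF_8)
             + "&password=" + URLEncoder.encode(password, StandardCharsets.UTF_8);
    }

    // Login request; the response is expected to set the JSESSIONID cookie.
    static HttpRequest loginRequest(String baseUrl, String user, String password) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/api/login"))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(loginForm(user, password)))
                .build();
    }

    // POST /api/notebook/job/{noteId} asks Zeppelin to run all paragraphs of the note.
    static HttpRequest runNoteRequest(String baseUrl, String noteId) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/api/notebook/job/" + noteId))
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 4) {
            System.out.println("usage: ZeppelinNoteRunner <baseUrl> <noteId> <user> <password>");
            return;
        }
        String baseUrl = args[0];
        String noteId = args[1];

        // The cookie handler stores the session cookie and replays it on later calls.
        HttpClient client = HttpClient.newBuilder()
                .cookieHandler(new CookieManager())
                .build();

        HttpResponse<String> login = client.send(
                loginRequest(baseUrl, args[2], args[3]),
                HttpResponse.BodyHandlers.ofString());
        System.out.println("login status: " + login.statusCode());

        // Trigger the note immediately, then every 60 seconds.
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            try {
                HttpResponse<String> run = client.send(
                        runNoteRequest(baseUrl, noteId),
                        HttpResponse.BodyHandlers.ofString());
                System.out.println("run status: " + run.statusCode());
            } catch (Exception e) {
                // Swallow the error so a single failure does not cancel future runs.
                System.err.println("run failed: " + e.getMessage());
            }
        }, 0, 60, TimeUnit.SECONDS);
    }
}
```

    You would start it with something like `java ZeppelinNoteRunner.java http://localhost:8080 <your-note-id> <user> <password>`. The sketch logs in only once, so a long-running process would probably need to log in again when the session expires. If your Zeppelin instance has no authentication configured, the login step can likely be dropped.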