
Connecting to/accessing Hive data through Spark Thrift Server from Power BI


I am rather new to data connectivity across multiple platforms. My requirement is simple: I need to access Spark Thrift Server from Power BI. Can anyone guide me through the required steps?


Solution

  • I've had to integrate quite a few big data and analytics tools, and I have a good amount of experience with Spark.

    Typically I start with the Tableau documentation on Spark SQL connections: https://onlinehelp.tableau.com/current/pro/desktop/en-us/examples_sparksql.html

    or the target tool's own docs, here the Power BI Desktop November feature summary (see the Spark section):
    https://powerbi.microsoft.com/en-us/blog/power-bi-desktop-november-feature-summary/#spark

    but I'm partial to this guide from Oracle's learning library:
    https://github.com/oracle/learning-library/blob/master/workshops/journey2-new-data-lake/files/18.1.4/pdf/Connecting%20DVD3%20and%20Spark.pdf

    You'll need to make sure the Spark Thrift Server is up and listening on a port that is open to the machine running Power BI. Then you'll need the connection details (host, port, authentication method) and to know which type of connection you're using (JDBC, ODBC, ...).
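    Before configuring anything in Power BI, it can help to confirm the port is actually reachable. Below is a minimal sketch, assuming the Thrift server uses its default port 10000 (the same default as HiveServer2). The function names are hypothetical helpers, not part of any Spark or Power BI API. A successful TCP connection only shows that *something* is listening; it does not prove that the listener is the Spark Thrift Server. The `jdbc:hive2://host:port/database` URL is the form JDBC clients such as Beeline use. Power BI itself only needs the host and port, entered in its Spark connector or in an ODBC DSN.

```python
import socket


def thrift_port_is_open(host: str, port: int = 10000, timeout: float = 3.0) -> bool:
    """Return True if something accepts TCP connections on host:port.

    Reachability only: this does not verify that the listener is the
    Spark Thrift Server (started e.g. via sbin/start-thriftserver.sh).
    """
    try:
        # create_connection resolves the host name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS resolution failures.
        return False


def hive2_jdbc_url(host: str, port: int = 10000, database: str = "default") -> str:
    """Build a HiveServer2-style JDBC URL, the form Spark Thrift Server accepts."""
    return f"jdbc:hive2://{host}:{port}/{database}"


if __name__ == "__main__":
    # "localhost" is a placeholder; use the host running the Thrift server.
    host = "localhost"
    print(thrift_port_is_open(host))
    print(hive2_jdbc_url(host))
```

    If the check returns False from the Power BI machine but True on the Spark host itself, a firewall or the server's bind address is the likely culprit.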

    This assumes you're on a Power BI Desktop version that includes the preview DirectQuery support for Spark:
    https://learn.microsoft.com/en-us/power-bi/desktop-directquery-data-sources