Tags: python, apache-spark, hql, pyspark

SparkSQL: HQL script in file to be loaded on Python code


Normally, a literal query string suffices for short statements, like this:

count = sqlContext.sql("SELECT * FROM db.table").count()

However, there are cases where I have a lengthy Hive query script, and it would be too cumbersome to embed it in the Python code.

How do I go about referencing an HQL file and get it executed in Python-SparkSQL?


Solution

Read the file's contents into a string and pass that string to sqlContext.sql():

  • count = sqlContext.sql(open("file.hql").read()).count()
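
A slightly fuller sketch, assuming a Hive-enabled SparkSession (on older Spark 1.x clusters the existing sqlContext/HiveContext can be used instead). The names run_hql_file and file.hql are illustrative, and the naive split on ";" assumes semicolons never appear inside string literals in the script:

from pyspark.sql import SparkSession

# Assumption: a SparkSession with Hive support; adjust if a sqlContext
# is already provided by your environment.
spark = SparkSession.builder.enableHiveSupport().getOrCreate()

def run_hql_file(path):
    # Read the whole HQL script from disk.
    with open(path) as f:
        script = f.read()
    results = []
    # Execute each semicolon-separated statement in order.
    for statement in script.split(";"):
        statement = statement.strip()
        if statement:  # skip empty fragments (e.g. after the final semicolon)
            results.append(spark.sql(statement))
    return results  # one DataFrame per statement

# Usage: count the rows returned by the script's last statement,
# mirroring the inline example above.
dfs = run_hql_file("file.hql")
count = dfs[-1].count()

Note that spark.sql() executes one statement at a time, so a multi-statement script has to be split before submission; the single read()-and-submit one-liner above is enough when the file holds exactly one query.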