Tags: apache-spark, pyspark, apache-spark-sql, hivecontext

How to apply window functions in HiveQL in Spark


I have seen posts discussing the usage of window functions, but I have some questions.

  1. Since window functions can only be used with a HiveContext, how can I switch between SparkSQLContext and HiveContext given that I am already using SparkSQLContext?
  2. How is it possible to run HiveQL with a window function here? I tried

    df.registerTempTable("data")
    from pyspark.sql import functions as F
    from pyspark.sql import Window
    

    %%hive
    SELECT col1, col2, F.rank() OVER (Window.partitionBy("col1").orderBy("col3") 
    FROM data
    

and native Hive SQL

SELECT col1, col2, RANK() OVER (PARTITION BY col1 ORDER BY col3) FROM data

but neither of them works.


Solution

  • How can I switch between SparkSQLContext and HiveContext given I am already using SparkSQLContext?

    You cannot. Spark data frames and tables are bound to a specific context. If you want to use HiveContext then use it all the way; you pull in all its dependencies anyway.

    How is it possible to run HiveQL using window functions here?

    sqlContext = ...  # HiveContext 
    sqlContext.sql(query)
    

    The first query you use is simply invalid: you cannot embed PySpark DataFrame API calls like `F.rank()` or `Window.partitionBy("col1")` inside a raw SQL string. The second one should work if you use the correct context and configuration.
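    Putting it together, here is a self-contained sketch showing both working forms against a small made-up dataset (the column names and sample rows are assumptions, chosen to mirror the question). The `createOrReplaceTempView` call is the modern name for `registerTempTable`; on Spark 1.x use the latter:

    ```python
    from pyspark import SparkContext
    from pyspark.sql import HiveContext
    from pyspark.sql import functions as F
    from pyspark.sql import Window

    sc = SparkContext()
    sqlContext = HiveContext(sc)

    # Hypothetical sample data shaped like the question's table
    df = sqlContext.createDataFrame(
        [("a", 1, 3), ("a", 2, 1), ("b", 3, 2)],
        ["col1", "col2", "col3"])
    df.createOrReplaceTempView("data")

    # Form 1: the DataFrame API, with the window spec passed to .over()
    w = Window.partitionBy("col1").orderBy("col3")
    ranked = df.select("col1", "col2", F.rank().over(w).alias("rnk"))

    # Form 2: the equivalent raw HiveQL, run through the HiveContext
    ranked_sql = sqlContext.sql(
        "SELECT col1, col2, RANK() OVER (PARTITION BY col1 ORDER BY col3) AS rnk "
        "FROM data")
    ```

    Both return the same rows: within each `col1` partition, rows are ranked by ascending `col3`.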