java apache-spark apache-spark-sql window-functions rank

rank() function usage in Spark SQL


Need some pointers on using rank().

I have extracted a column from a dataset and need to rank its values.

// select(...) returns a single-column Dataset<Row>; apply(...) would return a Column
Dataset<Row> inputDSAAcolonly = inputDataset.select("Colname");
Dataset<Row> DSColAwithIndex = inputDSAAcolonly.withColumn("df1Rank", rank());

DSColAwithIndex.show();

I could sort the column and then append an index column to get a rank, but I am curious about the syntax and usage of rank().
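Sorting and appending an index is not quite the same as rank() once values tie. The plain-Java sketch below (no Spark involved; the class and method names are made up for illustration) mimics on an already sorted array what rank(), dense_rank() and row_number() assign: rank() gives equal values the same number and leaves a gap after the tie, dense_rank() leaves no gap, and a sort-plus-index numbering behaves like row_number().

```java
import java.util.Arrays;

public class RankSemantics {

    // Mimics rank(): equal values share a rank, and the next distinct
    // value jumps ahead by the size of the tie (1, 2, 2, 4).
    static int[] rank(int[] sorted) {
        int[] out = new int[sorted.length];
        for (int i = 0; i < sorted.length; i++) {
            out[i] = (i > 0 && sorted[i] == sorted[i - 1]) ? out[i - 1] : i + 1;
        }
        return out;
    }

    // Mimics dense_rank(): ties share a rank, with no gaps (1, 2, 2, 3).
    static int[] denseRank(int[] sorted) {
        int[] out = new int[sorted.length];
        for (int i = 0; i < sorted.length; i++) {
            out[i] = (i == 0) ? 1 : (sorted[i] == sorted[i - 1] ? out[i - 1] : out[i - 1] + 1);
        }
        return out;
    }

    // "Sort, then append an index": every row gets a distinct number,
    // like row_number(), even when values tie (1, 2, 3, 4).
    static int[] rowNumber(int[] sorted) {
        int[] out = new int[sorted.length];
        for (int i = 0; i < sorted.length; i++) {
            out[i] = i + 1;
        }
        return out;
    }

    public static void main(String[] args) {
        int[] values = {30, 10, 20, 20};
        int[] sorted = values.clone();
        Arrays.sort(sorted); // plays the role of the window's orderBy
        System.out.println(Arrays.toString(sorted));            // [10, 20, 20, 30]
        System.out.println(Arrays.toString(rank(sorted)));      // [1, 2, 2, 4]
        System.out.println(Arrays.toString(denseRank(sorted))); // [1, 2, 2, 3]
        System.out.println(Arrays.toString(rowNumber(sorted))); // [1, 2, 3, 4]
    }
}
```

If the column has no duplicate values, all three numberings coincide, which is why sort-plus-index can look equivalent to rank() on unique data.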


Solution

  • A window spec must be specified for rank():

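    import org.apache.spark.sql.expressions.Window
    import org.apache.spark.sql.functions.rank

    // some spec: the window ordering decides the rank ("date" is just an example column)
    val w = Window.orderBy("date")
    val leadDf = inputDSAAcolonly.withColumn("df1Rank", rank().over(w))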

    Edit: Java version of the answer, since the OP is using Java:

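    import static org.apache.spark.sql.functions.rank;
    import org.apache.spark.sql.expressions.Window;
    import org.apache.spark.sql.expressions.WindowSpec;
    // Order by the column to be ranked; colName holds its name
    WindowSpec w = Window.orderBy(colName);
    Dataset<Row> leadDf = inputDSAAcolonly.withColumn("df1Rank", rank().over(w));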
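Besides an ordering, a window spec can also partition the rows, e.g. `Window.partitionBy(...).orderBy(...)`, in which case rank() restarts at 1 within each partition. As a rough illustration only, the plain-Java sketch below (no Spark; the `Item`/`Ranked` records, the `group`/`value` fields and the `rankWithinGroups` method are all hypothetical) computes the numbering that `rank().over(Window.partitionBy("group").orderBy("value"))` would be expected to produce on a small in-memory list. It assumes Java 16+ for records.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class PartitionedRank {

    // One input row: a hypothetical grouping key and the value being ranked.
    record Item(String group, int value) {}

    // An input row together with its computed rank.
    record Ranked(String group, int value, int rank) {}

    // Mimics rank() over a window partitioned by group and ordered by value:
    // rows are sorted by (group, value), the rank restarts at 1 whenever the
    // group changes, and ties inside a group share a rank (leaving a gap after).
    static List<Ranked> rankWithinGroups(List<Item> rows) {
        List<Item> sorted = new ArrayList<>(rows);
        sorted.sort(Comparator.comparing(Item::group).thenComparingInt(Item::value));
        List<Ranked> out = new ArrayList<>();
        int start = 0; // index of the first row of the current partition
        for (int i = 0; i < sorted.size(); i++) {
            Item r = sorted.get(i);
            int rank;
            if (i == 0 || !r.group().equals(sorted.get(i - 1).group())) {
                start = i; // a new partition begins here
                rank = 1;
            } else if (r.value() == sorted.get(i - 1).value()) {
                rank = out.get(i - 1).rank(); // tie: same rank as previous row
            } else {
                rank = i - start + 1; // 1-based position within the partition
            }
            out.add(new Ranked(r.group(), r.value(), rank));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Item> rows = List.of(
                new Item("a", 5), new Item("b", 1), new Item("a", 3),
                new Item("a", 5), new Item("b", 7));
        for (Ranked r : rankWithinGroups(rows)) {
            System.out.println(r.group() + " " + r.value() + " -> " + r.rank());
        }
        // a 3 -> 1
        // a 5 -> 2
        // a 5 -> 2
        // b 1 -> 1
        // b 7 -> 2
    }
}
```

Without a partition, as in the answer above, the whole column is treated as a single partition, so the numbering runs across all rows.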