sql apache-spark pyspark hive

Spark / Hive: how to get percent of positive values in a column?


Is there a SQL function that calculates the rate of positive values in a column of a Spark / Hive table?

P.S. I'm using PySpark 2.4


Solution

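  • Neither Spark SQL nor Hive has a built-in function that directly returns the share of positive values in a column, but you can compute it by dividing a conditional count by the total row count. CASE WHEN column_name > 0 THEN 1 END yields NULL for every non-positive row, and COUNT skips NULLs, so the numerator counts only the positive values. In both Spark SQL and Hive the / operator performs floating-point division, so the result is a fraction between 0 and 1; multiply it by 100 if you want a percentage. Note that COUNT(*) also counts rows where the column is NULL; use COUNT(column_name) as the denominator if those rows should be excluded.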

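    # column_name and table_name are placeholders for your own column and table
    result = spark.sql("""
    SELECT
        -- positive rows / all rows; COUNT ignores the NULLs produced by CASE
        COUNT(CASE WHEN column_name > 0 THEN 1 END) / COUNT(*) AS positive_rate
    FROM table_name
    """)
    result.show()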