
Could not create BLOOMFILTER Index in databricks


I am trying to create a BLOOMFILTER index following this document: https://docs.databricks.com/spark/2.x/spark-sql/language-manual/create-bloomfilter-index.html

I created the Delta table with:

spark.sql("DROP TABLE IF EXISTS testdb.fact_lists")
spark.sql("CREATE TABLE testdb.fact_lists USING DELTA LOCATION '/delta/fact-lists'")

I enabled the Bloom filter settings with:

%sql
SET spark.databricks.io.skipping.bloomFilter.enabled = true;
SET delta.bloomFilter.enabled = true;

When I run the CREATE statement below for the BLOOMFILTER index, I get a "no viable alternative" error:

%sql
CREATE BLOOMFILTER INDEX
ON TABLE testdb.fact_lists
FOR COLUMNS(event_id OPTION(fpp=0.1, numItems=100))

Error:

com.databricks.backend.common.rpc.DatabricksExceptions$SQLExecutionException: org.apache.spark.sql.catalyst.parser.ParseException: 
no viable alternative at input 'CREATE BLOOMFILTER'(line 1, pos 7)

== SQL ==
CREATE BLOOMFILTER INDEX
-------^^^
ON TABLE testdb.fact_lists
FOR COLUMNS(event_id OPTION(fpp=0.1, numItems=100))

    at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:298)
    at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:159)
    at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:88)
    at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:106)
    at com.databricks.sql.parser.DatabricksSqlParser.$anonfun$parsePlan$1(DatabricksSqlParser.scala:77)
    at com.databricks.sql.parser.DatabricksSqlParser.parse(DatabricksSqlParser.scala:97)
    at com.databricks.sql.parser.DatabricksSqlParser.parsePlan(DatabricksSqlParser.scala:74)
    ... (remaining driver frames and the duplicated outer stack trace omitted)

Kindly assist. Thanks in advance!


Solution

  • I got the same error when I ran the same query to create a Bloom filter index on Databricks Runtime 10.4 LTS against a sample table:

    CREATE BLOOMFILTER INDEX
    ON TABLE factlists
    FOR COLUMNS(id OPTION(fpp=0.1, numItems=100))
    
    
    #error message
    ParseException: 
    no viable alternative at input 'CREATE bloomfilter'(line 1, pos 7)
    
    == SQL ==
    CREATE bloomfilter INDEX
    -------^^^
    ON TABLE factlists
    FOR COLUMNS(id OPTION(fpp=0.1, numItems=100))
    


    • The error was caused by incorrect syntax: the keyword is OPTIONS, not OPTION. With the corrected query below, the Bloom filter index was created successfully.
    CREATE bloomfilter INDEX
    ON TABLE factlists
    FOR COLUMNS(id OPTIONS(fpp=0.1, numItems=100))
    


    In your query, change OPTION to OPTIONS to resolve the error.
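
    As an aside, the `fpp` (false-positive probability) and `numItems` options together determine how large the index gets. A small sketch of the classic Bloom filter sizing formulas, just to illustrate the trade-off (this is the textbook math, not necessarily the exact sizing Databricks uses internally):

    ```python
    import math

    def bloom_filter_size(num_items: int, fpp: float) -> tuple:
        """Return (total bits, number of hash functions) for a classic Bloom filter.

        Uses the standard formulas m = -n * ln(p) / (ln 2)^2 and k = (m/n) * ln 2.
        Illustrative only -- Databricks' internal sizing may differ.
        """
        m = math.ceil(-num_items * math.log(fpp) / (math.log(2) ** 2))
        k = max(1, round((m / num_items) * math.log(2)))
        return m, k

    # The values from the question: fpp=0.1, numItems=100
    bits, hashes = bloom_filter_size(100, 0.1)
    print(bits, hashes)  # -> 480 3
    ```

    Tightening `fpp` (say, from 0.1 to 0.01) roughly doubles the bits per item, so it is the main cost lever when choosing these OPTIONS values.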