Tags: apache-spark, jdbc, clickhouse

Writing an array column to ClickHouse with Spark and JDBC


I'm trying to write this simple DataFrame

// toDF needs the SparkSession implicits in scope
import spark.implicits._

val df = Seq(
    List("aa", "bb", "cc"),
    List("aa", "bb", "cc"),
    List("aa", "bb", "cc")
).toDF("str")

to this ClickHouse table:

CREATE TABLE IF NOT EXISTS testtable (
    str Array(String)
) ENGINE = Memory

This is the write:

df.write
    .format("jdbc")
    .mode("append")
    .option("driver", "com.clickhouse.jdbc.ClickHouseDriver")
    .option("url", "jdbcurl")
    .option("user", "user")
    .option("password", "password")
    .option("dbtable", "testtable")
    .save()

During the write I got this error:

java.lang.IllegalArgumentException: Can't get JDBC type for array<string>

I tried converting the array to a string with df.withColumn("str", concat_ws(",", $"str")), but then I hit this error:

java.sql.SQLException: Unsupported type ARRAY

Meanwhile the data was written to the database, but not in the format I expected: ["aa,bb,cc"] instead of ["aa", "bb", "cc"], i.e. one concatenated value rather than three separate ones. The official clickhouse-java docs also mention that to write arrays you have to prepare a statement and call setArray, but I don't understand how exactly I can do that in Spark. Can anybody help me with this?
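(For context, the setArray approach from the docs means driving JDBC by hand, e.g. from foreachPartition. The following is only a sketch of that idea: "jdbcurl", "user" and "password" are the placeholders from above, and it assumes the ClickHouse driver implements Connection.createArrayOf.)

import java.sql.DriverManager
import org.apache.spark.sql.Row

df.foreachPartition { rows: Iterator[Row] =>
  val conn = DriverManager.getConnection("jdbcurl", "user", "password")
  val stmt = conn.prepareStatement("INSERT INTO testtable (str) VALUES (?)")
  try {
    rows.foreach { row =>
      // Row.getSeq returns the array column as a Seq[String]
      val values = row.getSeq[String](0).toArray[AnyRef]
      // assumes the driver supports createArrayOf
      stmt.setArray(1, conn.createArrayOf("String", values))
      stmt.addBatch()
    }
    stmt.executeBatch()
  } finally {
    stmt.close()
    conn.close()
  }
}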


Solution

  • I solved this issue by switching from the clickhouse-jdbc driver to the clickhouse-native-jdbc driver by Housepower: https://housepower.github.io/ClickHouse-Native-JDBC/

    You also need to register a ClickHouseDialect; the details are in the link above (section Spark Integration), and a minimal sketch follows below.
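
A sketch of that registration, assuming the native driver's class name com.github.housepower.jdbc.ClickHouseDriver and its jdbc:clickhouse://host:9000 URL format (host and port are placeholders, and the dialect shown in the docs handles more types than this):

import java.sql.Types
import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects, JdbcType}
import org.apache.spark.sql.types.{ArrayType, DataType, StringType}

// Tell Spark how to map array<string> onto a ClickHouse column type;
// the full dialect in the Housepower docs covers more cases than this.
object ClickHouseDialect extends JdbcDialect {
  override def canHandle(url: String): Boolean =
    url.startsWith("jdbc:clickhouse")

  override def getJDBCType(dt: DataType): Option[JdbcType] = dt match {
    case ArrayType(StringType, _) => Some(JdbcType("Array(String)", Types.ARRAY))
    case _                        => None
  }
}

JdbcDialects.registerDialect(ClickHouseDialect)

df.write
    .format("jdbc")
    .mode("append")
    .option("driver", "com.github.housepower.jdbc.ClickHouseDriver")
    .option("url", "jdbc:clickhouse://host:9000") // native protocol port
    .option("user", "user")
    .option("password", "password")
    .option("dbtable", "testtable")
    .save()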