I was using UcanAccess 5.0.1 in Databricks 9.1 LTS (Spark 3.1.2, Scala 2.12), and for whatever reason, when I used the following code to read a single-record Access DB table, it kept returning the column names as the records themselves (I tried adding more records and got the same result).
The Access DB table looks like this (2 records):

| ID | Field1 | Field2    |
|:---|:-------|:----------|
| 2  | key1   | column2   |
| 3  | key2   | column2-1 |
The python code looks like this:
connectionProperties = {
"driver" : "net.ucanaccess.jdbc.UcanaccessDriver"
}
url = "jdbc:ucanaccess:///dbfs/mnt/internal/Temporal/Database1.accdb"
df = spark.read.jdbc(url=url, table="Table1", properties=connectionProperties)
And the result looks like this:
df.printSchema()
df.count()
root
|-- Field1: string (nullable = true)
|-- Field2: string (nullable = true)
|-- ID: string (nullable = true)
Out[21]: 2
df.show()
+------+------+---+
|Field1|Field2| ID|
+------+------+---+
|Field1|Field2| ID|
|Field1|Field2| ID|
+------+------+---+
Any ideas or suggestions?
Turns out there was an identifier-quoting problem in the JDBC read path (https://stackoverflow.com/questions/63177736/spark-read-as-jdbc-return-all-rows-as-columns-name): the default dialect wraps column names in double quotes, which Access SQL interprets as string literals, so every row comes back as the column names themselves. I registered the following custom dialect and now the UcanAccess driver works fine:
%scala
import org.apache.spark.sql.jdbc.{JdbcDialect, JdbcDialects}

private case object HiveDialect extends JdbcDialect {
  override def canHandle(url: String): Boolean = url.startsWith("jdbc:ucanaccess")
  override def quoteIdentifier(colName: String): String =
    colName.split('.').map(part => s"`$part`").mkString(".")
}

JdbcDialects.registerDialect(HiveDialect)
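For readers following along in Python, the quoting logic of the Scala dialect above can be mirrored as a plain function (a hypothetical helper, purely to illustrate what the dialect emits — it is not part of Spark's API):

```python
def quote_identifier(col_name: str) -> str:
    """Mirror of the Scala quoteIdentifier override: wrap each
    dot-separated part in backticks instead of double quotes, so
    Access SQL treats it as an identifier rather than a string literal."""
    return ".".join(f"`{part}`" for part in col_name.split("."))

# With the default dialect, Spark would generate SELECT "Field1", ...
# which Access reads as the literal string 'Field1'; with the custom
# dialect it generates SELECT `Field1`, ... instead.
print(quote_identifier("Field1"))         # `Field1`
print(quote_identifier("Table1.Field1"))  # `Table1`.`Field1`
```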
Then display(df) would show:
| Field1 | Field2    | ID |
|:-------|:----------|:---|
| key1   | column2   | 2  |
| key2   | column2-1 | 3  |