I was able to get Delta Lake working locally for unit-testing my data + Spark app logic:
import org.apache.spark.sql.{DataFrame, SparkSession}

// helper to read a Delta table from a local path (the long source name is equivalent to .format("delta"))
def readDeltaLake(path: String)(implicit sc: SparkSession): DataFrame =
  sc.read
    .format("org.apache.spark.sql.delta.sources.DeltaDataSource")
    .load(path)
// local spark session
implicit val sparkSession: SparkSession = aSparkSession()
import sparkSession.implicits._
// path to scala/test/resources with parquet file
io.delta.tables.DeltaTable.convertToDelta(sparkSession, s"parquet.`${singleInput.getParent.toFile.getAbsolutePath}`")
val myTestData = readDeltaLake(singleInput.getParent.toFile.getAbsolutePath)
myTestData.count() shouldBe 42L
The code above works fine, but I want to mimic a real Delta Lake layout with partitions. My partition layout looks like this:
hdfs://my_data/delta/ds=2024-05-27 23%3A00%3A00
How can I create the same thing, but with date partitions?
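One approach that seems to fit the convertToDelta-based test above is to write the parquet fixture already partitioned by a ds column and pass a partition schema to convertToDelta so it can parse the ds=... directories. This is only a sketch: testDir and the literal date value are placeholders taken from the layout shown above.
import org.apache.spark.sql.functions.lit
// write the test parquet partitioned by a hypothetical "ds" column, then convert it in place
val fixture = sparkSession.range(42).withColumn("ds", lit("2024-05-27 23:00:00"))
fixture.write.partitionBy("ds").parquet(testDir)
// the third argument declares the partition column(s) for the conversion
io.delta.tables.DeltaTable.convertToDelta(sparkSession, s"parquet.`$testDir`", "ds STRING")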
Per the comments and the link, writing Delta directly:
import org.apache.spark.sql.functions.lit
// range(5, 10) only has an "id" column, so add a "ds" column to partition by (partitionBy is on the writer, not the Dataset)
val data = spark.range(5, 10).withColumn("ds", lit("2024-05-27 23:00:00"))
data.write.partitionBy("ds").format("delta").mode("overwrite").save("/tmp/delta-table")
data.show()
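To sanity-check the layout, the table can then be read back and the distinct ds values inspected (again just a sketch against the /tmp/delta-table path used above):
// read the Delta table back and confirm the ds partition values round-tripped
val roundTrip = spark.read.format("delta").load("/tmp/delta-table")
roundTrip.show()
roundTrip.select("ds").distinct().show()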