Is it possible to use org.apache.spark.sql.delta.sources.DeltaDataSource directly to ingest data continuously in append mode?
Is there another, more suitable approach? My concern is latency and scalability: the data acquisition frequency can reach 30 kHz per vibration sensor, there are several sensors, and I need to record the raw data in Delta Lake for FFT and wavelet analysis, among others.
In my architecture, data ingestion runs continuously in one Spark application, while the analyses are performed in another, independent Spark application with on-demand queries.
If there is no solution for Delta Lake, a solution for Apache Parquet would also work, because it is possible to create Delta Lake datasets from data stored in Parquet datasets.
Yes, it's possible and it works well. There are several advantages of Delta for streaming architecture:
P.S. You can just use .format("delta") instead of the full class name.
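For illustration, here is a minimal PySpark sketch of the two applications described in the question: one appends a stream to a Delta table, the other reads the same table on demand. The function names, paths, and the 10-second trigger are my assumptions, not something from the question, and it presumes the Delta Lake package is already available to your Spark session.

```python
def start_delta_append_stream(stream_df, table_path, checkpoint_path,
                              trigger_interval="10 seconds"):
    """Append a streaming DataFrame to a Delta table (hypothetical helper).

    stream_df is assumed to be a streaming DataFrame, e.g. one built with
    spark.readStream from whatever source carries the sensor samples.
    """
    return (
        stream_df.writeStream
        .format("delta")                                # short name instead of the full class name
        .outputMode("append")                           # only new rows are written
        .option("checkpointLocation", checkpoint_path)  # lets the query resume after a restart
        .trigger(processingTime=trigger_interval)       # micro-batch interval (assumed value)
        .start(table_path)
    )


def read_delta_batch(spark, table_path):
    """Read the current contents of the Delta table as a batch DataFrame."""
    return spark.read.format("delta").load(table_path)
```

In the ingestion application you would call start_delta_append_stream(df, "/data/vibration", "/data/_checkpoints/vibration") and keep the returned query running; the analysis application would call read_delta_batch(spark, "/data/vibration") whenever it needs the data. Both paths are placeholders. As a rough rule, a shorter trigger interval lowers latency but tends to produce more small files, so it is worth tuning against your actual sample rate.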