amazon-s3 databricks delta-lake aws-lake-formation

What are the major differences between S3 Lake Formation governed tables and Databricks Delta tables?


What are the major differences between S3 Lake Formation governed tables and Databricks Delta tables? They look pretty similar.


Solution

  • Governed tables, Delta Lake, and, to some extent, Apache Iceberg and Apache Hudi are all table formats. Instead of storing data only as raw files (Parquet, ORC, Avro), a table format keeps additional metadata files (manifests or a transaction log) that record which data files belong to the table at each state. Because every change produces a new recorded table state, these formats can all offer ACID transactions, time travel, and snapshotting. The main difference right now is which big data tools they integrate with.

    AWS governed tables are a Lake Formation feature, so access to Data Catalog objects (databases, tables, and columns) is controlled through the Lake Formation permission model. They also integrate with AWS query engines: Redshift Spectrum, AWS Glue, and Athena. EMR Spark is not supported yet. Like Delta Lake, governed tables provide ACID transactions, time travel, and snapshotting.

    Delta Lake provides ACID transactions, time travel, and snapshotting on Spark. It also supports Spark Structured Streaming (a Delta table can be a streaming source or sink) and data mutation (UPDATE, DELETE, and MERGE).
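To make the manifest idea from the first paragraph concrete, here is a toy model of a table log, loosely inspired by the add/remove file actions in Delta Lake's transaction log. `ToyTableLog` and `Commit` are hypothetical names. This is not the real on-disk format of Delta, Iceberg, or governed tables, and it leaves out the concurrency control (for example, atomically creating the next log entry) that real formats rely on for isolation. It only shows why recording file membership per commit makes snapshots and time travel possible.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Commit:
    """One atomic commit: the data files added and removed in a single step."""
    added: set[str] = field(default_factory=set)
    removed: set[str] = field(default_factory=set)


class ToyTableLog:
    """Hypothetical append-only log of commits over immutable data files.

    Data files (e.g. Parquet) are never modified in place; the log alone
    decides which files belong to the table at each version.
    """

    def __init__(self) -> None:
        self.commits: list[Commit] = []

    def snapshot(self, version: int | None = None) -> set[str]:
        """Replay commits 0..version to get the live file set (time travel)."""
        if version is None:
            version = len(self.commits) - 1
        files: set[str] = set()
        for c in self.commits[: version + 1]:
            files -= c.removed
            files |= c.added
        return files

    def commit(self, added=(), removed=()) -> int:
        """Validate, then append one commit; return its 0-based version."""
        missing = set(removed) - self.snapshot()
        if missing:
            # All-or-nothing: an invalid commit is rejected and nothing is appended.
            raise ValueError(f"files not in table: {sorted(missing)}")
        self.commits.append(Commit(set(added), set(removed)))
        return len(self.commits) - 1


if __name__ == "__main__":
    log = ToyTableLog()
    v0 = log.commit(added={"part-0.parquet", "part-1.parquet"})
    # "Rewrite" part-0 as part-2 without touching the original file.
    v1 = log.commit(added={"part-2.parquet"}, removed={"part-0.parquet"})
    print(sorted(log.snapshot()))    # ['part-1.parquet', 'part-2.parquet']
    print(sorted(log.snapshot(v0)))  # ['part-0.parquet', 'part-1.parquet']
```

Because data files stay immutable, a delete or compaction is just a commit that drops files from the live set, so older versions stay readable until the underlying files are physically deleted.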
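The column-level governance that Lake Formation provides for governed tables is expressed as Lake Formation grants. The minimal sketch below only builds the arguments for the `GrantPermissions` API (exposed in boto3 as `grant_permissions` on the `lakeformation` client) and does not call AWS, so it runs without credentials. The helper `build_column_grant` and the role, database, table, and column names are hypothetical, and this shape of grant is not specific to governed tables.

```python
from __future__ import annotations


def build_column_grant(principal_arn: str, database: str, table: str,
                       columns: list[str]) -> dict:
    """Build keyword arguments for Lake Formation's GrantPermissions API that
    grant SELECT on only the listed columns of one Data Catalog table."""
    if not columns:
        raise ValueError("at least one column is required")
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            "TableWithColumns": {
                "DatabaseName": database,
                "Name": table,
                "ColumnNames": list(columns),
            }
        },
        "Permissions": ["SELECT"],
    }


if __name__ == "__main__":
    # Hypothetical role, database, table, and column names.
    kwargs = build_column_grant(
        "arn:aws:iam::111122223333:role/analyst", "sales", "orders",
        ["order_id", "amount"],
    )
    print(kwargs["Resource"])
    # With AWS credentials configured, the grant itself would be:
    #   import boto3
    #   boto3.client("lakeformation").grant_permissions(**kwargs)
```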
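On the Delta Lake side, time travel and data mutation are available from Spark SQL. The sketch below only builds two statements as strings, so it runs without a Spark cluster: a `VERSION AS OF` time-travel read and a `MERGE INTO` upsert. The function names and the `events` / `events_updates` tables are hypothetical, and identifiers are interpolated without quoting, so they must come from trusted code rather than user input.

```python
def time_travel_query(table: str, version: int) -> str:
    """SELECT a Delta table as it was at an earlier version (time travel)."""
    if version < 0:
        raise ValueError("version must be non-negative")
    return f"SELECT * FROM {table} VERSION AS OF {version}"


def upsert_statement(target: str, source: str, key: str) -> str:
    """MERGE (upsert) source rows into a Delta target, matching on one key column."""
    return (
        f"MERGE INTO {target} AS t "
        f"USING {source} AS s "
        f"ON t.{key} = s.{key} "
        "WHEN MATCHED THEN UPDATE SET * "
        "WHEN NOT MATCHED THEN INSERT *"
    )


if __name__ == "__main__":
    # Hypothetical table names; with a Delta-enabled SparkSession these
    # strings would be passed to spark.sql(...).
    print(time_travel_query("events", 0))
    print(upsert_statement("events", "events_updates", "id"))
```

In a SparkSession configured for Delta Lake (for example via the `delta-spark` package), each string would be run with `spark.sql(...)`. Streaming uses a separate API: `spark.readStream.format("delta")` reads a Delta table as a stream.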