Apache Iceberg Table Format Versions

BigDataEnthusiast
4 min read · Jun 25, 2023

In this post we will mainly explore the following:

  • What the Iceberg table format versions are and how they differ.
  • How Iceberg tables handle updates and deletes.
  • What copy-on-write and merge-on-read are, and how they differ.

Iceberg has two table format versions, v1 and v2.

  • v1 format: copy-on-write only.
  • v2 format: copy-on-write or merge-on-read (v2 adds row-level deletes through delete files).

Iceberg tables support table properties to configure table behavior. There are different types of properties, e.g. read properties, write properties, and other table behavior properties.

  • To define the table's format version:
+----------------+---------+----------------------------------------+
| Property       | Default | Description                            |
+----------------+---------+----------------------------------------+
| format-version | 1       | Table’s format version (can be 1 or 2) |
+----------------+---------+----------------------------------------+
  • To choose the write strategy (copy-on-write or merge-on-read) for each operation:
+-------------------+---------------+------------------------------------------+
| Property          | Default       | Description                              |
+-------------------+---------------+------------------------------------------+
| write.update.mode | copy-on-write | copy-on-write or merge-on-read (v2 only) |
| write.delete.mode | copy-on-write | copy-on-write or merge-on-read (v2 only) |
| write.merge.mode  | copy-on-write | copy-on-write or merge-on-read (v2 only) |
+-------------------+---------------+------------------------------------------+
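
As a rough illustration of how the properties above might be set together, the sketch below renders a Spark SQL `ALTER TABLE ... SET TBLPROPERTIES` statement from a Python dict. The helper `render_tblproperties` is hypothetical (it is not part of Iceberg or Spark), and the table name `prod.db.table` is only a placeholder. The rendered statement would still have to be run through a Spark session configured with an Iceberg catalog.

```python
def render_tblproperties(table: str, props: dict[str, str]) -> str:
    """Render a Spark SQL statement that sets Iceberg table properties."""
    # Hypothetical helper for illustration; not an Iceberg or Spark API.
    pairs = ", ".join(f"'{key}'='{value}'" for key, value in props.items())
    return f"ALTER TABLE {table} SET TBLPROPERTIES ({pairs})"


# Move a table to format v2 and use merge-on-read for all row-level writes.
merge_on_read_props = {
    "format-version": "2",
    "write.update.mode": "merge-on-read",
    "write.delete.mode": "merge-on-read",
    "write.merge.mode": "merge-on-read",
}

statement = render_tblproperties("prod.db.table", merge_on_read_props)
print(statement)
```

Each of the three write modes is independent, so a table could, for instance, keep copy-on-write for MERGE while using merge-on-read for DELETE.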

Iceberg tables support updates and deletes through either copy-on-write or merge-on-read.

Copy-on-Write

  • Even if we update or delete just a few rows, Iceberg rewrites every data file that contains an affected row.
  • At write time, Iceberg identifies which data files contain affected rows, writes new copies of those files with the changes (update/delete) applied, and points the new snapshot at the copies.

Example: Let's say the table's data directory holds two data files, data-file-1 and data-file-2, and you update a record that is present only in data-file-2. Iceberg creates a new copy of data-file-2 only, with the change applied.

In the end you have three files: data-file-1, data-file-2, and data-file-2-new. But the latest snapshot refers only to data-file-1 and data-file-2-new.

The old file (data-file-2) is kept only for time travel, that is, for queries against an older snapshot such as:

SELECT * FROM prod.db.table TIMESTAMP AS OF '1986-10-26 01:21:00';

If you don't need time travel, you can expire the old snapshots, which also removes the data files that are no longer referenced.
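
The copy-on-write example can be sketched as a toy simulation. This is not Iceberg's real metadata layout: here a "data file" is just a list of `(id, value)` rows, a "snapshot" is the list of file names it references, and the function names are made up for illustration.

```python
# Toy model, not Iceberg's real metadata layout: a "data file" is a list of
# (id, value) rows and a "snapshot" is the list of file names it references.
files = {
    "data-file-1": [(1, "a"), (2, "b")],
    "data-file-2": [(3, "c"), (4, "d")],
}
snapshots = [["data-file-1", "data-file-2"]]  # snapshot 0


def copy_on_write_update(row_id, new_value):
    """Rewrite every file that contains row_id and commit a new snapshot."""
    new_snapshot = []
    for name in snapshots[-1]:
        rows = files[name]
        if any(rid == row_id for rid, _ in rows):
            # The whole file is copied, even though only one row changes.
            new_name = f"{name}-new"
            files[new_name] = [
                (rid, new_value if rid == row_id else val) for rid, val in rows
            ]
            new_snapshot.append(new_name)
        else:
            new_snapshot.append(name)  # untouched files are reused as-is
    snapshots.append(new_snapshot)


def read(snapshot_id=-1):
    """Return all rows visible in the given snapshot (latest by default)."""
    return [row for name in snapshots[snapshot_id] for row in files[name]]


def expire_old_snapshots():
    """Keep only the latest snapshot and drop files it does not reference."""
    del snapshots[:-1]
    live = set(snapshots[-1])
    for name in list(files):
        if name not in live:
            del files[name]


copy_on_write_update(4, "D")
files_after_update = sorted(files)    # three files exist at this point
latest_snapshot = list(snapshots[-1])  # data-file-1 and data-file-2-new
old_rows = read(0)                     # "time travel" to the first snapshot
expire_old_snapshots()
files_after_expiry = sorted(files)     # data-file-2 has been removed
print(files_after_update, latest_snapshot, old_rows, files_after_expiry)
```

Reading needs no extra work in this model: `read()` simply concatenates the files that the snapshot lists.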

Pros & Cons:

  • copy-on-write is expensive when updates/deletes are frequent, so it is usually a poor fit for streaming pipelines.
  • copy-on-write is ideal for bulk updates in batch mode, where most of the rows in the affected files change.
  • Writes are slower: the affected data files must be copied and the changes applied at write time.
  • Reads are faster: no extra processing is required on the reader side.

Merge-on-Read

  • If we update or delete just a few rows, Iceberg does not rewrite the entire data file. Instead, the changes are written to new files.
  • At write time, Iceberg identifies which data files contain affected records and the position of each such record within its file, then writes the file path and position to a positional delete file.

Positional delete file: holds the file path and row position of each deleted or updated record.

+------------------------------+-----+
| file_path                    | pos |
+------------------------------+-----+
| .../00191-1676-00001.parquet | 11  |
| .../00191-1676-00001.parquet | 21  |
+------------------------------+-----+

The new versions of the updated records are stored in a separate data file.

Example: Let's say the table's data directory holds two data files, data-file-1 and data-file-2, and you update a record that is present only in data-file-2. Iceberg creates a positional-delete-file that holds the position of the updated record, plus a new data file with the updated record.

In the end you have four files: data-file-1, data-file-2, positional-delete-file, and data-file-with-change-records. At read time, Iceberg merges these files and shows you the latest data.
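
The merge-on-read example can be sketched the same way. Again this is a toy model under simplifying assumptions, not Iceberg's real file layout: rows are `(id, value)` tuples, the positional delete file is a Python list of `(file_path, pos)` pairs, and the function names are made up for illustration.

```python
# Toy model, not Iceberg's real file layout: each data file is a list of
# (id, value) rows, and the positional delete file is a list of
# (file_path, pos) pairs marking rows that readers must skip.
data_files = {
    "data-file-1": [(1, "a"), (2, "b")],
    "data-file-2": [(3, "c"), (4, "d")],
}
position_deletes = []  # stands in for positional-delete-file


def merge_on_read_update(row_id, new_value):
    """Record the old row's position and append the new row elsewhere."""
    for path, rows in list(data_files.items()):
        for pos, (rid, _) in enumerate(rows):
            if rid == row_id and (path, pos) not in position_deletes:
                position_deletes.append((path, pos))
    # Existing data files are left untouched; the new version of the row
    # goes into a separate, small data file.
    data_files.setdefault("data-file-with-change-records", []).append(
        (row_id, new_value)
    )


def read():
    """Merge data files with the delete entries at read time."""
    deleted = set(position_deletes)
    return [
        row
        for path, rows in data_files.items()
        for pos, row in enumerate(rows)
        if (path, pos) not in deleted
    ]


merge_on_read_update(4, "D")
print(position_deletes)           # one entry pointing into data-file-2
print(data_files["data-file-2"])  # the original file is unchanged
print(read())                     # the reader sees the updated value
```

The write only appends small files, while `read()` has to filter every row against the delete entries, which is where the extra read cost comes from.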

Pros & Cons:

  • merge-on-read is ideal for small, frequent updates.
  • Writes are quick: no data file is rewritten. The only work is writing the positional delete file and a new data file with the changed records.
  • Reads are slower: the merge has to be done while reading the data.
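
The trade-off between the two strategies can be made concrete with a back-of-the-envelope row count. The numbers below (a 1,000,000-row data file with 10 updated rows) are made up for illustration, and the model deliberately ignores file sizes, metadata, and compression.

```python
def rows_written(file_rows: int, updated_rows: int, mode: str) -> int:
    """Rows written for one update that touches a single data file."""
    if mode == "copy-on-write":
        # The whole data file is rewritten with the changes applied.
        return file_rows
    if mode == "merge-on-read":
        # One delete-file entry plus one new data row per updated record.
        return 2 * updated_rows
    raise ValueError(f"unknown mode: {mode}")


def rows_scanned(file_rows: int, updated_rows: int, mode: str) -> int:
    """Rows a reader scans afterwards to see the latest state of that file."""
    if mode == "copy-on-write":
        # Only the rewritten data file is read.
        return file_rows
    if mode == "merge-on-read":
        # Original file, plus delete entries, plus the new data rows.
        return file_rows + 2 * updated_rows
    raise ValueError(f"unknown mode: {mode}")


FILE_ROWS, UPDATED_ROWS = 1_000_000, 10  # made-up numbers

for mode in ("copy-on-write", "merge-on-read"):
    written = rows_written(FILE_ROWS, UPDATED_ROWS, mode)
    scanned = rows_scanned(FILE_ROWS, UPDATED_ROWS, mode)
    print(f"{mode}: wrote {written} rows, reader scans {scanned} rows")
```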

Table maintenance (compaction, rewriting data files, rewriting positional delete files, etc.) is required as these small files accumulate.
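
Conceptually, compaction folds the accumulated delete entries and small files back into clean data files. The sketch below is a toy version with a hypothetical `compact` function that always produces a single output file. Real maintenance jobs, such as Iceberg's `rewrite_data_files` Spark procedure, decide how to group and size the output files and are considerably more involved.

```python
def compact(data_files, position_deletes, target="compacted-data-file"):
    """Fold delete entries and small files into one clean data file."""
    deleted = set(position_deletes)
    live_rows = [
        row
        for path, rows in data_files.items()
        for pos, row in enumerate(rows)
        if (path, pos) not in deleted
    ]
    # After compaction there is nothing left to merge at read time.
    return {target: live_rows}, []


# State after one merge-on-read update of the row with id 4 (toy data).
small_files = {
    "data-file-1": [(1, "a"), (2, "b")],
    "data-file-2": [(3, "c"), (4, "d")],
    "data-file-with-change-records": [(4, "D")],
}
pending_deletes = [("data-file-2", 1)]

compacted_files, remaining_deletes = compact(small_files, pending_deletes)
print(compacted_files)
print(remaining_deletes)
```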

As of now, Apache Spark writes only positional deletes. Iceberg also defines equality deletes, where the delete file stores the column values (an ID, for example) that identify the deleted records instead of their positions. But as of now Spark does not write equality delete files.
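
The difference between the two kinds of delete can be sketched as follows. This is a toy comparison on `(id, value)` rows with made-up function names, and it ignores details of how the real format decides which data files a delete file applies to.

```python
# Toy rows of (id, value); the file name is illustrative only.
data_files = {"data-file-2": [(3, "c"), (4, "d")]}


def apply_position_deletes(files, deletes):
    """Drop rows whose (file_path, pos) appears in the delete set."""
    return [
        row
        for path, rows in files.items()
        for pos, row in enumerate(rows)
        if (path, pos) not in deletes
    ]


def apply_equality_deletes(files, deleted_ids):
    """Drop rows whose id column equals one of the deleted values."""
    return [
        row
        for rows in files.values()
        for row in rows
        if row[0] not in deleted_ids
    ]


# Positional delete: "the row at position 1 of data-file-2".
by_position = apply_position_deletes(data_files, {("data-file-2", 1)})
# Equality delete: "every row whose id equals 4".
by_equality = apply_equality_deletes(data_files, {4})
print(by_position)
print(by_equality)
```

Both calls remove the same row here. The practical difference is on the write side: a positional delete requires the writer to locate the row first, while an equality delete only needs the value.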
