Apache Iceberg Table Format Versions
In this blog we will mainly explore the following:
- What the different Iceberg table format versions are and how they differ.
- How an Iceberg table handles updates and deletes.
- What copy-on-write and merge-on-read are and how they differ.
Iceberg tables have two format versions, v1 and v2.
- v1 format: always copy-on-write.
- v2 format: copy-on-write or merge-on-read.
Iceberg tables support table properties to configure table behavior. There are different kinds of properties, e.g. read properties, write properties, and other table behavior properties.
- To define the table's format version:
+----------------+---------------------------+----------------------------------------+
| Property       | Default                   | Description                            |
+----------------+---------------------------+----------------------------------------+
| format-version | 1 (2 since Iceberg 1.4.0) | Table’s format version (can be 1 or 2) |
+----------------+---------------------------+----------------------------------------+
- To choose the table's write strategy (copy-on-write or merge-on-read):
+-------------------+---------------+-------------------------------------------+
| Property          | Default       | Description                               |
+-------------------+---------------+-------------------------------------------+
| write.update.mode | copy-on-write | copy-on-write, or merge-on-read (v2 only) |
| write.delete.mode | copy-on-write | copy-on-write, or merge-on-read (v2 only) |
| write.merge.mode  | copy-on-write | copy-on-write, or merge-on-read (v2 only) |
+-------------------+---------------+-------------------------------------------+
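The way these properties combine can be sketched in a few lines of Python. This is only an illustration of the tables above: `resolve_write_modes` is a hypothetical helper, not an Iceberg API, its defaults are simply the ones listed in the tables, and how a real engine reacts to merge-on-read on a v1 table may differ from the error raised here.

```python
# Hypothetical helper, not part of any Iceberg API: it only mirrors the
# property tables above (defaults may differ between Iceberg releases).
DEFAULTS = {
    "format-version": "1",
    "write.update.mode": "copy-on-write",
    "write.delete.mode": "copy-on-write",
    "write.merge.mode": "copy-on-write",
}
MODES = {"copy-on-write", "merge-on-read"}


def resolve_write_modes(props):
    """Apply the defaults, then enforce the 'merge-on-read is v2 only' rule."""
    effective = {**DEFAULTS, **props}
    for key in ("write.update.mode", "write.delete.mode", "write.merge.mode"):
        mode = effective[key]
        if mode not in MODES:
            raise ValueError(f"{key}: unknown mode {mode!r}")
        if mode == "merge-on-read" and effective["format-version"] == "1":
            raise ValueError(f"{key}: merge-on-read needs format-version 2")
    return effective


v2_table = resolve_write_modes({
    "format-version": "2",
    "write.delete.mode": "merge-on-read",
})
print(v2_table["write.delete.mode"])  # merge-on-read
print(v2_table["write.update.mode"])  # copy-on-write
```

Each of the three operations (update, delete, merge) has its own mode, so a table can, for example, use merge-on-read for deletes while keeping copy-on-write for updates.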
Iceberg tables support updates and deletes through either the copy-on-write or the merge-on-read technique.
Copy-on-Write
- Even if we update or delete just a few rows in a table, Iceberg still rewrites every affected data file in full.
- So at write time, Iceberg identifies which data files are affected, copies those data files, and applies the changes (update/delete) to the copies.
Example: Let's say you have two data files in the data directory of an Iceberg table, data-file-1 and data-file-2. You update a record that is present only in data-file-2. Iceberg creates a new copy of data-file-2 only and applies the change to it.
At the end you have three files: data-file-1, data-file-2, and data-file-2-new. But your latest snapshot refers only to data-file-1 and data-file-2-new.
The old file (data-file-2) is kept only for time travel. If you don't need time travel, you can go ahead and expire the old snapshot, which clears out the unused files. If you do need it, a time-travel query looks like this:
SELECT * FROM prod.db.table TIMESTAMP AS OF '1986-10-26 01:21:00';
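The copy-on-write example above can be mimicked with a toy in-memory model. This is a sketch of the idea only, not how Iceberg is implemented: the dictionaries, the `cow_update`, `read`, and `expire_old_snapshots` functions, and the row contents are all made up for illustration, and real snapshots track much more metadata than a list of file names.

```python
# Toy in-memory model of copy-on-write; file names follow the example above.
files = {
    "data-file-1": [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}],
    "data-file-2": [{"id": 3, "name": "c"}, {"id": 4, "name": "d"}],
}
# Each snapshot is the list of data files that make up the table at that point.
snapshots = [["data-file-1", "data-file-2"]]


def cow_update(row_id, new_name):
    """Rewrite every data file that contains row_id, then commit a new snapshot."""
    current = []
    for name in snapshots[-1]:
        rows = files[name]
        if any(r["id"] == row_id for r in rows):
            new_file = name + "-new"
            # The whole file is copied, even though only one row changes.
            files[new_file] = [
                {**r, "name": new_name} if r["id"] == row_id else dict(r)
                for r in rows
            ]
            current.append(new_file)
        else:
            current.append(name)  # untouched files are reused as-is
    snapshots.append(current)


def read(snapshot):
    """Reading is a plain scan: no merge step is needed."""
    return [row for name in snapshot for row in files[name]]


def expire_old_snapshots():
    """Keep only the latest snapshot and drop files nothing refers to."""
    del snapshots[:-1]
    for name in set(files) - set(snapshots[-1]):
        del files[name]


cow_update(4, "d2")
print(snapshots[-1])  # ['data-file-1', 'data-file-2-new']
print(sorted(files))  # ['data-file-1', 'data-file-2', 'data-file-2-new']
# "Time travel": the first snapshot still sees the old version of row 4.
print(read(snapshots[0]))
expire_old_snapshots()
print(sorted(files))  # ['data-file-1', 'data-file-2-new']
```

Note that the old file only disappears once the snapshot that refers to it is expired, which is also the point at which time travel to that snapshot stops being possible.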
Pros & Cons:
- copy-on-write is expensive with frequent updates/deletes, so it is not a good fit for streaming pipelines.
- copy-on-write is ideal for bulk updates in batch mode, where most of the rows get updated.
- Writes are slower, because the work happens at write time: copying the data file and applying the changes.
- Reads are faster, because no processing is required on the reader's end.
Merge-on-Read
- If we update or delete just a few rows in a table, Iceberg does not rewrite the entire data file. Instead, the changes are written to new files.
- So at write time, Iceberg identifies which data files are affected and the positions of the affected records in them, then writes the file paths and positions of those records to a positional delete file.
Positional delete file: holds the positions of deleted and updated records.
+------------------------------+----------+
| file_path                    | pos      |
+------------------------------+----------+
| .../00191-1676-00001.parquet | 11       |
| .../00191-1676-00001.parquet | 21       |
+------------------------------+----------+
The updated records themselves are stored in a separate, new data file.
Example: Let's say you have two data files in the data directory of an Iceberg table, data-file-1 and data-file-2. You update a record that is present only in data-file-2. Iceberg creates a positional-delete-file that holds the position of the updated record, plus a new data file with the updated record.
At the end you have four files: data-file-1, data-file-2, positional-delete-file, and data-file-with-change-records. At read time, Iceberg merges these files and shows you the latest data.
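The merge-on-read flow above can be mimicked with a similar toy model. As before, this is only a sketch under simplifying assumptions: `mor_update` and `read` are hypothetical names, the positional delete file is reduced to a list of (file_path, pos) pairs, and the new data file gets a numeric suffix so that repeated updates do not overwrite each other.

```python
# Toy in-memory model of merge-on-read; file names follow the example above.
data_files = {
    "data-file-1": [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}],
    "data-file-2": [{"id": 3, "name": "c"}, {"id": 4, "name": "d"}],
}
# Positional deletes: (file_path, pos) pairs, as in the table shown earlier.
positional_deletes = []


def mor_update(row_id, new_name):
    """Mark the live row as deleted by position and write its new version."""
    changed = []
    for path, rows in data_files.items():
        for pos, row in enumerate(rows):
            if row["id"] == row_id and (path, pos) not in positional_deletes:
                positional_deletes.append((path, pos))
                changed.append({**row, "name": new_name})
    if changed:
        # Existing data files stay untouched; only a small new file is written.
        new_path = f"data-file-with-change-records-{len(data_files)}"
        data_files[new_path] = changed


def read():
    """Merge at read time: skip every position listed in the delete file."""
    deleted = set(positional_deletes)
    return [
        row
        for path, rows in data_files.items()
        for pos, row in enumerate(rows)
        if (path, pos) not in deleted
    ]


mor_update(4, "d2")
print(positional_deletes)  # [('data-file-2', 1)]
# The original file still holds the old row; only the delete file hides it.
print(data_files["data-file-2"][1])  # {'id': 4, 'name': 'd'}
print([r["name"] for r in read()])   # ['a', 'b', 'c', 'd2']
```

The write path touches only small new files, while every read has to filter rows against the delete list, which is the trade-off behind the pros and cons of this mode.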
Pros & Cons:
- merge-on-read is ideal for small, frequent updates.
- Writes are quick, because no data file is rewritten. The only work required is writing the positional delete file and the new data file with the changes.
- Reads are slower, because the merge has to happen while reading the data.
Table maintenance (compaction, rewriting data files, rewriting positional delete files, etc.) is required as these small files accumulate.
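What compaction achieves can be sketched as follows. This is a toy illustration, not Iceberg's maintenance code: `compact` and its `target_rows` parameter are hypothetical, and real maintenance jobs (for example the `rewrite_data_files` procedure in Spark) work with file sizes rather than row counts.

```python
def compact(data_files, positional_deletes, target_rows=4):
    """Return new data files with deleted rows dropped and small files merged."""
    deleted = set(positional_deletes)
    live = [
        row
        for path, rows in data_files.items()
        for pos, row in enumerate(rows)
        if (path, pos) not in deleted
    ]
    # Pack the surviving rows into files of up to target_rows rows each.
    return {
        f"compacted-{i // target_rows}": live[i:i + target_rows]
        for i in range(0, len(live), target_rows)
    }


# State after two small merge-on-read updates: one original file, two tiny
# change files, and two positional deletes pointing into the original file.
data_files = {
    "data-file-2": [{"id": 3, "name": "c"}, {"id": 4, "name": "d"}],
    "changes-1": [{"id": 4, "name": "d2"}],
    "changes-2": [{"id": 3, "name": "c2"}],
}
positional_deletes = [("data-file-2", 1), ("data-file-2", 0)]

compacted = compact(data_files, positional_deletes)
print(compacted)
# {'compacted-0': [{'id': 4, 'name': 'd2'}, {'id': 3, 'name': 'c2'}]}
```

After compaction the delete entries are no longer needed, so readers go back to plain scans until new changes arrive.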
As of now, Iceberg on Apache Spark writes only positional deletes. Iceberg also has equality deletes, which store the actual column values of the deleted records (ID etc.) in an equality delete file instead of their positions. But as of now Spark does not support writing them.
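The difference between the two delete styles can be shown on a single toy file. This is a simplified sketch: `apply_positional` and `apply_equality` are hypothetical helpers, and details of the real format, such as which data files a given delete file applies to, are ignored.

```python
rows = [
    {"id": 1, "name": "a"},
    {"id": 2, "name": "b"},
    {"id": 3, "name": "c"},
]

# Positional delete: "drop the row at position 1 of this file".
positional_delete = {1}
# Equality delete: "drop every row whose id equals 2".
equality_delete = [{"id": 2}]


def apply_positional(rows, positions):
    """Keep the rows whose position is not listed."""
    return [row for pos, row in enumerate(rows) if pos not in positions]


def apply_equality(rows, deletes):
    """Keep the rows that match none of the delete predicates."""
    def matches(row, delete):
        return all(row.get(col) == val for col, val in delete.items())

    return [row for row in rows if not any(matches(row, d) for d in deletes)]


print(apply_positional(rows, positional_delete) == apply_equality(rows, equality_delete))  # True
```

A positional delete needs the writer to know exactly where the row lives, while an equality delete can be written without locating the row, at the price of comparing values at read time.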
Refer to this blog for the internals of Iceberg tables.
Refer to the blogs below: