Deletion Vectors in DELETE, MERGE and UPDATE commands
jaceklaskowski committed Jun 3, 2024
1 parent ec8e2a2 commit 549fb1c
Showing 14 changed files with 150 additions and 23 deletions.
2 changes: 1 addition & 1 deletion docs/commands/alter/.pages
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
title: ALTER
title: ALTER TABLE
nav:
- index.md
- ...
21 changes: 21 additions & 0 deletions docs/commands/alter/AlterTableDropFeature.md
@@ -0,0 +1,21 @@
---
title: DROP FEATURE
---

# AlterTableDropFeature

`AlterTableDropFeature` is an `AlterTableCommand` ([Spark SQL]({{ book.spark_sql }}/logical-operators/AlterTableCommand)) unary logical operator that represents the [ALTER TABLE DROP FEATURE](../../sql/index.md#ALTER-TABLE-DROP-FEATURE) SQL command in a logical query plan.

`AlterTableDropFeature` supports a single feature removal (by the [feature name](#featureName)).

## Creating Instance

`AlterTableDropFeature` takes the following to be created:

* <span id="table"> Table (`LogicalPlan`)
* <span id="featureName"> Feature name
* <span id="truncateHistory"> `truncateHistory` flag

`AlterTableDropFeature` is created when:

* `DeltaSqlAstBuilder` is requested to [parse ALTER TABLE DROP FEATURE SQL command](../../sql/DeltaSqlAstBuilder.md#visitAlterTableDropFeature)
5 changes: 3 additions & 2 deletions docs/commands/alter/index.md
@@ -1,8 +1,8 @@
---
title: ALTER
title: ALTER TABLE
---

# ALTER Commands
# ALTER TABLE Commands

Delta Lake supports altering delta tables using `ALTER TABLE` SQL commands with the following clauses:

@@ -11,6 +11,7 @@ Delta Lake supports altering delta tables using `ALTER TABLE` SQL commands with
* [CHANGE COLUMN](AlterTableChangeColumnDeltaCommand.md)
* [DROP COLUMNS](AlterTableDropColumnsDeltaCommand.md)
* [DROP CONSTRAINT](AlterTableDropConstraintDeltaCommand.md)
* [DROP FEATURE](AlterTableDropFeature.md)
* [REPLACE COLUMNS](AlterTableReplaceColumnsDeltaCommand.md)
* [SET TBLPROPERTIES](AlterTableSetPropertiesDeltaCommand.md)
* [UNSET TBLPROPERTIES](AlterTableUnsetPropertiesDeltaCommand.md)
2 changes: 1 addition & 1 deletion docs/commands/delete/DeleteCommand.md
@@ -116,7 +116,7 @@ shouldWritePersistentDeletionVectors(
`shouldWritePersistentDeletionVectors` is enabled (`true`) when the following all hold:

1. [spark.databricks.delta.delete.deletionVectors.persistent](../../configuration-properties/DeltaSQLConf.md#DELETE_USE_PERSISTENT_DELETION_VECTORS) configuration property is enabled (`true`)
1. [Protocol and table configuration support deletion vector feature](../../deletion-vectors/DeletionVectorUtils.md#deletionVectorsWritable)
1. [Protocol and table configuration support deletion vectors feature](../../deletion-vectors/DeletionVectorUtils.md#deletionVectorsWritable)

## <span id="apply"> Creating DeleteCommand

19 changes: 19 additions & 0 deletions docs/commands/merge/MergeIntoCommandBase.md
@@ -512,3 +512,22 @@ isCdcEnabled(
`isCdcEnabled` is used when:

* `ClassicMergeExecutor` is requested to [findTouchedFiles](ClassicMergeExecutor.md#findTouchedFiles), [writeAllChanges](ClassicMergeExecutor.md#writeAllChanges)

## shouldWritePersistentDeletionVectors { #shouldWritePersistentDeletionVectors }

```scala
shouldWritePersistentDeletionVectors(
  spark: SparkSession,
  txn: OptimisticTransaction): Boolean
```

`shouldWritePersistentDeletionVectors` is enabled (`true`) when the following all hold:

1. [spark.databricks.delta.merge.deletionVectors.persistent](../../configuration-properties/index.md#merge.deletionVectors.persistent) configuration property is enabled (`true`)
1. [Protocol and table configuration support deletion vectors feature](../../deletion-vectors/DeletionVectorUtils.md#deletionVectorsWritable)

---

`shouldWritePersistentDeletionVectors` is used when:

* `MergeIntoCommand` is requested to [run a merge](MergeIntoCommand.md#runMerge)
23 changes: 23 additions & 0 deletions docs/commands/update/UpdateCommand.md
@@ -4,6 +4,8 @@

`UpdateCommand` is a `RunnableCommand` ([Spark SQL]({{ book.spark_sql }}/logical-operators/RunnableCommand/)) logical operator.

`UpdateCommand` can use [Deletion Vectors](../../deletion-vectors/index.md) table feature to _soft-delete_ records when [executed](#run) (based on [shouldWritePersistentDeletionVectors](#shouldWritePersistentDeletionVectors)).

## Creating Instance

`UpdateCommand` takes the following to be created:
@@ -52,6 +54,8 @@ performUpdate(

`performUpdate`...FIXME

With [persistent Deletion Vectors enabled](#shouldWritePersistentDeletionVectors), `performUpdate`...FIXME and [findTouchedFiles](../../deletion-vectors/DMLWithDeletionVectorsHelper.md#findTouchedFiles).

### rewriteFiles

```scala
@@ -74,3 +78,22 @@ buildUpdatedColumns(
```

`buildUpdatedColumns`...FIXME

## shouldWritePersistentDeletionVectors { #shouldWritePersistentDeletionVectors }

```scala
shouldWritePersistentDeletionVectors(
  spark: SparkSession,
  txn: OptimisticTransaction): Boolean
```

`shouldWritePersistentDeletionVectors` is enabled (`true`) when the following all hold:

1. [spark.databricks.delta.update.deletionVectors.persistent](../../configuration-properties/index.md#update.deletionVectors.persistent) configuration property is enabled (`true`)
1. [Protocol and table configuration support deletion vectors feature](../../deletion-vectors/DeletionVectorUtils.md#deletionVectorsWritable)

---

`shouldWritePersistentDeletionVectors` is used when:

* `UpdateCommand` is [executed](#run) (and [performUpdate](UpdateCommand.md#performUpdate))
8 changes: 8 additions & 0 deletions docs/configuration-properties/DeltaSQLConf.md
@@ -42,6 +42,10 @@

[spark.databricks.delta.history.maxKeysPerList](index.md#history.maxKeysPerList)

## merge.deletionVectors.persistent { #MERGE_USE_PERSISTENT_DELETION_VECTORS }

[spark.databricks.delta.merge.deletionVectors.persistent](index.md#merge.deletionVectors.persistent)

## merge.materializeSource { #DELTA_COLLECT_STATS_USING_TABLE_SCHEMA }

[spark.databricks.delta.merge.materializeSource](index.md#merge.materializeSource)
@@ -106,6 +110,10 @@

[spark.databricks.delta.timeTravel.resolveOnIdentifier.enabled](index.md#timeTravel.resolveOnIdentifier.enabled)

## update.deletionVectors.persistent { #UPDATE_USE_PERSISTENT_DELETION_VECTORS }

[spark.databricks.delta.update.deletionVectors.persistent](index.md#update.deletionVectors.persistent)

## write.txnVersion.autoReset.enabled { #DELTA_IDEMPOTENT_DML_AUTO_RESET_ENABLED }

[spark.databricks.delta.write.txnVersion.autoReset.enabled](index.md#write.txnVersion.autoReset.enabled)
28 changes: 25 additions & 3 deletions docs/configuration-properties/index.md
@@ -177,7 +177,7 @@ If disabled, merge the two configurations with the same semantics as update and

Default: `true`

### <span id="delete.deletionVectors.persistent"><span id="DELETE_USE_PERSISTENT_DELETION_VECTORS"> delete.deletionVectors.persistent
### <span id="DELETE_USE_PERSISTENT_DELETION_VECTORS"> delete.deletionVectors.persistent { #delete.deletionVectors.persistent }

**spark.databricks.delta.delete.deletionVectors.persistent**

@@ -319,6 +319,14 @@ Used when:

Default: `50`

### <span id="MERGE_USE_PERSISTENT_DELETION_VECTORS"> merge.deletionVectors.persistent { #merge.deletionVectors.persistent }

**spark.databricks.delta.merge.deletionVectors.persistent**

**(internal)** Enables [persistent Deletion Vectors](../deletion-vectors/index.md) in [MERGE](../commands/merge/index.md) command

Default: `true`

### <span id="MERGE_MATERIALIZE_SOURCE"> merge.materializeSource { #merge.materializeSource }

**spark.databricks.delta.merge.materializeSource**
@@ -682,12 +690,26 @@ Used when:

* `DeltaSourceBase` is requested for the [allowUnsafeStreamingReadOnColumnMappingSchemaChanges](../spark-connector/DeltaSourceBase.md#allowUnsafeStreamingReadOnColumnMappingSchemaChanges)

### <span id="timeTravel.resolveOnIdentifier.enabled"><span id="RESOLVE_TIME_TRAVEL_ON_IDENTIFIER"> timeTravel.resolveOnIdentifier.enabled
### <span id="RESOLVE_TIME_TRAVEL_ON_IDENTIFIER"> timeTravel.resolveOnIdentifier.enabled { #timeTravel.resolveOnIdentifier.enabled }

**spark.databricks.delta.timeTravel.resolveOnIdentifier.enabled**

**(internal)** Enables [time travel](../time-travel/index.md) patterns (as `@v123` and `@yyyyMMddHHmmssSSS`) in the path identifiers of delta tables

**spark.databricks.delta.timeTravel.resolveOnIdentifier.enabled** (internal) controls whether to resolve patterns as `@v123` and `@yyyyMMddHHmmssSSS` in path identifiers as [time travel](../time-travel/index.md) nodes.
Default: `true`
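
To illustrate what the `@v123` and `@yyyyMMddHHmmssSSS` patterns look like at the end of a path identifier, here is a minimal sketch. The `TimeTravelSuffix` object, its regular expressions and case classes are hypothetical names made up for this example; they are not Delta Lake's own code and only assume that the version pattern is `@v` followed by digits and the timestamp pattern is `@` followed by exactly 17 digits.

```scala
object TimeTravelSuffix {
  // Hypothetical patterns: a trailing "@v<digits>" or "@<17 digits>" (yyyyMMddHHmmssSSS)
  private val VersionSuffix = """(.*)@v(\d+)$""".r
  private val TimestampSuffix = """(.*)@(\d{17})$""".r

  sealed trait Suffix
  final case class Version(path: String, version: Long) extends Suffix
  final case class Timestamp(path: String, timestamp: String) extends Suffix

  // Splits an identifier into the plain path and the time travel part (if any)
  def parse(identifier: String): Option[Suffix] = identifier match {
    case VersionSuffix(path, v)    => Some(Version(path, v.toLong))
    case TimestampSuffix(path, ts) => Some(Timestamp(path, ts))
    case _                         => None
  }
}
```

With this sketch, `TimeTravelSuffix.parse("/tmp/delta/t@v123")` gives a `Version` with the path `/tmp/delta/t`, while a path with no `@` suffix gives `None`.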

### <span id="UPDATE_USE_PERSISTENT_DELETION_VECTORS"> update.deletionVectors.persistent { #update.deletionVectors.persistent }

**spark.databricks.delta.update.deletionVectors.persistent**

**(internal)** Enables [persistent Deletion Vectors](../deletion-vectors/index.md) in [UPDATE](../commands/update/index.md) command

Default: `true`

Used when:

* [UpdateCommand](../commands/update/UpdateCommand.md) is executed (and [shouldWritePersistentDeletionVectors](../commands/update/UpdateCommand.md#shouldWritePersistentDeletionVectors))

### <span id="vacuum.parallelDelete.enabled"><span id="DELTA_VACUUM_PARALLEL_DELETE_ENABLED"> vacuum.parallelDelete.enabled

**spark.databricks.delta.vacuum.parallelDelete.enabled** enables parallelizing the deletion of files during [vacuum](../commands/vacuum/index.md) command.
6 changes: 5 additions & 1 deletion docs/deletion-vectors/DMLWithDeletionVectorsHelper.md
@@ -1,5 +1,7 @@
# DMLWithDeletionVectorsHelper

`DMLWithDeletionVectorsHelper` is a [DeltaCommand](../commands/DeltaCommand.md) with utilities for DML operations to work with [Deletion Vectors](index.md).

## createTargetDfForScanningForMatches { #createTargetDfForScanningForMatches }

```scala
@@ -57,7 +59,9 @@ findTouchedFiles(
opName: String): Seq[TouchedFileWithDV]
```

`findTouchedFiles`...FIXME
`findTouchedFiles` requests the given [TahoeFileIndex](../TahoeFileIndex.md) (that is assumed to be a [TahoeBatchFileIndex](../TahoeBatchFileIndex.md)) for the [AddFiles](../TahoeBatchFileIndex.md#addFiles).

In the end, `findTouchedFiles` [findFilesWithMatchingRows](#findFilesWithMatchingRows) with [candidate file map](../commands/DeltaCommand.md#generateCandidateFileMap) and [matched row index sets](DeletionVectorBitmapGenerator.md#buildRowIndexSetsForFilesMatchingCondition).

---

7 changes: 4 additions & 3 deletions docs/deletion-vectors/DeletionVectorUtils.md
@@ -30,13 +30,13 @@ deletionVectorsReadable(
## deletionVectorsWritable { #deletionVectorsWritable }

```scala
deletionVectorsWritable(
protocol: Protocol,
metadata: Metadata): Boolean
deletionVectorsWritable(
snapshot: SnapshotDescriptor,
newProtocol: Option[Protocol] = None,
newMetadata: Option[Metadata] = None): Boolean
deletionVectorsWritable(
protocol: Protocol,
metadata: Metadata): Boolean
```

`deletionVectorsWritable` is enabled (`true`) when the following all hold:
@@ -52,6 +52,7 @@ deletionVectorsWritable(
* [DELETE](../commands/delete/index.md) command is executed (and requested to [shouldWritePersistentDeletionVectors](../commands/delete/DeleteCommand.md#shouldWritePersistentDeletionVectors))
* [MERGE](../commands/merge/index.md) command is executed (and requested to [shouldWritePersistentDeletionVectors](../commands/merge/MergeIntoCommandBase.md#shouldWritePersistentDeletionVectors))
* [UPDATE](../commands/update/index.md) command is executed (and requested to [shouldWritePersistentDeletionVectors](../commands/update/UpdateCommand.md#shouldWritePersistentDeletionVectors))
* `CheckNoDeletionVector` is executed (for Iceberg-compatibility)

## isTableDVFree { #isTableDVFree }

24 changes: 21 additions & 3 deletions docs/deletion-vectors/index.md
@@ -2,12 +2,19 @@

**Deletion Vectors** is a [table feature](../table-features/index.md) to _soft-delete_ records (merely marking them as removed without rewriting the underlying parquet data files).

Deletion Vectors is supported by conditional [DELETE](../commands/delete/index.md)s (when executed with a delete condition).
Deletion Vectors is supported by the following commands (based on their corresponding "guard" configuration properties):

Command | Configuration Property
-|-
[DELETE](../commands/delete/index.md)<br>(when executed with a condition) | [spark.databricks.delta.delete.deletionVectors.persistent](../configuration-properties/index.md#delete.deletionVectors.persistent)
[MERGE](../commands/merge/index.md) | [spark.databricks.delta.merge.deletionVectors.persistent](../configuration-properties/index.md#merge.deletionVectors.persistent)
[UPDATE](../commands/update/index.md) | [spark.databricks.delta.update.deletionVectors.persistent](../configuration-properties/index.md#update.deletionVectors.persistent)
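
The "guard" pattern shared by the three commands can be sketched as follows. This is a simplified model, not Delta Lake's code: `DeletionVectorGuards` and the shape of its method are hypothetical, `conf` is assumed to hold the effective values of the configuration properties (defaults are not modeled), and `deletionVectorsWritable` stands in for the result of `DeletionVectorUtils.deletionVectorsWritable`.

```scala
object DeletionVectorGuards {
  // Guard configuration properties, as listed in the table above
  val guardProperty: Map[String, String] = Map(
    "DELETE" -> "spark.databricks.delta.delete.deletionVectors.persistent",
    "MERGE"  -> "spark.databricks.delta.merge.deletionVectors.persistent",
    "UPDATE" -> "spark.databricks.delta.update.deletionVectors.persistent")

  // Hypothetical helper: both conditions have to hold
  // 1. the command's guard property is enabled (true)
  // 2. protocol and table configuration support deletion vectors
  def shouldWritePersistentDeletionVectors(
      command: String,
      conf: Map[String, String],
      deletionVectorsWritable: Boolean): Boolean = {
    val guardEnabled = guardProperty
      .get(command)
      .flatMap(conf.get)
      .exists(_.equalsIgnoreCase("true"))
    guardEnabled && deletionVectorsWritable
  }
}
```

Each command consults only its own property, so (in this sketch) enabling the `merge` property has no effect on `UPDATE`.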

Deletion Vectors can be enabled on a delta table using [delta.enableDeletionVectors](../table-properties/DeltaConfigs.md#enableDeletionVectors) table property.

```sql
ALTER TABLE my_delta_table SET TBLPROPERTIES ('delta.enableDeletionVectors' = true);
ALTER TABLE my_delta_table
SET TBLPROPERTIES ('delta.enableDeletionVectors' = true);
```

Deletion Vectors is used on a delta table when all of the following hold:
@@ -26,7 +33,18 @@ Deletion Vectors is used on a delta table when all of the following hold:

[spark.databricks.delta.delete.deletionVectors.persistent](../configuration-properties/index.md#delete.deletionVectors.persistent)

## Iceberg-Compatibility Feature
## UniForm Iceberg

UniForm Iceberg (`IcebergCompatV2` and `IcebergCompatV1`) uses `CheckNoDeletionVector` check to assert that Deletion Vectors are disabled on a delta table.

```text
IcebergCompatV<version> requires Deletion Vectors to be disabled on the table.
Please use the ALTER TABLE DROP FEATURE command to disable Deletion Vectors
and to remove the existing Deletion Vectors from the table.
```

??? note "ALTER TABLE DROP FEATURE SQL command"
    Use [ALTER TABLE DROP FEATURE](../sql/index.md#ALTER-TABLE-DROP-FEATURE) SQL command to drop a feature on a delta table.

The Iceberg-compatibility feature ([REORG TABLE](../commands/reorg/index.md) with `ICEBERG_COMPAT_VERSION`) turns Deletion Vectors off.

1 change: 1 addition & 0 deletions docs/sql/DeltaSqlAstBuilder.md
@@ -8,6 +8,7 @@ SQL Statement | Logical Command
--------------|----------
<span id="visitAddTableConstraint"> [ALTER TABLE ADD CONSTRAINT](index.md#ALTER-TABLE-ADD-CONSTRAINT) | [AlterTableAddConstraint](../check-constraints/AlterTableAddConstraint.md)
<span id="visitDropTableConstraint"> [ALTER TABLE DROP CONSTRAINT](index.md#ALTER-TABLE-DROP-CONSTRAINT) | [AlterTableDropConstraint](../check-constraints/AlterTableDropConstraint.md)
<span id="visitAlterTableDropFeature"> [ALTER TABLE DROP FEATURE](index.md#ALTER-TABLE-DROP-FEATURE) | [AlterTableDropFeature](../commands/alter/AlterTableDropFeature.md)
[CONVERT TO DELTA](index.md#CONVERT-TO-DELTA) | [ConvertToDeltaCommand](../commands/convert/ConvertToDeltaCommand.md)
<span id="visitDescribeDeltaDetail"> [DESCRIBE DETAIL](index.md#DESCRIBE-DETAIL) | [DescribeDeltaDetailCommand](../commands/describe-detail/DescribeDeltaDetailCommand.md)
<span id="visitDescribeDeltaHistory"> [DESCRIBE HISTORY](index.md#describe-history) | [DescribeDeltaHistoryCommand](../commands/describe-history/DescribeDeltaHistoryCommand.md)
25 changes: 17 additions & 8 deletions docs/sql/index.md
@@ -6,7 +6,7 @@ The SQL statements support table identifiers of the format `` delta.`path` `` (w

The SQL statements can also refer to tables that are registered in a catalog (_metastore_).

## <span id="ALTER-TABLE-ADD-CONSTRAINT"> ALTER TABLE ADD CONSTRAINT
## ALTER TABLE ADD CONSTRAINT { #ALTER-TABLE-ADD-CONSTRAINT }

```text
ALTER TABLE table
@@ -16,7 +16,7 @@ CHECK (expr+)

Creates an [AlterTableAddConstraint](../check-constraints/AlterTableAddConstraint.md)

## <span id="ALTER-TABLE-DROP-CONSTRAINT"> ALTER TABLE DROP CONSTRAINT
## ALTER TABLE DROP CONSTRAINT { #ALTER-TABLE-DROP-CONSTRAINT }

```text
ALTER TABLE table
@@ -25,6 +25,15 @@ DROP CONSTRAINT (IF EXISTS)? name

Creates an [AlterTableDropConstraint](../check-constraints/AlterTableDropConstraint.md)

## ALTER TABLE DROP FEATURE { #ALTER-TABLE-DROP-FEATURE }

```text
ALTER TABLE table
DROP FEATURE featureName (TRUNCATE HISTORY)?
```

Creates an [AlterTableDropFeature](../commands/alter/AlterTableDropFeature.md) logical operator
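
To make the grammar concrete, here is a minimal sketch of how the statement maps onto the three values of the logical operator (table, feature name and the `truncateHistory` flag). The `DropFeatureSketch` object, the `DropFeature` case class and the regular expression are hypothetical and stand in for the real ANTLR-based parsing in `DeltaSqlAstBuilder`; the table is kept as a plain name rather than a `LogicalPlan`, and `deletionVectors` is used only as an example feature name.

```scala
object DropFeatureSketch {
  // Hypothetical, simplified stand-in for the AlterTableDropFeature logical operator
  final case class DropFeature(table: String, featureName: String, truncateHistory: Boolean)

  // ALTER TABLE table DROP FEATURE featureName (TRUNCATE HISTORY)?
  private val Statement =
    """(?i)\s*ALTER\s+TABLE\s+(\S+)\s+DROP\s+FEATURE\s+(\S+)(\s+TRUNCATE\s+HISTORY)?\s*""".r

  def parse(sql: String): Option[DropFeature] = sql match {
    // The optional group is null when TRUNCATE HISTORY is absent
    case Statement(table, feature, truncate) =>
      Some(DropFeature(table, feature, truncateHistory = truncate != null))
    case _ => None
  }
}
```

In this sketch, `DropFeatureSketch.parse("ALTER TABLE my_delta_table DROP FEATURE deletionVectors TRUNCATE HISTORY")` gives a `DropFeature` with `truncateHistory` enabled, and the same statement without the trailing clause gives one with the flag disabled.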

## CLONE { #clone }

```antlr
@@ -53,7 +62,7 @@ temporalClause

Creates a [CloneTableStatement](../commands/clone/CloneTableStatement.md)

## <span id="CONVERT-TO-DELTA"> CONVERT TO DELTA
## CONVERT TO DELTA { #CONVERT-TO-DELTA }

```text
CONVERT TO DELTA table
@@ -63,7 +72,7 @@ CONVERT TO DELTA table

Creates a [ConvertToDeltaCommand](../commands/convert/ConvertToDeltaCommand.md)

## <span id="DESCRIBE-DETAIL"> DESCRIBE DETAIL
## DESCRIBE DETAIL { #DESCRIBE-DETAIL }

```text
(DESC | DESCRIBE) DETAIL (path | table)
@@ -80,15 +89,15 @@ Executes [DescribeDeltaDetailCommand](../commands/describe-detail/DescribeDeltaD

Creates a [DescribeDeltaHistory](../commands/describe-history/DescribeDeltaHistory.md)

## <span id="GENERATE"> GENERATE
## GENERATE { #GENERATE }

```text
GENERATE modeName FOR TABLE table
```

Executes [DeltaGenerateCommand](../commands/generate/DeltaGenerateCommand.md)

## <span id="OPTIMIZE"> OPTIMIZE
## OPTIMIZE { #OPTIMIZE }

```text
OPTIMIZE (path | table)
@@ -105,7 +114,7 @@ Executes [OptimizeTableCommand](../commands/optimize/OptimizeTableCommand.md) on

Parsed by [DeltaSqlAstBuilder](DeltaSqlAstBuilder.md#visitOptimizeTable) that creates an [OptimizeTableCommand](../commands/optimize/OptimizeTableCommand.md)

## <span id="RESTORE"> RESTORE
## RESTORE { #RESTORE }

```text
RESTORE TABLE? table
@@ -119,7 +128,7 @@ temporalClause

Creates a [RestoreTableStatement](../commands/restore/RestoreTableStatement.md)

## <span id="VACUUM"> VACUUM
## VACUUM { #VACUUM }

```text
VACUUM (path | table)
2 changes: 1 addition & 1 deletion mkdocs.yml
@@ -267,7 +267,7 @@ nav:
- Commands:
- commands/index.md
- commands/DeltaCommand.md
- ALTER:
- ALTER TABLE:
- ... | flat | commands/alter/**.md
- CLONE:
- ... | flat | commands/clone/**.md