Expire Snapshots¶
The expire_snapshots
action expires amassed snapshots. Data files are not deleted until they are no longer referenced by a snapshot that may be used for time travel or rollback. Regularly expiring snapshots deletes unused data files.
Important considerations
- Expiring old snapshots removes them from metadata, so they are no longer available for time travel queries.
- Data files are not deleted until they are no longer referenced by a snapshot that may be used for time travel or rollback.
- Regularly expiring snapshots deletes unused data files.
To check the timestamp and list of snapshot add the following command in your terminal:
Expected output
INFO[0000] 📂 get snapshots...
INFO[0003] 📂 get snapshots...completed
SNAPSHOTID │ TIMESTAMP │ DATE AND TIME (GMT)
──────────────────────┼───────────────┼────────────────────────────
7177047349072031975 │ 1744366300306 │ 2025-04-11T10:11:40+00:00
580493728505961346 │ 1744366356145 │ 2025-04-11T10:12:36+00:00
2613385101287075565 │ 1744366357891 │ 2025-04-11T10:12:37+00:00
Tip
It is advisable to use Flare 5.0 with updated image tag 7.3.21
for the expire_snapshot action, as Flare 6.0 currently supports only two methods for snapshot expiration: the expireOlderThan
and expireSnapshotId
attributes.
Configuration Options for Snapshot Deletion¶
Attribute | Description |
---|---|
olderThanMillis |
Milliseconds since epoch before which snapshots will be removed. This replaces expireOlderThan as the primary time-based expiration setting. |
olderThanTimestamp |
A human-readable timestamp before which snapshots will be removed (e.g., 2024-12-01T00:00:00Z ). |
snapshotIds |
Specifies the list of specific snapshots to expire. |
retainLast |
Number of ancestor snapshots to preserve regardless of olderThanMillis . (Defaults to 1) |
max_concurrent_deletes |
Size of the thread pool used for delete file actions. (By default, no thread pool is used.) |
streamDeleteResults |
By default, all files to delete are brought to the driver at once, which may cause memory issues with large file lists. Set to true to use toLocalIterator . |
Examples¶
olderThanMillis
¶
Info
expireOlderThan
is only available in Flare 6. Use olderThanMillis
in place of expireOlderThan
in Flare 5.0 (image tag: 7.3.21
).
The olderThanMillis
attribute specifies a cutoff timestamp in unix format. Snapshots created before this timestamp are considered expired and will be deleted, along with their associated metadata and manifest files, if no longer referenced. This helps manage storage and keep the table metadata clean by removing historical data no longer needed for rollback or time travel.
name: wf-expire-snapshots # Name of the Workflow
version: v1 # Version
type: workflow # Type of Resource (Here its workflow)
tags: # Tags
- expire
workflow: # Workflow Section
title: expire snapshots # Title of the DAG
dag: # Directed Acyclic Graph (DAG)
- name: expire # Name of the Job
title: expire snapshots # Title of the Job
spec: # Specs
tags: # Tags
- Expire
stack: flare:6.0 # Stack is Flare (so its a Flare Job)
compute: runnable-default # Compute
stackSpec: # Flare Stack specific Section
job: # Job Section
explain: true # Explain
logLevel: INFO # Loglevel
inputs: # Inputs Section
- name: inputDf # Input Dataset Name
dataset: dataos://lakehouse:retail/pos_store_product_cust?acl=rw # Input UDL
format: Iceberg # Format
actions: # Action Section
- name: expire_snapshots # Name of Flare Action
input: inputDf # Input Dataset Name
options: # Options
olderThanMillis: "1741987433222" # Timestamp in Unix Format (All snapshots older than the timestamp are expired)
olderThanMillis
with retainLast
¶
Remove snapshots older than specific day and time, but retain the last 5 snapshots:
name: wf-expire-snapshots # Name of the Workflow
version: v1 # Version
type: workflow # Type of Resource (Here its workflow)
tags: # Tags
- expire
workflow: # Workflow Section
title: expire snapshots # Title of the DAG
dag: # Directed Acyclic Graph (DAG)
- name: expire # Name of the Job
title: expire snapshots # Title of the Job
spec: # Specs
tags: # Tags
- Expire
stack: flare:6.0 # Stack is Flare (so its a Flare Job)
compute: runnable-default # Compute
stackSpec: # Flare Stack specific Section
job: # Job Section
explain: true # Explain
logLevel: INFO # Loglevel
inputs: # Inputs Section
- name: inputDf # Input Dataset Name
dataset: dataos://lakehouse:retail/pos_store_product_cust?acl=rw # Input UDL
format: Iceberg # Format
actions: # Action Section
- name: expire_snapshots # Name of Flare Action
input: inputDf # Input Dataset Name
options: # Options
olderThanMillis: "1741987433222" # Timestamp in Unix Format (All snapshots older than the timestamp are expired)
retainLast: 5 # Retain the last 5 snapshots
snapshotIds
¶
Remove snapshots with snapshot ID 12345679 (note that this snapshot ID should not be the current snapshot):
name: wf-expire-snapshots-10 # Name of the Workflow
version: v1 # Version
type: workflow # Type of Resource (Here its workflow)
tags: # Tags
- expire
workflow: # Workflow Section
title: expire snapshots # Title of the DAG
dag: # Directed Acyclic Graph (DAG)
- name: expire # Name of the Job
title: expire snapshots # Title of the Job
spec: # Specs
tags: # Tags
- Expire
stack: flare:6.0 # Stack is Flare (so its a Flare Job)
compute: runnable-default # Compute
stackSpec: # Flare Stack specific Section
job: # Job Section
explain: true # Explain
logLevel: INFO # Loglevel
inputs: # Inputs Section
- name: inputDf # Input Dataset Name
dataset: dataos://lakehouse:sandbox3/test_pyflare2?acl=rw # Input UDL
format: Iceberg # Format
actions: # Action Section
- name: expire_snapshots # Name of Flare Action
input: inputDf # Input Dataset Name # mandatory
options: # Options # mandatory
snapshotIds: # Snapshots to delete by ID
- "12345679" # Snapshot with given snapshot ID will be deleted
You can also provide multiple snapshot Ids:
actions: # Action Section
- name: expire_snapshots # Name of Flare Action
input: inputDf # Input Dataset Name (mandatory)
options: # Options (mandatory)
snapshotIds: # Snapshots to delete by ID
- "1234567912" # Snapshot with given snapshot ID will be deleted
- "1122344342" # Snapshot with given snapshot ID will be deleted
olderThanTimestamp¶
name: expire-04 # Name of the Workflow
version: v1 # Version
type: workflow # Type of Resource (Here its workflow)
tags: # Tags
- expire
workflow: # Workflow Section
title: expire snapshots # Title of the DAG
dag: # Directed Acyclic Graph (DAG)
- name: expire # Name of the Job
title: expire snapshots # Title of the Job
spec: # Specs
tags: # Tags
- Expire
stack: flare:6.0 # Stack is Flare (so its a Flare Job)
compute: runnable-default # Compute
stackSpec: # Flare Stack specific Section
job: # Job Section
explain: true # Explain
logLevel: INFO # Loglevel
inputs: # Inputs Section
- name: inputDf # Input Dataset Name
dataset: dataos://lakehouse:retail/pos_store_product_cust?acl=rw # Input UDL
format: Iceberg # Format
actions: # Action Section
- name: expire_snapshots # Name of Flare Action
input: inputDf # Input Dataset Name
options: # Options
olderThanTimestamp: '2025-04-19 00:00:00.000' # Timestamp (All snapshots older than the timestamp are expired)
retainLast: 2 # Retain the last 2 snapshots