Remove Orphans

The `remove_orphans` action cleans up orphan files older than a specified time period. It may take a long time to finish if the data and metadata directories contain many files. Run it periodically, though it does not need to run often.

List the dataset's snapshots by running the following command:

dataos-ctl dataset snapshots -a dataos://icebase:retail/cit

Expected output

      SNAPSHOTID      │   TIMESTAMP   │    DATE AND TIME (GMT)     
──────────────────────┼───────────────┼────────────────────────────
  7002479430618666161 │ 1740643647492 │ 2025-02-27T08:07:27+00:00  
  2926095925031493170 │ 1740737372219 │ 2025-02-28T10:09:32+00:00
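
Judging by the sample output, the TIMESTAMP column appears to hold epoch milliseconds, since each value lines up with the DATE AND TIME (GMT) column. The sketch below is one way to check that reading in Python; the helper name `snapshot_time` is hypothetical and is not part of `dataos-ctl`.

```python
from datetime import datetime, timezone


def snapshot_time(timestamp_ms: int) -> datetime:
    """Convert an epoch-millisecond snapshot timestamp to a UTC datetime.

    Assumes the TIMESTAMP column is in milliseconds; the sub-second part
    is dropped with integer division.
    """
    return datetime.fromtimestamp(timestamp_ms // 1000, tz=timezone.utc)


# First row of the sample output above
print(snapshot_time(1740643647492).isoformat())  # 2025-02-27T08:07:27+00:00
```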

Syntax for Flare Version `flare:6.0`

version: v1 
name: orphans 
type: workflow 
tags: 
  - orphans
workflow: 
  title: Remove orphan files 
  dag: 
    - name: orphans 
      title: Remove orphan files 
      spec: 
        tags: 
          - orphans
        stack: flare:6.0 
        compute: runnable-default 
        stackSpec: 
          job: 
            explain: true 
            logLevel: INFO 
            inputs: 
              - name: inputDf 
                dataset: dataos://icebase:retail/city 
                format: Iceberg 
            actions: # Flare Action
              - name: remove_orphans # Action Name
                input: inputDf # Input Dataset Name
                options: # Options
                  olderThan: "1739734172" # Timestamp in Unix Format
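
The `olderThan` value in the manifest has ten digits, which is consistent with a Unix timestamp in seconds, although the unit is not stated explicitly. As a minimal sketch under that assumption, a cutoff such as "N days ago" could be computed as follows; the function name `older_than_cutoff` is hypothetical and only illustrates the arithmetic.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def older_than_cutoff(days: int, now: Optional[datetime] = None) -> str:
    """Return the Unix timestamp (seconds, as a string) for `days` before `now`.

    Assumes `olderThan` expects seconds since the epoch, as the ten-digit
    sample value suggests. `now` defaults to the current UTC time.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days)
    return str(int(cutoff.timestamp()))


# Ten days before 2025-02-26T19:29:32 UTC gives the value used in the manifest
reference = datetime(2025, 2, 26, 19, 29, 32, tzinfo=timezone.utc)
print(older_than_cutoff(10, now=reference))  # 1739734172
```

The returned string can be pasted into the `olderThan` option. It is worth confirming the expected unit against the Flare documentation before running the action on production data.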
