
Flare Job Case Scenarios in DataOS

Batch Jobs

Batch jobs recompute all affected datasets during each run, ensuring full refresh and deterministic outcomes. They typically involve reading data from source depots, applying transformations, and writing to target depots.
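Flare batch jobs are declared in YAML manifests rather than hand-written code; the PySpark sketch below is only a conceptual illustration of the read-transform-write pattern described above, with hypothetical depot paths, table, and column names.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("batch-job-sketch").getOrCreate()

# Read the full dataset from a (hypothetical) source depot location.
orders = spark.read.parquet("/depots/source/orders")

# Apply transformations; every run recomputes the complete output.
daily_revenue = (
    orders
    .withColumn("order_date", F.to_date("order_ts"))
    .groupBy("order_date")
    .agg(F.sum("amount").alias("revenue"))
)

# Overwrite the target so each run produces a deterministic full refresh.
daily_revenue.write.mode("overwrite").parquet("/depots/target/daily_revenue")
```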

For sample Workflow configurations, see the batch job case scenario.


Stream Jobs

Stream jobs enable near real-time processing by ingesting data in continuous micro-batches. These jobs are suitable for time-sensitive use cases such as event tracking, system monitoring, and IoT data analysis.
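As with batch jobs, streaming jobs are configured declaratively in Flare; the Structured Streaming sketch below (with hypothetical schema, paths, and window sizes) illustrates the micro-batch pattern.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import (StructType, StructField, StringType,
                               DoubleType, TimestampType)

spark = SparkSession.builder.appName("stream-job-sketch").getOrCreate()

schema = StructType([
    StructField("device_id", StringType()),
    StructField("reading", DoubleType()),
    StructField("event_ts", TimestampType()),
])

# Ingest arriving files continuously; Spark processes them in micro-batches.
events = spark.readStream.schema(schema).json("/depots/source/iot_events")

# Aggregate over 5-minute event-time windows, tolerating 10 minutes of lateness.
per_device = (
    events
    .withWatermark("event_ts", "10 minutes")
    .groupBy("device_id", F.window("event_ts", "5 minutes"))
    .agg(F.avg("reading").alias("avg_reading"))
)

# Emit each finalized window to the target; the checkpoint tracks progress
# so the job can resume exactly where it left off.
query = (
    per_device.writeStream
    .outputMode("append")
    .option("checkpointLocation", "/depots/target/_checkpoints/iot")
    .trigger(processingTime="1 minute")
    .start("/depots/target/iot_aggregates")
)
query.awaitTermination()
```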

Detailed configuration is available in the streaming job case scenario.


Incremental Jobs

Incremental jobs process only the rows or files that have changed since the last execution. This reduces compute cost and latency, making them ideal for frequently updated datasets.
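Flare tracks incremental state for you; the sketch below (hypothetical column and path names) shows the underlying high-water-mark idea in plain PySpark.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("incremental-job-sketch").getOrCreate()

# High-water mark from the previous run. In a real job this is persisted
# between executions; a literal is used here purely for illustration.
last_processed_ts = "2024-01-01 00:00:00"

orders = spark.read.parquet("/depots/source/orders")

# Process only rows that changed since the last execution.
delta = orders.filter(F.col("updated_at") > F.lit(last_processed_ts))

# Append just the delta to the target dataset.
delta.write.mode("append").parquet("/depots/target/orders_delta")

# Record the new high-water mark for the next run.
next_ts = delta.agg(F.max("updated_at")).first()[0]
```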

Learn more in the incremental job case scenario.


Data Transformation Use Cases

Flare supports several advanced data transformation patterns, each documented in its own case scenario.


Job Performance and Optimization

Flare jobs can be tuned to enhance execution efficiency and reduce resource usage. Techniques include optimizing transformation logic, adjusting compute configurations, and minimizing I/O.
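The sketch below shows what such tuning can look like at the Spark layer that Flare runs on: right-sizing shuffle parallelism, enabling adaptive execution, pruning columns early, and broadcasting a small join side. The specific values and table names are illustrative assumptions, not recommendations.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("tuning-sketch")
    # Right-size shuffle parallelism for the expected data volume.
    .config("spark.sql.shuffle.partitions", "64")
    # Let Spark adjust partition counts and join strategies at runtime.
    .config("spark.sql.adaptive.enabled", "true")
    .getOrCreate()
)

orders = spark.read.parquet("/depots/source/orders")
customers = spark.read.parquet("/depots/source/customers")

# Minimize I/O: select only needed columns and filter before the join.
slim = orders.select("customer_id", "amount").filter(F.col("amount") > 0)

# Broadcast the small dimension table to avoid shuffling the large side.
joined = slim.join(F.broadcast(customers), "customer_id")
```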

Refer to job optimization by tuning for implementation guidance.


Metadata and Data Management

DataOS provides metadata management capabilities that improve dataset discoverability, governance, and reusability.


Iceberg Table Optimization

Efficient handling of data and metadata in Iceberg tables is critical to maintaining performance at scale. Over time, small files and redundant metadata can degrade query efficiency.

Compaction

  • Merge many small data files into fewer, larger files with compaction, reducing per-file and metadata overhead (see the sketch below).
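In DataOS this is normally triggered as a declarative Lakehouse/Flare action; as a hedged sketch, the snippet below invokes Iceberg's standard rewrite_data_files procedure directly, assuming a Spark session with the Iceberg extensions and a hypothetical catalog ("lakehouse") and table.

```python
from pyspark.sql import SparkSession

# Assumes the session is configured with the Iceberg runtime and SQL
# extensions; "lakehouse" and "retail.orders" are hypothetical names.
spark = SparkSession.builder.appName("compaction-sketch").getOrCreate()

# Merge small data files into ~512 MB files using Iceberg's built-in procedure.
spark.sql("""
    CALL lakehouse.system.rewrite_data_files(
        table   => 'retail.orders',
        options => map('target-file-size-bytes', '536870912')
    )
""")
```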

Partitioning

  • Improve query efficiency by organizing data into partitions that match common filter predicates.
  • Evolve that layout over time with partition evolution, which changes the partition spec without rewriting existing data (see the sketch after this list).
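A minimal sketch of both points, assuming an Iceberg catalog named "lakehouse" with the SQL extensions enabled (all names hypothetical): write the table partitioned by day, then evolve the spec to hourly granularity; only newly written data follows the new layout.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("partitioning-sketch").getOrCreate()

orders = spark.read.parquet("/depots/source/orders")

# Organize the table by day so filters on order_ts can skip whole partitions.
(
    orders.writeTo("lakehouse.retail.orders")
    .partitionedBy(F.days("order_ts"))
    .createOrReplace()
)

# Partition evolution: future writes use hourly granularity; existing
# data files are left untouched.
spark.sql("ALTER TABLE lakehouse.retail.orders ADD PARTITION FIELD hours(order_ts)")
spark.sql("ALTER TABLE lakehouse.retail.orders DROP PARTITION FIELD days(order_ts)")
```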

Bucketing and Caching

  • Enhance join and aggregation performance with bucketing.
  • Speed up repeated reads using caching (both are sketched below).
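The sketch below shows both techniques in PySpark with hypothetical names: bucketing co-locates rows by join key at write time (which requires saveAsTable), and caching pins a frequently re-read DataFrame in memory.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("bucketing-caching-sketch").getOrCreate()

orders = spark.read.parquet("/depots/source/orders")

# Bucketing: pre-shuffle rows by customer_id so later joins and
# aggregations on that key avoid a full shuffle.
(
    orders.write
    .bucketBy(16, "customer_id")
    .sortBy("customer_id")
    .mode("overwrite")
    .saveAsTable("retail.orders_bucketed")
)

# Caching: keep a hot subset in memory across repeated actions.
hot = spark.table("retail.orders_bucketed").filter("order_date >= '2024-01-01'")
hot.cache()
hot.count()        # first action materializes the cache
# ...subsequent reads of `hot` are served from memory...
hot.unpersist()    # release executor memory when done
```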

Data Lifecycle and Maintenance

Maintaining Iceberg datasets involves regular cleanup and space optimization tasks, such as expiring old snapshots and removing files that are no longer referenced. These actions are supported in DataOS-managed depots (Lakehouse only).
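As a hedged illustration of such maintenance, the snippet below uses Iceberg's standard expire_snapshots and remove_orphan_files procedures with a hypothetical catalog and table; in DataOS these tasks are normally expressed as declarative actions rather than hand-written SQL.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("maintenance-sketch").getOrCreate()

# Expire old snapshots to release data files that are no longer needed
# for time travel, keeping at least the five most recent snapshots.
spark.sql("""
    CALL lakehouse.system.expire_snapshots(
        table       => 'retail.orders',
        older_than  => TIMESTAMP '2024-01-01 00:00:00',
        retain_last => 5
    )
""")

# Remove files that no table snapshot references (e.g. leftovers from
# failed writes) to reclaim storage.
spark.sql("CALL lakehouse.system.remove_orphan_files(table => 'retail.orders')")
```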