Case Scenario: Partitioning
Single Partitioning
Partitioning in an Iceberg table is column-based. Flare currently supports only the following partition transforms: identity, year, month, day, and hour.
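As an illustration of the supported transforms, here is a minimal shell sketch that builds a partition spec string in the `transform:column[:partition_name]` shape used by the commands on this page. The `partition_spec` function is a hypothetical helper, not part of dataos-ctl; it only rejects transforms outside the list above and prints the resulting string.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of dataos-ctl): builds a partition spec
# string of the form transform:column[:partition_name].
partition_spec() {
  local transform="$1" column="$2" name="${3:-}"

  # Accept only the transforms listed as supported on this page.
  case "$transform" in
    identity|year|month|day|hour) ;;
    *) echo "unsupported transform: $transform" >&2; return 1 ;;
  esac

  # The partition name is optional; omit the third field when it is empty.
  if [ -n "$name" ]; then
    printf '%s:%s:%s\n' "$transform" "$column" "$name"
  else
    printf '%s:%s\n' "$transform" "$column"
  fi
}

partition_spec identity state_code
partition_spec month ts_city month_partition
```

The printed string can then be passed as the value of a `-p` flag. Checking the transform locally is only a convenience; the CLI remains the authority on what it accepts.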
Multiple Partitioning
Partitioning can be done at multiple levels. For example, suppose a user wants to partition the city data at two levels: first by state_code, then by month. The command is as follows:
dataos-ctl dataset -a dataos://lakehouse:retail/city \
-p "identity:state_code" \
-p "month:ts_city:month_partition"
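When the partition levels come from a script or a config file, it can help to assemble the command and inspect it before running it. The following is a minimal dry-run sketch: `print_partition_cmd` is a hypothetical wrapper, not a dataos-ctl feature, and it only prints the command with one `-p` flag per spec, in the order given.

```shell
#!/usr/bin/env bash
# Hypothetical dry-run wrapper: prints the dataos-ctl command instead of
# executing it, adding one -p flag for each partition spec passed in.
print_partition_cmd() {
  local address="$1"; shift
  local cmd="dataos-ctl dataset -a $address"
  local spec

  # Append the specs in the order they were supplied.
  for spec in "$@"; do
    cmd="$cmd -p \"$spec\""
  done

  printf '%s\n' "$cmd"
}

print_partition_cmd dataos://lakehouse:retail/city \
  "identity:state_code" \
  "month:ts_city:month_partition"
```

Printing rather than executing keeps the sketch safe to run anywhere; once the output looks right, the same line can be copied into a terminal.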
Updating a Partition
To update a partition, use the following command.
Command
dataos-ctl dataset -a ${{udl}} update-partition \
-p "${{partition_type}}:${{column_name}}:${{partition_name}}"
Example
Let's say we want to update the partition of the city data by month, using the timestamp in the ts_city column. The command is as follows:
dataos-ctl dataset -a dataos://lakehouse:retail/city update-partition \
-p "month:ts_city:month_partition"
Output