Standalone 2.0¶
What’s new in 2.0?¶
- Apple M1/M2 chip support for Flare Standalone.
- Datasource IO for reading and writing data.
- Support for exposing the SparkUI to localhost.
- Support for custom image tags to run Standalone.
Contrasting Flare Standalone 1.0 and 2.0¶
Parameter | Standalone 1.0 | Standalone 2.0 |
---|---|---|
IO to read/write data | Uses datasets | Uses data sources |
Source support | Works only on local file system-mounted data | Works on object storage, warehouses, relational and NoSQL data sources, as well as local file system-mounted data. Refer to the list of supported sources for details |
Support to expose SparkUI to localhost | No | Yes |
Support for configuring Flare versions | Only the default out-of-the-box version; no configuration permitted | Flare stack version can be configured, e.g., flare:4.0, flare:5.0 |
Support for custom Flare images hosted in a Docker registry | No custom image support | Supports any custom image tag to run Standalone |
Support for Apple M1/M2 chips | Required an intermediate transformation for M1/M2 support | Native M1/M2 chip support |
Features of Standalone 2.0¶
An effort towards making Flare independent from DataOS¶
By independence, we mean that Flare no longer depends on DataOS. In Standalone 1.0, functionality was limited to mounting a local directory; the data either had to be downloaded from the object store or exported after running a query on Workbench. With Standalone 2.0, Flare can be used natively, like any other application.
Simplifying the YAML declarations around Flare Jobs¶
Standalone 2.0 is more declarative; there is native support for data sources such as file systems, object storage, databases, and streaming sources, whose credentials can be declared sequentially in YAML. To make this declarative paradigm possible, new properties have been introduced: `envs`, the environment variables specific to a source; `inputType`, which contains the input file; and source-specific properties for the `outputType`. In addition, the `sink` section within the sequence has been removed, and its properties have been moved to the `outputs` section.
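As a hypothetical sketch of the declarative style described above (the property names `envs`, `inputType`, `outputType`, and `outputs` come from the text; the surrounding structure and example values are assumptions, and the exact schema may differ):

```yaml
# Hypothetical manifest fragment; nesting and values are illustrative only.
inputs:
  - name: sample_input
    inputType: file            # type of the input source
    envs:                      # environment variables specific to this source
      INPUT_BASE_PATH: /data/input
outputs:
  - name: sample_output
    outputType: file           # sink properties now live here; the separate `sink` section is gone
    envs:
      OUTPUT_BASE_PATH: /data/output
```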
Ability to support custom images and stack versions¶
Standalone 2.0 brings in the functionality of supporting custom stack images. Earlier, you could use only the default stack images that came out of the box. In the new release, you can use the `-s` flag to configure the stack version and the optional `-i` flag to specify a custom image available in the Docker registry. The command now looks like this:
dataos-ctl develop start -s <stack-version> -i <custom-image> \
  -f <manifest-file-path> -d <data-directory-path>
Sample Command
dataos-ctl develop start -s flare:3.0 -i rubiklabs/flare3:6.0.93 \
  -f ./standalone/flare-config/read-local-write-gcs/config.yaml \
  -d standalone
For Apple M1/M2 Chip Systems
With the custom image functionality, Standalone 2.0 adds cross-build support for Apple M1/M2 chips. Use the `rubiklabs/flare3:6.0.93-xp.01` image for this purpose.
dataos-ctl develop start -s flare:3.0 -i rubiklabs/flare3:6.0.93-xp.01 \
  -f ./standalone/flare-config/read-local-write-gcs/config.yaml \
  -d standalone
Shift from datasets to the data source IO¶
In Standalone 1.0, reading and writing required datasets for IO. For example, to use data present in Pulsar, you first had to bring that data to the local system, mount it on Docker, and then process it on Standalone. With Standalone 2.0, you can use data sources like Pulsar directly, without bringing the data local first, process it, and get the results. The ability to use environment storage directly saves the extra effort of pulling data from a stream or store to the local system and mounting it as a local file. With the various data sources supported in Standalone 2.0, the entire development workflow can be confined to the local system, reducing the redundant noise created in the environment while formulating logic and queries for production jobs.
Exposing SparkUI¶
Standalone 2.0 supports exposing the SparkUI to localhost, which was absent in the earlier version. This lets you test and tune your SQL queries locally before running a job in production, saving compute resources during optimization.
The new version also introduces the `-P` flag, which lets you expose the SparkUI on a specific port; this is especially useful when a job is already running on the default port.
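Based on the flag pattern of the earlier commands, the invocation would presumably take the following form (the placement of `-P` is an assumption; substitute a free port for `<port-number>`):

```shell
# Hypothetical invocation; -P placement is inferred from the flag pattern above.
dataos-ctl develop start -s <stack-version> -i <custom-image> \
  -f <manifest-file-path> -d <data-directory-path> -P <port-number>
```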