
DataOS PyFlare

DataOS PyFlare is a Python library that streamlines data operations and facilitates interaction with Apache Spark within DataOS. It is a wrapper around Flare that enables Python support with DataOS capabilities. By simplifying the loading, transformation, and storage of data, the library abstracts the complexities inherent in data flow so that users can focus on data transformations and business logic. It also allows existing Spark job code bases to be integrated with DataOS with minimal modifications.

Key features

Streamlined Data Operations

PyFlare offers a unified interface for data loading, transformation, and storage, which reduces development complexity and shortens project timelines.

Data Connector Integration

It seamlessly integrates with various data connectors, including depot and non-depot sources, by leveraging the SDK's built-in capabilities.

Customizability and Extensibility

PyFlare empowers users with the flexibility to tailor it to their specific project requirements. It seamlessly integrates with existing Python libraries and frameworks designed for data manipulation.

Optimized for DataOS

PyFlare is finely tuned for the DataOS platform, rendering it an ideal choice for the management and processing of data within DataOS environments.

Installation

By default, the DataOS environment does not include support for the DataOS-native Jupyter Notebook. However, it can be added to the environment when required. The PyFlare module is compatible with Jupyter Notebooks and can also be used in other Python programs across different environments.

Prerequisites

This section describes the steps to follow before installing PyFlare.

Ensure you have Python ≥ 3.7 installed

Prior to installation, ensure that you have Python 3.7 or later installed on your system. You can check the Python version by running:

For Linux/macOS

python3 --version

For Windows

py --version

If you do not have Python, please install the latest version from python.org.

Note: If you're using an enhanced shell like IPython or Jupyter notebook, you can run system commands by prefacing them with a ! character:

In [1]: import sys
        !{sys.executable} --version
# Output
Python 3.6.3
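
The same check can be made from inside Python, which is convenient in a notebook or at the top of a job script. The following is a minimal sketch; the helper name python_is_supported is hypothetical and not part of PyFlare, and the minimum version is taken from the prerequisite stated above.

```python
import sys

# Minimum interpreter version stated in the prerequisites above.
MIN_PYTHON = (3, 7)


def python_is_supported(version_info=sys.version_info, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version meets the minimum.

    Hypothetical helper for illustration; not part of PyFlare.
    """
    return tuple(version_info[:2]) >= tuple(minimum)


current = ".".join(str(part) for part in sys.version_info[:3])
if python_is_supported():
    print("Python " + current + " meets the minimum requirement.")
else:
    print("Python " + current + " is too old; version 3.7 or later is required.")
```

An interpreter reporting a version such as 3.6.3 would fail this check and would need to be upgraded before installing PyFlare.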

Ensure you have pip installed

Additionally, you'll need to make sure you have pip available. You can check this by running:

For Linux/macOS

python3 -m pip --version

For Windows

py -m pip --version

If you installed Python from source, with an installer from python.org, or via Homebrew, you should already have pip. If you're on Linux and installed using your OS package manager, you may have to install pip separately; see Installing pip/setuptools/wheel with Linux Package Managers.
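
The pip check above can also be done programmatically for the interpreter that is currently running. This is a sketch only; the function names pip_is_available and pip_version_line are hypothetical, and the code relies solely on the standard library.

```python
import importlib.util
import subprocess
import sys


def pip_is_available():
    """Return True if the pip module can be found by the running interpreter."""
    return importlib.util.find_spec("pip") is not None


def pip_version_line():
    """Return the output of `python -m pip --version`, or None if it fails."""
    result = subprocess.run(
        [sys.executable, "-m", "pip", "--version"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return None
    return result.stdout.strip()


if pip_is_available():
    print(pip_version_line())
else:
    print("pip was not found for this interpreter; install it before continuing.")
```

Using sys.executable ensures that the pip being checked belongs to the same interpreter that will later import PyFlare.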

Installing from PyPI

The dataos-pyflare library can be installed from the Python Package Index (PyPI) using the following command:

Recommendation

Install version 0.1.13 of dataos-pyflare (dataos-pyflare==0.1.13), as it is the designated stable release.

For Linux/macOS

python3 -m pip install dataos-pyflare==0.1.13

For Windows

py -m pip install dataos-pyflare==0.1.13

Note: If you're using an enhanced shell like IPython or Jupyter notebook, you must restart the runtime in order to use the newly installed package.
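
After installing (and restarting the runtime, if applicable), you may want to confirm which version was actually installed. The sketch below reads the installed distribution metadata through the standard library; the helper name installed_version is hypothetical. It assumes Python 3.8 or later for importlib.metadata; on Python 3.7 the separate importlib_metadata backport would have to be installed for the fallback import to work.

```python
try:
    # Standard library from Python 3.8 onward.
    from importlib.metadata import PackageNotFoundError, version
except ImportError:
    # Assumption: on Python 3.7 the importlib_metadata backport is installed.
    from importlib_metadata import PackageNotFoundError, version

# Stable release recommended in the installation instructions above.
RECOMMENDED_VERSION = "0.1.13"


def installed_version(distribution="dataos-pyflare"):
    """Return the installed version string, or None if it is not installed.

    Hypothetical helper for illustration; not part of PyFlare.
    """
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


found = installed_version()
if found is None:
    print("dataos-pyflare is not installed in this environment.")
elif found != RECOMMENDED_VERSION:
    print("Found dataos-pyflare " + found + "; recommended is " + RECOMMENDED_VERSION + ".")
else:
    print("dataos-pyflare " + found + " is installed.")
```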

Install from Source Distribution

pip can install from either Source Distributions (sdist) or Wheels, but if both are present on PyPI, pip will prefer a compatible wheel.

If pip does not find a wheel to install, it will locally build a wheel and cache it for future installs, instead of rebuilding the source distribution in the future.
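
pip's default preference can be overridden with its --no-binary and --only-binary options. The sketch below only builds the argument list for such a command and does not run it; the helper name pip_install_args is hypothetical. Whether a source distribution of dataos-pyflare is actually published on PyPI is not confirmed here, so the sdist variant may fail for this package.

```python
import sys


def pip_install_args(package, version, artifact="auto"):
    """Build a pip install command for the running interpreter.

    artifact="auto"  lets pip choose (it prefers a compatible wheel),
    artifact="sdist" forces a build from the source distribution,
    artifact="wheel" refuses to build from source.
    Hypothetical helper for illustration; not part of PyFlare.
    """
    args = [sys.executable, "-m", "pip", "install"]
    if artifact == "sdist":
        args += ["--no-binary", package]
    elif artifact == "wheel":
        args += ["--only-binary", package]
    elif artifact != "auto":
        raise ValueError("artifact must be 'auto', 'sdist', or 'wheel'")
    args.append(package + "==" + version)
    return args


# Print the command that would force a source build of the stable release.
print(" ".join(pip_install_args("dataos-pyflare", "0.1.13", artifact="sdist")))
```

The returned list can be passed to subprocess.run if you decide to execute the command from Python rather than from a shell.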

Supported sources

PyFlare library reference

For a comprehensive reference guide to PyFlare, including detailed information on its modules and classes, please consult the PyFlare Library Reference.