pyflare.sdk package
Subpackages
- pyflare.sdk.config package
- Submodules
- pyflare.sdk.config.constants module
- pyflare.sdk.config.read_config module
ReadConfig
ReadConfig.cluster_name
ReadConfig.collection()
ReadConfig.connection()
ReadConfig.dataset_absolute_path()
ReadConfig.dataset_name()
ReadConfig.depot_absolute_path()
ReadConfig.depot_details
ReadConfig.depot_name()
ReadConfig.depot_type()
ReadConfig.driver
ReadConfig.extra_options
ReadConfig.format_resolver()
ReadConfig.io_format
ReadConfig.is_stream
ReadConfig.query
ReadConfig.spark_options
- pyflare.sdk.config.write_config module
WriteConfig
WriteConfig.collection()
WriteConfig.connection()
WriteConfig.dataset_absolute_path()
WriteConfig.dataset_name()
WriteConfig.depot_absolute_path()
WriteConfig.depot_details
WriteConfig.depot_name()
WriteConfig.depot_type()
WriteConfig.driver
WriteConfig.extra_options
WriteConfig.format_resolver()
WriteConfig.io_format
WriteConfig.is_stream
WriteConfig.mode
WriteConfig.spark_options
- Module contents
- pyflare.sdk.core package
- Submodules
- pyflare.sdk.core.dataos_input module
- pyflare.sdk.core.dataos_output module
- pyflare.sdk.core.decorator module
- pyflare.sdk.core.minerva_input module
- pyflare.sdk.core.session_builder module
SparkSessionBuilder
SparkSessionBuilder.add_reader_instance()
SparkSessionBuilder.add_writer_instance()
SparkSessionBuilder.api_token
SparkSessionBuilder.build_session()
SparkSessionBuilder.dataos_fqdn
SparkSessionBuilder.load_default_spark_conf()
SparkSessionBuilder.log_level
SparkSessionBuilder.logger
SparkSessionBuilder.parsed_inputs
SparkSessionBuilder.parsed_outputs
SparkSessionBuilder.spark
SparkSessionBuilder.spark_conf
SparkSessionBuilder.with_dataos_fqdn()
SparkSessionBuilder.with_depot()
SparkSessionBuilder.with_readers()
SparkSessionBuilder.with_spark_conf()
SparkSessionBuilder.with_user_apikey()
SparkSessionBuilder.with_writers()
g_dataos_token
g_inputs
g_outputs
load()
minerva_input()
refresh_global_data()
save()
spark
- Module contents
- pyflare.sdk.depots package
- pyflare.sdk.readers package
- Submodules
- pyflare.sdk.readers.bigquery_reader module
- pyflare.sdk.readers.delta_reader module
- pyflare.sdk.readers.elasticsearch_reader module
- pyflare.sdk.readers.fastbase_reader module
- pyflare.sdk.readers.file_reader module
- pyflare.sdk.readers.iceberg_reader module
- pyflare.sdk.readers.jdbc_reader module
- pyflare.sdk.readers.minerva_reader module
- pyflare.sdk.readers.reader module
- pyflare.sdk.readers.snowflake_reader module
- Module contents
- pyflare.sdk.utils package
- Submodules
- pyflare.sdk.utils.generic_utils module
append_properties()
authorize_user()
decode_base64_string()
decorate_logger()
encode_base64_string()
enhance_connection_url()
get_abfss_spark_conf()
get_dataset_path()
get_env_variable()
get_gcs_spark_conf()
get_s3_spark_conf()
get_secret_file_path()
get_secret_token()
resolve_dataos_address()
safe_assignment()
write_dict_to_file()
write_string_to_file()
- pyflare.sdk.utils.pyflare_exceptions module
- pyflare.sdk.utils.pyflare_logger module
- Module contents
- pyflare.sdk.writers package
- Submodules
- pyflare.sdk.writers.bigquery_writer module
- pyflare.sdk.writers.delta_writer module
- pyflare.sdk.writers.elasticsearch_writer module
- pyflare.sdk.writers.fastbase_writer module
- pyflare.sdk.writers.file_writer module
- pyflare.sdk.writers.iceberg_writer module
- pyflare.sdk.writers.jdbc_writer module
- pyflare.sdk.writers.snowflake_writer module
- pyflare.sdk.writers.writer module
- Module contents
Module contents
- pyflare.sdk.load(name, format, driver=None, query=None, options=None)[source]
Read a dataset from the source.
- Parameters:
name (str) – Depot address of the source.
format (str) – Read format.
driver (str) – Driver needed to read from the source (optional).
query (str) – Query to execute (optional).
options (dict) – Additional Spark and source properties (optional).
- Returns:
A Spark DataFrame with governed data.
- Return type:
pyspark.sql.DataFrame
- Raises:
PyflareReadException – If the dataset does not exist or read access fails.
Examples
Iceberg:
    read_options = {
        'compression': 'gzip',
        'iceberg': {
            'table_properties': {
                'read.split.target-size': 134217728,
                'read.split.metadata-target-size': 33554432
            }
        }
    }
    load(name="dataos://lakehouse:retail/city", format="iceberg", options=read_options)
JDBC:
    import datetime

    read_options = {
        'compression': 'gzip',
        'partitionColumn': 'last_update',
        'lowerBound': datetime.datetime(2008, 1, 1),
        'upperBound': datetime.datetime(2009, 1, 1),
        'numPartitions': 6
    }
    load(name="dataos://sanitypostgres:public/city", format="postgresql",
         driver="org.postgresql.Driver", options=read_options)
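Both examples assume a governed Spark session has already been built. A minimal setup sketch using SparkSessionBuilder from pyflare.sdk.core.session_builder is shown below; the FQDN and API key are placeholders, constructor arguments (if any) are omitted, and the assumption that each with_* method returns the builder for chaining is not confirmed by this reference.

    from pyflare.sdk import load, save
    from pyflare.sdk.core.session_builder import SparkSessionBuilder

    # Placeholder values; substitute your DataOS instance FQDN and user API key.
    spark = (
        SparkSessionBuilder()
        .with_dataos_fqdn("example.dataos.app")
        .with_user_apikey("<dataos-api-key>")
        .build_session()
    )

    # Once the session is active, load() can read governed datasets as in the examples above.
    df = load(name="dataos://lakehouse:retail/city", format="iceberg")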
- pyflare.sdk.minerva_input(name, query, cluster_name='system', driver='io.trino.jdbc.TrinoDriver', options=None)[source]
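No description is published for minerva_input(); judging from the signature, it executes a SQL query against a Minerva (Trino) cluster, with the cluster defaulting to 'system' and the driver to io.trino.jdbc.TrinoDriver. The sketch below is a hedged guess at usage: the dataset address, the SQL text, and the exact return or registration semantics are assumptions, not documented behavior.

    # Assumed usage: run a query on the Minerva query engine so the result can be
    # consumed in the governed Spark session. Address and SQL are illustrative.
    minerva_input(
        name="dataos://lakehouse:retail/city",
        query="SELECT city_id, city_name FROM city LIMIT 100",
        cluster_name="system",   # default cluster per the signature
    )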
- pyflare.sdk.save(name: str, dataframe, format: Optional[str] = None, mode='append', driver=None, options=None)[source]
Write the transformed dataset to the output sink.
- Parameters:
name (str) – Output key to write.
dataframe (pyspark.sql.DataFrame) – The DataFrame to write.
format (str) – Output format (e.g., iceberg, parquet).
mode (str) – Write mode (default is “append”).
driver (str) – Driver to use for the sink (optional).
options (dict) – Additional write configuration (optional).
- Raises:
PyflareWriteException – If the dataset does not exist or write access fails.
Example
    write_options = {
        "compression": "gzip",
        "iceberg": {
            "table_properties": {
                "write.format.default": "parquet",
                "write.parquet.compression-codec": "gzip",
                "write.metadata.previous-versions-max": 3,
                "parquet.page.write-checksum.enabled": "false"
            },
            "partition": [
                {"type": "months", "column": "ts_city"},
                {"type": "bucket", "column": "city_id", "bucket_count": 8},
                {"type": "identity", "column": "city_name"}
            ]
        }
    }
    save(name="dataos://lakehouse:sdk/city", dataframe=df,
         format="iceberg", mode="append", options=write_options)
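For context, a short end-to-end sketch combining load() and save(): the filter is illustrative (city_id is taken from the partition spec above), and write_options is the dictionary defined in the example.

    # Read the governed source, apply an illustrative transformation, and write it back.
    df = load(name="dataos://lakehouse:retail/city", format="iceberg")
    df_out = df.filter(df.city_id.isNotNull())
    save(name="dataos://lakehouse:sdk/city", dataframe=df_out,
         format="iceberg", mode="append", options=write_options)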