Troubleshooting in DataOS PyFlare
This section outlines common errors encountered while using the DataOS PyFlare SDK and provides actionable steps to diagnose and resolve these issues effectively.
FQDN Resolution Failure
Error Message:
ConnectionError: HTTPSConnectionPool(host='example-training.dataos.app', port=443): Max retries exceeded with url...
Cause:
This error indicates that the Fully Qualified Domain Name (FQDN) is incorrect or cannot be resolved by the system.
Solution:
Ensure that DATAOS_FQDN is set to the correct domain.
Confirm that the hostname is valid and resolvable from your network.
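As a quick check, you can confirm DNS resolution of the FQDN before building the Spark session. The sketch below uses only the Python standard library; the hostname shown is illustrative and should be replaced with your own instance's FQDN.

import socket

DATAOS_FQDN = "example-training.dataos.app"  # replace with your DataOS instance FQDN

try:
    ip_address = socket.gethostbyname(DATAOS_FQDN)
    print(f"{DATAOS_FQDN} resolves to {ip_address}")
except socket.gaierror as err:
    # Resolution failed: check the FQDN spelling, your DNS settings, or VPN/proxy configuration
    print(f"Could not resolve {DATAOS_FQDN}: {err}")

If this check fails outside PyFlare as well, the problem lies with the network or the FQDN value itself rather than with the SDK.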
Unauthorized Access to Depot Metadata
Error Message:
Cause:
An invalid or missing API key or token is being used.
Solution:
Verify and provide the correct API key or token with appropriate permissions.
The API key token can be obtained by executing the following command on the CLI:
dataos-ctl user apikey get
# Expected Output
INFO[0000] 🔑 user apikey get...
INFO[0000] 🔑 user apikey get...complete

                                               TOKEN                                                  │  TYPE  │        EXPIRATION         │                   NAME
───────────────────────────────────────────────────────────────────────────────────────────────────────┼────────┼───────────────────────────┼─────────────────────────────────────────────
dG9rZW5faG9wZWZ1bGx5X2xvdWRsedHJpa2luZ19uZXd0LmFiMzAyMTdjLTExYzAtNDg2Yi1iZjEyLWJkMjY1ZWM2YzgwOA==       │ apikey │ 2025-04-13T05:30:00+05:30 │ token_hopefully_loudly_striking_newt
dG9rZW5fdGlnaHRseV9uZWVkbGVzcX2xpYmVyYWxfcGFuZ29saW4uNTY0ZDc4ZTQtNWNhMy00YjI1LWFkNWMtYmFlMTcwYTM5MWU1   │ apikey │ 2025-04-11T05:30:00+05:30 │ token_tightly_needlessly_liberal_pangolin
If no valid API key token is available, generate a new one using the create command as shown below:
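A minimal sketch of that command, assuming the standard dataos-ctl syntax (optional flags such as a token name or duration may vary by CLI version):

dataos-ctl user apikey create

The newly created token can then be passed to the session builder via with_user_apikey().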
Depot Not Loaded or Invalid Dataset
Error Message:
PyflareReadException: Check if dataset dataos://lakehouse:sandbox3/test_pyflare2 exists and you have read access...
InvalidInputException: Depot not loaded in current session
Cause:
The required depot is not included in the Spark session, or the dataset does not exist.
Solution:
Ensure that the depot is correctly configured in the session:
from pyflare.sdk import load, session_builder

# Attach the depot referenced in the dataset address (here, lakehouse) to the session
spark = session_builder.SparkSessionBuilder(log_level="INFO") \
    .with_spark_conf(sparkConf) \
    .with_user_apikey(token) \
    .with_dataos_fqdn(DATAOS_FQDN) \
    .with_depot(depot_name="lakehouse", acl="rw") \
    .build_session()

load(name="dataos://lakehouse:sandbox3/test_pyflare2", format="iceberg").show()
If using multiple depots, include all of them in the session:
spark = session_builder.SparkSessionBuilder(log_level="INFO") \
.with_spark_conf(sparkConf) \
.with_user_apikey(token) \
.with_dataos_fqdn(DATAOS_FQDN) \
.with_depot(depot_name="sfdepot01", acl="rw") \
.with_depot(depot_name="bigquerydepot", acl="rw") \
.with_depot(depot_name="lakehouse", acl="rw") \
.build_session()
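Once all required depots are attached, datasets from any of them can be read in the same session. The dataset addresses below are illustrative placeholders; the call pattern follows the load API shown above.

from pyflare.sdk import load

# Read from the Snowflake depot (address is a placeholder)
sf_df = load(name="dataos://sfdepot01:public/sample_table", format="snowflake")

# Read from the Lakehouse (Iceberg) depot
lh_df = load(name="dataos://lakehouse:sandbox3/test_pyflare2", format="iceberg")

sf_df.show()
lh_df.show()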
Incorrect Dataset Format
Code Example:
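For illustration, a read of the following shape would produce this error: the dataset from the earlier example is stored as Iceberg, but snowflake is passed as the format.

from pyflare.sdk import load

# Incorrect: the dataset is stored as Iceberg, but "snowflake" is passed as the format
load(name="dataos://lakehouse:sandbox3/test_pyflare2", format="snowflake").show()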
Error Message:
Cause:
The format specified does not match the actual format of the dataset. In this case, the dataset is stored in Iceberg format, but snowflake is specified incorrectly.
Solution:
Use the correct format based on the dataset's storage type:
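For example, reading the same dataset with the format that matches its actual storage type (Iceberg):

from pyflare.sdk import load

# Correct: the format matches the dataset's storage type
load(name="dataos://lakehouse:sandbox3/test_pyflare2", format="iceberg").show()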