Troubleshooting in DataOS PyFlare

This section lists common errors encountered while using the DataOS PyFlare SDK and the steps to diagnose and resolve them.

FQDN Resolution Failure

Error Message:

ConnectionError: HTTPSConnectionPool(host='example-training.dataos.app', port=443): Max retries exceeded with url...

Cause:
This error indicates that the Fully Qualified Domain Name (FQDN) is incorrect or cannot be resolved by the system.

Solution:
Ensure that DATAOS_FQDN is set to the correct domain for your DataOS instance:

DATAOS_FQDN = "example-training.dataos.app"

Confirm that the hostname is valid and resolvable from your network.
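One way to confirm resolvability before building a session is a quick DNS lookup from the same machine. The sketch below uses only Python's standard library. `can_resolve` is a hypothetical helper name, not part of the PyFlare SDK, and a successful lookup does not guarantee that the host is reachable on port 443.

```python
import socket


def can_resolve(hostname: str, port: int = 443) -> bool:
    """Return True if the local resolver can map hostname to an address.

    Hypothetical helper for illustration; not part of the PyFlare SDK.
    """
    try:
        socket.getaddrinfo(hostname, port)
        return True
    except socket.gaierror:
        # Raised when the name cannot be resolved (typo, missing DNS record,
        # or no DNS access from this network).
        return False
```

If `can_resolve(DATAOS_FQDN)` returns `False`, the likely culprit is a typo in the domain or a DNS problem on your network rather than the SDK itself.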

Unauthorized Access to Depot Metadata

Error Message:

HTTPError: Something went wrong in depot metadata API for depot: lakehouse with status code: 401

Cause:
An invalid or missing API key or token is being used.

Solution:
Verify that you are providing a valid API key or token with the appropriate permissions. Existing API key tokens can be listed by running the following command in the CLI:

dataos-ctl user apikey get

# Expected Output
INFO[0000] 🔑 user apikey get...
INFO[0000] 🔑 user apikey get...complete

                                                   TOKEN                                                   │  TYPE  │        EXPIRATION         │                   NAME
───────────────────────────────────────────────────────────────────────────────────────────────────────────┼────────┼───────────────────────────┼────────────────────────────────────────────
  dG9rZW5faG9wZWZ1bGx5X2xvdWRsedHJpa2luZ19uZXd0LmFiMzAyMTdjLTExYzAtNDg2Yi1iZjEyLWJkMjY1ZWM2YzgwOA==     │ apikey │ 2025-04-13T05:30:00+05:30 │ token_hopefully_loudly_striking_newt
  dG9rZW5fdGlnaHRseV9uZWVkbGVzcX2xpYmVyYWxfcGFuZ29saW4uNTY0ZDc4ZTQtNWNhMy00YjI1LWFkNWMtYmFlMTcwYTM5MWU1 │ apikey │ 2025-04-11T05:30:00+05:30 │ token_tightly_needlessly_liberal_pangolin

If no API keys are listed, create a new one with the create command:

dataos-ctl user apikey create
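Because a missing or expired key both surface as a 401, it can help to check the token before building the session. The following is a minimal sketch under stated assumptions: the environment variable name `DATAOS_APIKEY` and the helpers `read_token` and `is_expired` are hypothetical and chosen for illustration, and the expiration string is assumed to be copied by hand from the EXPIRATION column of the CLI output.

```python
import os
from datetime import datetime, timezone
from typing import Optional


def read_token(env_var: str = "DATAOS_APIKEY") -> str:
    """Fetch the API key from an environment variable and reject empty values.

    The variable name is an assumption for this example, not an SDK convention.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise ValueError(f"{env_var} is not set or is empty")
    return token


def is_expired(expiration: str, now: Optional[datetime] = None) -> bool:
    """Compare an EXPIRATION value from the CLI table with the current time."""
    # The CLI prints ISO 8601 timestamps with an offset,
    # for example "2025-04-13T05:30:00+05:30".
    expires_at = datetime.fromisoformat(expiration)
    current = now or datetime.now(timezone.utc)
    return current >= expires_at
```

If `is_expired` returns `True` for the key you are using, create a fresh key and pass the new token to `with_user_apikey`.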

Depot Not Loaded or Invalid Dataset

Error Message:

PyflareReadException: Check if dataset dataos://lakehouse:sandbox3/test_pyflare2 exists and you have read access...
InvalidInputException: Depot not loaded in current session

Cause:
The required depot is not included in the Spark session, or the dataset does not exist.

Solution:
Ensure that the depot is correctly configured in the session:

spark = session_builder.SparkSessionBuilder(log_level="INFO") \
    .with_spark_conf(sparkConf) \
    .with_user_apikey(token) \
    .with_dataos_fqdn(DATAOS_FQDN) \
    .with_depot(depot_name="lakehouse", acl="rw") \
    .build_session()

load(name="dataos://lakehouse:sandbox3/test_pyflare2", format="iceberg").show()

If using multiple depots, include all of them in the session:

spark = session_builder.SparkSessionBuilder(log_level="INFO") \
    .with_spark_conf(sparkConf) \
    .with_user_apikey(token) \
    .with_dataos_fqdn(DATAOS_FQDN) \
    .with_depot(depot_name="sfdepot01", acl="rw") \
    .with_depot(depot_name="bigquerydepot", acl="rw") \
    .with_depot(depot_name="lakehouse", acl="rw") \
    .build_session()
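A quick way to catch a missing depot before calling `load` is to compare the depot names in your dataset addresses with the depots passed to `with_depot`. The sketch below is plain string handling and does not call the SDK. `depot_of` and `missing_depots` are hypothetical helper names, and the parsing assumes addresses follow the `dataos://depot:collection/dataset` shape used in this section.

```python
from typing import Iterable, Set


def depot_of(address: str) -> str:
    """Extract the depot name from an address like dataos://depot:collection/dataset."""
    prefix = "dataos://"
    if not address.startswith(prefix):
        raise ValueError(f"not a dataos:// address: {address!r}")
    depot, sep, _ = address[len(prefix):].partition(":")
    if not depot or not sep:
        raise ValueError(f"cannot find a depot name in: {address!r}")
    return depot


def missing_depots(addresses: Iterable[str], session_depots: Iterable[str]) -> Set[str]:
    """Return depots referenced by the addresses but absent from the session."""
    return {depot_of(a) for a in addresses} - set(session_depots)
```

For example, `missing_depots(["dataos://lakehouse:sandbox3/test_pyflare2"], ["sfdepot01"])` returns `{"lakehouse"}`, which signals that the session needs one more `with_depot` call.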

Incorrect Dataset Format

Code Example:

load(name="dataos://lakehouse:sandbox3/test_pyflare2", format="snowflake").show()

Error Message:

IllegalArgumentException: A snowflake URL must be provided with 'sfurl' parameter...

Cause:
The format specified does not match the actual format of the dataset. In this case, the dataset is stored in Iceberg format, but format="snowflake" was passed to load.

Solution:
Use the correct format based on the dataset's storage type:

load(name="dataos://lakehouse:sandbox3/test_pyflare2", format="iceberg").show()
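To avoid retyping the format at every call site, one option is to keep a small lookup from depot name to format string. This is only a sketch: `DEPOT_FORMATS` and `format_for` are hypothetical names, and the mapping entries are assumptions based on the depot names in this section, so replace them with the storage types of your own depots.

```python
# Hypothetical mapping from depot name to the format string passed to load().
# The entries are assumptions for illustration; fill in your own depots.
DEPOT_FORMATS = {
    "lakehouse": "iceberg",
    "sfdepot01": "snowflake",
}


def format_for(address: str) -> str:
    """Look up the format for a dataos://depot:collection/dataset address."""
    # Take the text between "dataos://" and the first ":" as the depot name.
    depot = address.split("://", 1)[-1].split(":", 1)[0]
    try:
        return DEPOT_FORMATS[depot]
    except KeyError:
        raise KeyError(f"no format registered for depot {depot!r}") from None
```

The result can then be passed as the `format` argument, for example `format=format_for("dataos://lakehouse:sandbox3/test_pyflare2")`.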