Schema Configurations¶
This section demonstrates how to supply schema when reading data using Flare stack from a Depot. There are different schema options available, including AVRO Schema, SparkJson Schema, and SparkDDL Schema.
Input Options for Schema¶
To specify the schema for your data in Flare, you can use the following YAML configuration:
inputs:
- name: city_connect
dataset: dataos://thirdparty01:none/city
format: csv
schemaType:
schemaPath:
schemaString:
If you don't define the schemaType
field but provide a schemaPath
, Flare will consider it as an AVRO schema.
AVRO Schema¶
To apply an AVRO schema, you have three options:
- Without specifying the schema type, you can provide the
schemaPath
field, and Flare will automatically consider it as an AVRO schema.
inputs:
- name: city_connect
dataset: dataos://thirdparty01:none/city
format: csv
schemaPath: dataos://thirdparty01:none/schemas/avsc/city.avsc
- Create a
.avsc
file, upload it to a location, and use it with a Depot.
inputs:
- name: city_connect
dataset: dataos://thirdparty01:none/city
format: csv
schemaType: AVRO
schemaPath: dataos://thirdparty01:none/schemas/avsc/city.avsc
- You can also provide the AVRO schema directly as a schema string:
inputs:
- name: city_connect
dataset: dataos://thirdparty01:none/city
format: csv
schemaType: AVRO
schemaString: '{"type":"record","name":"defaultName","namespace":"defaultNamespace","fields":[{"name":"city_id","type":["null","string"],"default":null},{"name":"zip_code","type":["null","int"],"default":null},{"name":"city_name","type":["null","string"],"default":null},{"name":"county_name","type":["null","string"],"default":null},{"name":"state_code","type":["null","string"],"default":null},{"name":"state_name","type":["null","string"],"default":null}]}'
SparkJson Schema¶
Similar to the AVRO schema, you can use SparkJson schema with the same combinations of schemaType
, schemaPath
, and schemaString
. The only difference lies in the format of the file containing the schema when using schemaPath
.
inputs:
- name: city_connect
dataset: dataos://thirdparty01:none/city
format: csv
schemaType: sparkJson
schemaPath: dataos://thirdparty01:none/schemas/avsc/city.json
schemaString: "schema json schema string"
SparkDDL Schema¶
For applying SparkDDL schema, use the following YAML configuration:
inputs:
- name: city_connect
dataset: dataos://thirdparty01:none/city
format: csv
schemaType: sparkJson
schemaPath: dataos://thirdparty01:none/schemas/avsc/city.ddl
schemaString: "city_id string, zip_code integer, city_name string, county_name string, state_code string, state_name string"
Make use of these schema options in Flare to effectively define the structure of your data and enhance your data development workflows.