# Scanner for Kafka
The Kafka metadata Scanner Workflow can be configured and scheduled through the DataOS CLI. Ensure that all prerequisites are met before initiating the workflow.
## Prerequisites
To scan the KAFKA depot, ensure the following prerequisites are met:

- **Depot creation and access:** Ensure that the depot is created and that you have `read` access to it. Verify the depot in the Metis UI, or check it through the DataOS CLI as shown below.
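The following check is a sketch using the DataOS CLI; the `dataos-ctl get -t depot -a` invocation (where `-a` lists Resources across all owners) is assumed from standard CLI usage:

```shell
# List all Depot Resources and confirm the Kafka depot exists and is active
dataos-ctl get -t depot -a
```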
Expected output:

```shell
INFO[0000] 🔍 get...
INFO[0000] 🔍 get...complete

        NAME       | VERSION | TYPE  | WORKSPACE | STATUS | RUNTIME |   OWNER
-------------------|---------|-------|-----------|--------|---------|-------------
  mongodepot       | v2alpha | depot |           | active |         | usertest
  snowflakedepot   | v2alpha | depot |           | active |         | gojo
  redshiftdepot    | v2alpha | depot |           | active |         | kira
  mysqldepot       | v2alpha | depot |           | active |         | ryuk
  oracle01         | v2alpha | depot |           | active |         | drdoom
  mariadb01        | v2alpha | depot |           | active |         | tonystark
  demopreppostgres | v2alpha | depot |           | active |         | slimshaddy
  demoprepbq       | v2alpha | depot |           | active |         | pengvin
  mssql01          | v2alpha | depot |           | active |         | hulk
  kafka01          | v2alpha | depot |           | active |         | peeter
  icebase          | v2alpha | depot |           | active |         | blackpink
  azuresql         | v2alpha | depot |           | active |         | arnold
  fastbase         | v2alpha | depot |           | active |         | ddevil
```
If the KAFKA Depot is not created, create it using the manifest sample below:
```yaml
name: ${{depot-name}}
version: v1
type: depot
tags:
  - ${{tag1}}
owner: ${{owner-name}}
layer: user
depot:
  type: KAFKA
  description: ${{description}}
  external: ${{true}}
  spec:
    brokers:
      - ${{broker1}}
      - ${{broker2}}
    schemaRegistryUrl: ${{http://20.9.63.231:8081/}}
```
To connect to Kafka, the user needs a list of KAFKA brokers. Once the broker list is provided, the Depot can fetch all the topics in the KAFKA cluster.
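The manifest can then be applied through the DataOS CLI. A minimal sketch, assuming the manifest is saved as `kafka-depot.yaml` (a hypothetical file name):

```shell
# Create the KAFKA Depot from the manifest file
dataos-ctl apply -f kafka-depot.yaml
```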
- **Access permissions in DataOS:** To execute a Scanner Workflow in DataOS, verify that at least one of the following role tags is assigned:

    - `roles:id:data-dev`
    - `roles:id:system-dev`
    - `roles:id:user`
Use the following command to check assigned roles:
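One way to inspect the assigned tags is with `dataos-ctl user get`, shown here as a sketch (the exact output layout may vary by CLI version):

```shell
# Show the current user's details; role tags appear in the TAGS column
dataos-ctl user get
```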
If any required tags are missing, contact a DataOS Operator or submit a Grant Request for role assignment.
Alternatively, if access is managed through use cases, ensure the following use cases are assigned:
- Read Workspace
- Run as Scanner User
- Manage All Depot
- Read All Dataset
- Read All Secrets from Heimdall
To validate assigned use cases, refer to the Bifrost Application Use Cases section.
## Scanner Workflow for Kafka
Here is an example manifest configuration that connects to the source and reaches the Metis server to save the metadata in the Metis DB:
```yaml
version: v1
name: kafka-scanner2
type: workflow
tags:
  - kafka-scanner2.0
description: The job scans schema tables and registers metadata
workflow:
  dag:
    - name: scanner2-kafka
      description: The job scans schema from the Kafka depot tables and registers metadata to Metis
      spec:
        stack: scanner:2.0
        compute: runnable-default
        runAsUser: metis
        stackSpec:
          depot: kafka01
          # sourceConfig:
          #   config:
          #     topicFilterPattern:
          #       includes:
          #         - Sanity
          #         - sampel_Json_Kafka
          #         - consumer_offsets
```
The above sample manifest file is deployed using the following command:
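A sketch of the deployment command, assuming the manifest is saved as `kafka-scanner.yaml` (a hypothetical file name) and the target Workspace is `public` (an assumption about your environment):

```shell
# Deploy the Scanner Workflow to the public Workspace
dataos-ctl apply -f kafka-scanner.yaml -w public
```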
**Updating the Scanner Workflow:**
If the Depot or Scanner configurations are updated, the Scanner must be redeployed after deleting the previous instance. Use the following command to delete the existing Scanner:
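One option is deleting by manifest file; a sketch assuming the same hypothetical `kafka-scanner.yaml`:

```shell
# Delete the Scanner Workflow using its manifest file
dataos-ctl delete -f kafka-scanner.yaml
```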
OR
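Alternatively, delete by Resource type, Workspace, and name; the Workspace (`public`) is an assumption, and the name matches the sample manifest above:

```shell
# Delete the Scanner Workflow by identifying it directly
dataos-ctl -t workflow -w public -n kafka-scanner2 delete
```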
**Best Practice**

It is recommended to regularly delete Resources that are no longer in use; this saves time and reduces unnecessary compute costs.