# Elasticsearch Depots
To start executing Flare Jobs on Elasticsearch Depots, you first need to set up an Elasticsearch Depot. If you haven't already done so, navigate to the following link: Elasticsearch Depot.
## Read Configuration
To read data from an Elasticsearch depot, configure the properties `name`, `dataset`, and `format`, along with the Elasticsearch-specific property `es.nodes.wan.only` within `options`, in the `inputs` section of the YAML. For instance, if your dataset name is `input`, the UDL address is `dataos://elasticsearch:default/elastic_write`, and the `format` is set to `elasticsearch`, the `inputs` section will be as follows:
```yaml
inputs:
  - name: input
    dataset: dataos://elasticsearch:default/elastic_write
    format: elasticsearch
    options:
      es.nodes.wan.only: 'true'
```
When `es.nodes.wan.only` is set to `true`, the connector limits its network usage: instead of connecting directly to the target resource shards, it makes connections to the declared Elasticsearch cluster nodes only.
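Beyond `es.nodes.wan.only`, the underlying elasticsearch-hadoop connector accepts further read-tuning settings. A minimal sketch, assuming Flare forwards any `es.*` key in `options` straight to the connector (the option names below are standard elasticsearch-hadoop settings, not DataOS-specific guarantees, and `audit` is a hypothetical field name):

```yaml
inputs:
  - name: input
    dataset: dataos://elasticsearch:default/elastic_write
    format: elasticsearch
    options:
      es.nodes.wan.only: 'true'      # route all traffic through the declared nodes
      es.scroll.size: '500'          # documents returned per scroll request
      es.read.field.exclude: 'audit' # exclude a field from the read (hypothetical field)
```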
### Sample Read configuration manifest
Let's take a case scenario where we read a dataset from an Elasticsearch depot and store it in the Lakehouse within DataOS. The read configuration manifest will be as follows:
```yaml
version: v1
name: read-elasticsearch-01
type: workflow
tags:
  - elasticsearch
  - read
description: this job reads data from elasticsearch and writes to icebase
workflow:
  dag:
    - name: elasticsearch-read-b-01
      title: read data from elasticsearch
      description: read data from elasticsearch
      spec:
        tags:
          - Connect
        stack: flare:5.0
        compute: runnable-default
        stackSpec:
          job:
            explain: true
            inputs:
              - name: input
                dataset: dataos://elasticsearch:default/elastic_write
                format: elasticsearch
                options:
                  es.nodes.wan.only: 'true'
            logLevel: INFO
            outputs:
              - name: output02
                dataset: dataos://icebase:sample/read_elasticsearch?acl=rw
                format: iceberg
                options:
                  saveMode: append
            steps:
              - sequence:
                  - name: output02
                    sql: SELECT * FROM input
```
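The `steps` section is ordinary Flare SQL, so the read need not be a verbatim copy of the index. A minimal sketch that filters and renames columns before landing the data in the Lakehouse, assuming the index exposes `city_id` and `state_code` fields (hypothetical names):

```yaml
steps:
  - sequence:
      - name: output02
        sql: >
          SELECT city_id,
                 state_code AS state
          FROM input
          WHERE state_code IS NOT NULL
```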
## Write Configuration
To write data to an Elasticsearch Depot, configure the properties `name`, `dataset`, and `format` in the `outputs` section of the manifest. For instance, if your dataset name is `output01`, the dataset is to be stored at the location `dataos://elasticsearch:default/elastic_write`, and the format is `elasticsearch`, the `outputs` section will be as follows:
```yaml
outputs:
  - name: output01
    dataset: dataos://elasticsearch:default/elastic_write?acl=rw
    format: elasticsearch
```
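As with reads, an `options` block can be attached to the output. The sketch below adds `saveMode` (which also appears in the full manifest that follows) together with two standard elasticsearch-hadoop write settings for idempotent upserts; whether Flare forwards the `es.*` keys to the connector is an assumption, and `city_id` is a hypothetical document-id column:

```yaml
outputs:
  - name: output01
    dataset: dataos://elasticsearch:default/elastic_write?acl=rw
    format: elasticsearch
    options:
      saveMode: append
      es.write.operation: upsert   # update existing documents instead of duplicating them
      es.mapping.id: city_id       # column to use as the Elasticsearch document _id (hypothetical)
```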
### Sample Write configuration manifest
Let's take a case scenario where we write a dataset to an Elasticsearch Depot from a thirdparty Depot. The write configuration manifest will be as follows:
```yaml
version: v1
name: write-elasticsearch-b-0001
type: workflow
tags:
  - elasticsearch
  - write
description: this job reads data from thirdparty and writes to elasticsearch
workflow:
  title: Connect City
  dag:
    - name: elasticsearch-write-b-01
      title: write data to elasticsearch
      description: write data to elasticsearch
      spec:
        tags:
          - Connect
          - City
        stack: flare:5.0
        compute: runnable-default
        stackSpec:
          job:
            explain: true
            logLevel: INFO
            inputs:
              - name: city_connect
                dataset: dataos://thirdparty01:none/city
                format: csv
                schemaPath: dataos://thirdparty01:none/schemas/avsc/city.avsc
            outputs:
              - name: output01
                dataset: dataos://elasticsearch:default/elastic_write?acl=rw
                format: elasticsearch
                options:
                  saveMode: append
            steps:
              - sequence:
                  - name: output01
                    sql: SELECT * FROM city_connect
```
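The write job, too, can reshape the data in `steps` before it reaches Elasticsearch. A minimal sketch that deduplicates rows on the way in, assuming the CSV exposes `city_id`, `city_name`, and `state_code` columns (hypothetical names):

```yaml
steps:
  - sequence:
      - name: output01
        sql: >
          SELECT DISTINCT city_id,
                          city_name,
                          state_code
          FROM city_connect
```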