MongoDB

The Nilus connector for MongoDB supports Change Data Capture (CDC), enabling near real-time replication of data changes from MongoDB to Supported Destinations, such as the Lakehouse. CDC captures change events from MongoDB’s oplog.rs collection and streams them continuously.

Info

Batch data movement is not supported for MongoDB.

Prerequisites

Before enabling CDC, ensure the following configurations are in place for your hosting environment:

MongoDB Replica Set

  • MongoDB must run as a replica set, even for single-node deployments.
  • Nilus CDC for MongoDB relies on the oplog.rs collection, which is only available in replica sets.
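
You can confirm from mongosh that the deployment is running as a replica set and locate the primary; on a standalone server, rs.status() returns an error instead. A minimal check:

    // Confirm replica set membership and list member roles (errors on a standalone server)
    rs.status().members.forEach(m => print(m.name, m.stateStr))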

Info

Contact the Database Administrator (DBA) to set up and enable Change Data Capture (CDC) in MongoDB.

Enable oplog Access

  • Nilus uses MongoDB’s oplog.rs to capture changes.
  • Nilus requires a user with read access to the business databases and to MongoDB’s internal system databases in order to read the oplog. If such a user does not exist, create one in MongoDB using the following:

    db.createUser({
      user: "debezium",
      pwd: "dbz",
      roles: [
        { role: "read", db: "your_app_db" },      // Read target database
        { role: "read", db: "local" },            // Read oplog
        { role: "read", db: "config" },           // Read cluster configuration
        { role: "readAnyDatabase", db: "admin" }, // Optional: discovery
        { role: "clusterMonitor", db: "admin" }   // Recommended: monitoring
      ]
    })
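
To sanity-check the grants, authenticate as the new user and confirm that the oplog and the target database are readable. The connection string, replica set name, and database below are placeholders for your own values:

    // Connect as the CDC user first, e.g.:
    //   mongosh "mongodb://debezium:dbz@mongo1:27017/?authSource=admin&replicaSet=rs0"
    use local
    db.oplog.rs.find().sort({ $natural: -1 }).limit(1)      // newest oplog entry should print
    db.getSiblingDB("your_app_db").getCollectionNames()     // target database is readable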
    

Info

Grant only the roles required for your environment to follow the principle of least privilege.

Pre-created MongoDB Depot

A Depot must exist in DataOS with read-write access. To verify, check the Metis UI in DataOS or run the following command:

dataos-ctl resource get -t depot -a

# Expected Output
INFO[0000] 🔍 get...                                     
INFO[0000] 🔍 get...complete 
|    NAME      | VERSION | TYPE  | STATUS | OWNER    |
| ------------ | ------- | ----- | ------ | -------- |
| mongodbdepot | v2alpha | depot | active | usertest |

If the Depot does not exist, use the following manifest configuration template to create the MongoDB Depot:

MongoDB Depot Manifest
name: ${{depot-name}}
version: v2alpha
type: depot
tags:
  - ${{tag1}}
  - ${{tag2}}
layer: user
depot:
  type: mongodb                                 
  description: ${{description}}
  compute: ${{runnable-default}}
  mongodb:                                          
    subprotocol: ${{"mongodb+srv"}}
    nodes: ${{["clusterabc.ezlggfy.mongodb.net"]}}
  external: ${{true}}
  secrets:
    - name: ${{instance-secret-name}}-r
      allkeys: ${{true}}

    - name: ${{instance-secret-name}}-rw
      allkeys: ${{true}}

Info

Update variables such as name, owner, compute, layer, etc., and contact the DataOS Administrator or Operator to obtain the appropriate secret name.

Sample Service Config

The following manifest configuration template can be used to set up CDC for MongoDB:

name: ${{service-name}}                                    # Service identifier
version: v1                                                # Version of the service
type: service                                              # Defines the resource type
tags:                                                      # Classification tags
    - ${{tag}}                                              
    - ${{tag}}                                              
description: Nilus CDC Service for MongoDB                 # Description of the service
workspace: public                                          # Workspace where the service is deployed

service:                                                   # Service specification block
  servicePort: 9010                                        # Service port
  replicas: 1                                              # Number of replicas
  logLevel: INFO                                           # Logging level
  compute: ${{query-default}}                              # Compute type

  stack: ${{nilus:3.0}}                                    # Nilus stack version
  stackSpec:                                               # Stack specification
    source:                                                # Source configuration block
      address: dataos://mongodbdepot                       # Source depot address/UDL
      options:                                             # Source-specific options
        engine: debezium                                   # Required CDC engine; used for streaming changes
        collection.include.list: "retail.products"         # MongoDB collections to include
        topic.prefix: "cdc_changelog"                      # Required topic prefix for CDC stream
        max-table-nesting: "0"                             # Optional; prevents unnesting of nested documents
        transforms.unwrap.array.encoding: array            # Optional; preserves arrays in sink as-is
    sink:                                                  # Sink configuration for CDC output
      address: dataos://testinglh                          # Sink depot address
      options:                                             # Sink-specific options
        dest-table: mdb_test_001                           # Destination table name in the sink depot
        incremental-strategy: append                       # Append-only strategy for streaming writes

Info

Ensure that all placeholder values and required fields (e.g., connection addresses and access credentials) are properly updated before applying the configuration to a DataOS workspace.

Deploy the manifest file using the following command:

dataos-ctl resource apply -f ${{path to the Nilus Service YAML}}

Info

The MongoDB host used in the CDC service YAML must match exactly the host defined during replica set initialization.
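
The hosts registered at replica set initialization can be listed from mongosh:

    // Hosts exactly as defined in the replica set configuration; the service YAML must match these
    rs.conf().members.forEach(m => print(m.host))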

Source Options

Nilus supports the following source options for MongoDB CDC:

| Option | Default | Description |
| ------ | ------- | ----------- |
| database.include.list | No Default | An optional comma-separated list of regular expressions or literals that match the names of databases to be monitored. By default, the connector monitors all databases except the internal local and admin databases. When database.include.list is set, the connector captures changes only from the databases that the property specifies. |
| collection.include.list | No Default | An optional comma-separated list of regular expressions or literals that match fully-qualified namespaces for MongoDB collections to be monitored. By default, the connector monitors all collections except those in the local and admin databases. When collection.include.list is set, the connector monitors only the collections that the property specifies; other collections are excluded from monitoring. Collection identifiers are of the form databaseName.collectionName. |
| snapshot.mode | initial | Specifies the behavior for snapshots when the connector starts. Options: always: the connector performs a snapshot every time it starts, capturing the structure and data of the monitored collections, then streams subsequent changes. initial: the connector performs a snapshot only when no offsets have been recorded for the logical server name. initial_only: the connector performs an initial snapshot and then stops, without processing any subsequent changes. no_data: the connector never performs snapshots. when_needed: after the connector starts, it performs a snapshot only if it cannot detect any topic offsets, or if a previously recorded offset specifies a log position that is not available on the server. |
| field.exclude.list | No Default | An optional comma-separated list of the fully-qualified names of fields that should be excluded from change event message values. Fully-qualified names for fields are of the form databaseName.collectionName.fieldName.nestedFieldName, where databaseName and collectionName may contain the wildcard (*) which matches any characters. |
| topic.prefix | No Default | Mandatory. Topic prefix that provides a namespace for the particular MongoDB instance or cluster in which Nilus is capturing changes. The prefix should be unique across all other connectors, and only alphanumeric characters, hyphens, dots, and underscores may be used in the database server logical name. This prefix is also appended to the sink table name. |
| transforms.unwrap.array.encoding | No Default | Controls how array values are encoded when unwrapped by a Kafka Connect transform. Common options include none (default), array, json, or string, which define how array elements are serialized. |
| max-table-nesting | No Default | Specifies the maximum allowed depth for nested tables or objects (commonly in JSON or relational mapping). Helps prevent excessively deep or complex structures that can impact performance or compatibility. |

Sink Options

Nilus supports the following sink options for MongoDB CDC:

| Field | Description | Default |
| ----- | ----------- | ------- |
| dest-table | Target table in the sink. | No Default |
| incremental-strategy | Write mode (append recommended for CDC). | append |

Core Concepts

Nilus captures row-level changes from MongoDB using the replica set oplog. Below are the essential concepts for understanding how Nilus integrates with MongoDB.

  1. Replica Set

    • MongoDB must run as a replica set, even in single-node deployments.
    • Nilus connects to the primary replica and tails the oplog (local.oplog.rs).
    • Standalone MongoDB servers are not supported for CDC.
  2. The MongoDB Oplog

    • The oplog (oplog.rs) is a capped collection in the local database.
    • It records every insert, update, and delete applied to the primary.
    • Nilus reads this log to generate CDC events.
    • oplog entries roll off in FIFO (first-in-first-out) order once the allocated size is exhausted.
  3. Schema-Less Nature of MongoDB

    • MongoDB is schema-less, but Nilus dynamically infers schemas.
    • The sink table is created from the first document observed.
    • Schema evolution is tracked using a Schema Registry with Avro.
  4. oplog Retention & Disk Pressure

    • Nilus maintains a cursor within the MongoDB oplog to track change events. When the connector is paused or lags behind, MongoDB retains older oplog entries to accommodate delayed consumption.
    • If the connector's lag exceeds the oplog retention threshold, expired entries may lead to data loss. In such cases, a full snapshot must be reinitiated to resume consistent processing.
    • The term Disk Pressure refers to the stress placed on disk resources due to the continuous growth of the oplog.rs collection. MongoDB retains these entries until Nilus acknowledges their processing.

    Implications of High Disk Pressure

    1. The oplog continuously expands as new changes are recorded.
    2. Disk utilization increases on the volume where MongoDB stores its data.
    3. Excessive disk usage can lead to:
      1. Write operation slowdowns
      2. Replication failures
      3. Potential node instability or crashes

    Although rare, this condition can result in a fatal error, placing the service in a pending state. Recovery requires redeployment of the CDC service. Upon redeployment, the system performs a full snapshot followed by real-time change data capture.

    Error 286:

    Handling MongoDB Error 286 in CDC Pipelines

    TL;DR

    • Error 286 is always a symptom of history loss: the resume token is gone.
    • The root cause is an undersized or force‑truncated oplog relative to the maximum connector lag.
    • Prevention = correct oplog sizing + connector throughput + monitoring.
    • Recovery is straightforward (a new snapshot or a larger oplog) but can be time-consuming, so plan ahead.

    Overview

    Error 286 indicates that a MongoDB change stream (which Nilus relies on for CDC) attempted to resume from a point that is no longer present in the replica‑set oplog (local.oplog.rs). When this happens, Nilus logs the following exception:

    Command failed with error 286 (ChangeStreamHistoryLost):
    Resume of change stream was not possible, as the resume point may no longer be in the oplog
    

    Understanding why the oplog entry disappeared and how to size & monitor the oplog is therefore critical for reliable CDC.

    The MongoDB Oplog in a Nutshell

    • A capped collection (local.oplog.rs) that stores every write against the primary so secondaries (and tools such as Nilus) can replicate those changes.
    • The oplog is truncated on a first‑in‑first‑out basis once it reaches its allocated size.
    • Default size – When you initiate a replica set, MongoDB chooses the oplog size automatically: 5 % of free disk space (minimum ≈ 990 MB).

    Because the collection is capped, what really matters is the time window: how many hours of history does this size translate to under your peak write load?
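
    You can answer that question directly from mongosh; db.getReplicationInfo() reports the allocated size and the time spanned by the current oplog contents:

      // How many hours of history does the current oplog hold?
      const ri = db.getReplicationInfo();
      print(ri.timeDiffHours + " hours of history in " + ri.logSizeMB + " MB allocated");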

    Why Error 286 Happens

    1. The connector is paused or slowed down; older entries roll off before it resumes.
    2. Manually shrinking the oplog or re-initializing the replica set discards old tokens.
    3. MongoDB may truncate more aggressively if the filesystem is full.
    4. Very high write bursts (e.g., a sudden bulk load) shrink the effective time window.

    When the connector restarts it looks up its resume token in the oplog; if that token has vanished, MongoDB throws error 286 and Nilus refuses to start.

    Recovery Options

    • Delete and reapply the failing service with a new PV directory or PVC. Alternatively, keep the connector up and running but delete the offset (from the PVC directory) so that Nilus believes it is new and snapshots again.

      OR

    • Change the name of the connector service (from nilus:0.0.13 onwards) using the same config. Nilus will take a snapshot (as specified by the snapshot.mode) and then continue to stream changes.

    Note: Error 286 cannot be resolved by simply restarting the connector. Manual intervention is required to restore the service once this error occurs.

    Preventing Error 286

    1. Size the Oplog for Worst‑Case Lag

      MongoDB 3.6+ supports replSetResizeOplog, and 4.4 adds minRetentionHours for time‑based retention guarantees.

      The objective is to ensure the oplog retains at least as much history as the connector could ever fall behind, with headroom for bursts.

      Formula:

      RequiredSize ≈ PeakWrites/sec × MaxLag(sec) × avgEntrySize × safetyFactor

      • Peak Writes/sec – Insert + Update + Delete ops during the busiest interval (consult serverStatus().opcounters or monitoring).
      • Max Lag – Longest plausible outage/back‑pressure window (connector maintenance + downstream outage + buffer).
      • avgEntrySize – In bytes; rule‑of‑thumb ≈ 1 kB if most documents are small.
      • safetyFactor – 1.3–2.0 depending on risk appetite.

      Example

      | Parameter | Value | Notes |
      | --------- | ----- | ----- |
      | Peak writes/sec | 15,000 ops | Observed from Grafana at the 95th percentile |
      | Max lag | 30 min = 1,800 s | Upgrade window + 10 min contingency |
      | Avg entry size | 1 kB | Typical BSON size of collection docs |
      | Safety factor | 1.5 | Gives headroom for burst writes |
          Raw volume  = 15,000 ops/s × 1,800 s × 1 kB ≈ 27,000,000 kB ≈ 27 GB
          With safety = 27 GB × 1.5 ≈ 41 GB
      
      • Recommendation: Round up to 48 GB when running replSetResizeOplog or the --oplogSize init option.
      • A 48 GB oplog still provides ≈27 min of history at double the recorded peak write rate, so the window degrades gracefully even during black‑swan spikes. (A mongosh sketch for estimating the peak write rate from live opcounters appears after this list.)
    2. Monitor Key Metrics

      • Use rs.printReplicationInfo() to retrieve information on the oplog status, including the size of the oplog and the time range of operations.
    3. Avoid Long Pauses
      • Schedule connector downtime within the calculated oplog window.
    4. Recommended Nilus Source Options

      Sample Configuration
      source:
        address: dataos://mongodept
        options:
          engine: debezium                        # mandatory for CDC; no need for batch
          collection.include.list: "spam.product"
          topic.prefix: "cdc_changelog"           # mandatory; can be custom
          snapshot.mode: "when_needed"
          max.batch.size: 250
          max.queue.size: 2000
          max.queue.size.in.bytes: "134217728"
          heartbeat.interval.ms: 6000
          offset.flush.interval.ms: 15000
      sink:
        address: dataos://testawslh
        options:
          dest-table: mongodb_test
          incremental-strategy: append
          aws_region: us-west-2
      
      | Property | Why it helps | Suggested value |
      | -------- | ------------ | --------------- |
      | snapshot.mode | Controls what Nilus does when offsets are missing. Set to initial (default) or always if you anticipate long downtimes. With when_needed, the connector performs a snapshot after starting only if it cannot detect any topic offsets or a previously recorded offset specifies a log position that is not available on the server. | when_needed |
      | offset.flush.interval.ms | How often offsets are committed. Shorter intervals reduce duplicate events after crashes. | 15000 ms |
      | heartbeat.interval.ms | Emits heartbeat records to keep offsets moving even when no data changes. Helps detect lag early. | 5000–10000 ms |
      | max.batch.size, max.queue.size & max.queue.size.in.bytes | Tune to keep connector processing speed above the peak write rate, avoiding backlog. | Start small (e.g., 250 / 2000 / 128 MB) and adjust to data volume and change frequency |
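
      To ground the sizing formula above in live numbers, a rough mongosh sketch along these lines can estimate the peak write rate from opcounters. The sampling interval, lag window, and safety factor are illustrative assumptions, not Nilus defaults:

      // Rough oplog-sizing sketch (illustrative inputs; adjust to your environment)
      const before = db.serverStatus().opcounters;
      sleep(10000);                                   // sample the write rate over 10 seconds
      const after = db.serverStatus().opcounters;
      const writesPerSec = ((after.insert - before.insert) +
                            (after.update - before.update) +
                            (after.delete - before.delete)) / 10;
      const maxLagSec = 1800, avgEntryKB = 1, safety = 1.5;   // assumed formula inputs
      const requiredGB = writesPerSec * maxLagSec * avgEntryKB * safety / (1024 * 1024);
      print("~" + requiredGB.toFixed(1) + " GB oplog recommended for this window");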

    Operational Playbook

    Phase Checklist
    Daily - Monitor oplog window & connector lag
    - Alert if lag > 80 % of window
    Before maintenance - Calculate expected pause; if > window, increase oplog temporarily
    After unplanned outage - If connector fails with 286, decide between re‑snapshot or clean the PV directory
    After success - Review sizing assumptions; adjust oplogSizeMB or Nilus throughput limits

    Useful MongoDB Shell Commands

    // Check how many hours of history are currently in the oplog
    rs.printReplicationInfo();
    
    // Show the newest record in the oplog
    use local;
    db.oplog.rs.find().sort({$natural:-1}).limit(1).pretty();
    
    // Resize oplog (requires primary)
    use admin;
    db.adminCommand({replSetResizeOplog:1, size: <MB>, minRetentionHours: <hours>});
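
    The same check can be scripted: db.getReplicationInfo() exposes the oplog size and the time spanned by its entries, so a simple threshold alert (the 80% figure mirrors the playbook above; the lag value is a placeholder fed from your own monitoring) looks like:

    // Alert when connector lag approaches the oplog window (threshold illustrative)
    const info = db.getReplicationInfo();      // logSizeMB, usedMB, timeDiff (seconds), ...
    const windowMin = info.timeDiff / 60;
    const lagMin = 12;                         // placeholder: your measured connector lag
    if (lagMin > 0.8 * windowMin) {
      print("WARNING: lag exceeds 80% of the " + windowMin.toFixed(1) + " min oplog window");
    }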
    

    Warning Reference "Buffer Lock Warning":

    BufferingChangeStreamCursor: Unable to acquire buffer lock, buffer queue is likely full

    Handling MongoDB Warning BufferingChangeStreamCursor in CDC Pipelines

    TL;DR

    • This warning indicates that the MongoDB CDC pipeline temporarily stopped reading new changes because its internal buffer filled up.
    • It is not an error by itself, but if it continues for long periods, it can cause CDC lag, resume-token failures, or change stream interruptions.

    To prevent the issue:

    • Reduce Debezium batch size
    • Increase service memory & set memory limits (if not set already)
    • Ensure sink writes are healthy (check the source ts_ms)
    • Avoid long-running Iceberg commits or high write spikes

    If it occurs:

    • Verify whether CDC is still progressing
    • Check sink performance
    • Restart the CDC service only if offsets stop advancing
    • Take action before the MongoDB oplog window is exceeded

    Overview

    Unable to acquire buffer lock, buffer queue is likely full
    

    This warning comes from the Debezium MongoDB connector’s internal Change Stream Cursor reader. It means that Debezium’s internal in-memory buffer is full and cannot accept more events until the downstream consumer (sink writer) catches up.

    Key points:

    • The warning does not mean data loss.
    • It can indicate backpressure in the pipeline.
    • If backpressure persists long enough, it can eventually lead to MongoDB change stream errors, such as:
      • ChangeStreamHistoryLost
      • InvalidResumeToken

    This document explains why it happens, how to mitigate it proactively, and what to do if it occurs.

    Root Cause

    The warning appears when the event-production rate (Mongo writes) temporarily exceeds the event-consumption rate (Nilus processing + Iceberg writes).

    Below are the typical contributing factors:

    1. Downstream Sink Is Slow (Most Common)

      Nilus writes the CDC data to a destination. If these operations slow down, for example, due to:

      • Large Iceberg commits
      • Compaction or manifest rewrites
      • S3 throttling or retry storms
      • High latency writes

      Then Debezium cannot drain its queue fast enough. Result → buffer fills → warning appears.

    2. Large Debezium Batch Size

      max.batch.size: 2048 (default)

      • May cause Debezium to process very large chunks at once.
      • Large batches increase processing time and memory consumption, slowing down queue draining.
    3. Insufficient Memory

      This can lead to:

      • Very small heap → too many minor GCs
      • Very large heap → long GC pauses
      • GC pauses → slow consumer thread → queue full
    4. No Kubernetes Memory Limits

      If container memory limits are not set:

      • The JVM may assume it has access to full node memory.
      • It may pick an inappropriate heap size automatically.
      • During heavy load, this causes GC pressure and stalls.
    5. CPU Contention

      Heavy pipeline activity + large batches may saturate CPU in a 1-replica setup.

    6. High Write Bursts from MongoDB

      During traffic spikes, the change stream volume can exceed regular processing capacity.

    Prevention

    The following configuration adjustments significantly reduce the likelihood of buffer-full conditions.

    1. Reduce Debezium Batch Size

      Smaller batches = faster downstream commits = steady buffer drain.

      • Example

        max.batch.size: 1024
        # or
        max.batch.size: 512
        
    2. Allocate Sufficient (or Additional) Memory to the Service

      This prevents the service from operating at the edge of its memory budget.

    3. Define Resource Memory Limits

      Why this matters:

      • Predictable heap sizing
      • Reduced GC stalls
      • Improved Debezium throughput

    Resolution

    Use the following checklist to assess whether the warning is transient or serious.

    1. Verify Whether CDC is Still Progressing
      • Check the sink dataset:
        • Are new rows appearing?
        • Is the CDC timestamp (_ts or equivalent) moving forward?
      • Check offset logs:
        • If offsets are updating → pipeline is healthy, warning was transient.
      • Check if heartbeats are processing:
        • Are new heartbeats committed to the heartbeat dataset?
        • When was the latest heartbeat committed, and how far behind the current timestamp is it?
    2. Measure CDC Lag

      • Compare the last ingested timestamp vs the MongoDB server time
      • How to check MongoDB Server Time

        Replica set members rely on synchronized clocks for:

        • Oplog timestamp ordering
        • Heartbeat timeouts and election timing
        • Write concern “majority” acknowledgment

        Run this in your MongoDB shell:

        db.adminCommand({ hello: 1 })
        
        • Output

          {
            "hosts": [
              "mongo1:27017",
              "mongo2:27017",
              "mongo3:27017"
            ],
            "setName": "rs0",
            "setVersion": 3,
            "ismaster": true,
            "secondary": false,
            "primary": "mongo1:27017",
            "me": "mongo1:27017",
            **"localTime": ISODate("2025-11-13T08:25:31.729Z"),**
            ...
          }
          

        Check the time on all Replica Set Members

        // Consolidated check (run inside mongosh): connect to each member so
        // that localTime reflects that member's own clock
        rs.status().members.forEach(m => {
          print(m.name);
          printjson(new Mongo(m.name).getDB("admin").runCommand({ hello: 1 }));
        });
        
        # -- OR --
        # Individual checks (run from your system shell)
        
        mongosh --host mongo1:27017
        db.adminCommand({ hello: 1 })
        
        mongosh --host mongo2:27017
        db.adminCommand({ hello: 1 })
        
        mongosh --host mongo3:27017
        db.adminCommand({ hello: 1 })
        
    3. Check Sink Performance

      • Query the Nilus metadata table (stored in PostgreSQL) and check the throughput.
      • Query

        SELECT 
            li.*, 
            ri.dataos_resource_id,
            ri.total_records,
        
            -- Extract tag from dataos_resource_id
            regexp_extract(ri.dataos_resource_id, 'workflow:v1:wf-([^-]+)-', 1) AS tag,
        
            -- Calculate MB/sec
            CASE 
                WHEN ri.duration_sec > 0 THEN li.files_size_mb / ri.duration_sec 
                ELSE NULL 
            END AS mb_per_sec,
        
            -- Calculate records/sec
            CASE 
                WHEN ri.duration_sec > 0 THEN ri.total_records / ri.duration_sec 
                ELSE NULL 
            END AS events_per_sec
        
        FROM "nilusdb"."public".load_info li
        JOIN (
            SELECT 
                id,  
                run_id,  
                load_id,  
                started_at,  
                finished_at,  
                duration_sec,    
                files_size_mb,  
                memory_mb,  
                cpu_percent,
                dataos_resource_id,
                reduce(
                    map_values(CAST(records_count AS map(varchar, integer))),
                    0,
                    (s, x) -> s + x,
                    s -> s
                ) AS total_records 
            FROM "nilusdb"."public".runs_info
            WHERE run_as_user = 'dataos-manager' -- define username
              AND dataos_resource_id LIKE 'workflow:v1:wf-%' -- define your service name here
              -- AND finished_at > TIMESTAMP '2025-09-17 09:38:00.000 UTC'
        ) ri ON li.load_id = ri.load_id AND li.run_id = ri.run_id
        -- WHERE ri.total_records > 1000
        ORDER BY ri.started_at DESC;
        
    4. Assess the Stability of the Warning

      • If the warning lasts < 5 seconds
        • Normal temporary backpressure.
        • No action needed.
      • If the warning lasts ~5–30 seconds
        • Monitor closely; CDC lag may grow.
        • Check sink and memory usage.
      • If the warning persists > 30 seconds
        • Risk zone for ChangeStreamHistoryLost
        • Take the Following Actions
          • Restart the service

            This resets Debezium’s internal threads while preserving its position in the stream.

    Useful MongoDB Shell Commands

    1. Check oplog size & window

      This helps determine the available oplog window (how far Debezium can fall behind).

      use local
      db.oplog.rs.stats()
      db.oplog.rs.find().sort({ $natural: 1 }).limit(1)
      db.oplog.rs.find().sort({ $natural: -1 }).limit(1)
      
    2. Estimate Oplog Window Duration

      This is crucial: If Debezium’s lag > oplog window, the change stream will fail.

      var first = db.oplog.rs.find().sort({$natural: 1}).limit(1)[0].ts.getTime();   // seconds since epoch
      var last  = db.oplog.rs.find().sort({$natural: -1}).limit(1)[0].ts.getTime();  // seconds since epoch
      print((last - first)/60 + " minutes");
      
    3. Check Recent Write Rate

      db.serverStatus().opcounters
      
    4. Check Collection-level Throughput

      db.<collection>.stats()
      
  5. oplog Polling

    Nilus continuously tails the oplog:

    • Uses a cursor to track the last processed entry.
    • Parses each entry and emits structured CDC events.
    • Keeps streaming aligned with replication order.

    A minimal sketch of this polling loop appears at the end of this section.
  6. MongoDB System Databases & Access

    Nilus requires specific database access:

    • local

      • Source of oplog (local.oplog.rs).
      • Requires read permissions.
    • admin

      • Used for server metadata, discovery, and auth.
      • Requires read on commands like replSetGetStatus, buildInfo, listDatabases and isMaster.
    • config: Needed only in sharded clusters.

    • Target Databases (Application Data)

      • Collections you want to capture.
      • Requires read permissions.
      • If snapshotting is enabled, Nilus reads all documents during startup.
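
    For intuition about the polling loop in concept 5, a minimal mongosh sketch is shown below. It illustrates the mechanism only and is not Nilus's actual implementation; the bounded loop and sleep interval are illustrative:

      // Minimal oplog-polling sketch (illustrative; Nilus's real reader is more robust)
      const local = db.getSiblingDB("local");
      let lastTs = local.oplog.rs.find().sort({ $natural: -1 }).limit(1).next().ts;
      for (let i = 0; i < 5; i++) {                  // bounded loop for demonstration
        local.oplog.rs.find({ ts: { $gt: lastTs } }).forEach(entry => {
          print(entry.op, entry.ns);                 // op: "i"=insert, "u"=update, "d"=delete
          lastTs = entry.ts;                         // advance past the processed entry
        });
        sleep(1000);                                 // poll roughly once per second
      }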