MongoDB¶
The Nilus connector for MongoDB supports Change Data Capture (CDC), enabling near real-time replication of data changes from MongoDB to Supported Destinations, such as the Lakehouse. CDC captures change events from MongoDB’s oplog.rs
and streams them continuously.
Info
Batch data movement is not supported for MongoDB.
Prerequisites¶
Before enabling CDC, ensure the following configurations are in place for your hosting environment:
MongoDB Replica Set¶
- MongoDB must run as a replica set, even for single-node deployments.
- Nilus CDC for MongoDB relies on the oplog.rs collection, which is only available in replica sets.
Info
Contact the Database Administrator (DBA) to set up and enable Change Data Capture (CDC) in MongoDB.
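For a self-managed development setup, the following mongosh sketch shows what a minimal single-node replica-set initialization typically looks like. The replica set name rs0 and the host mongo1.example.com:27017 are illustrative placeholders, not values Nilus requires; follow your DBA's configuration in production.

// Start mongod with a replica set name first, e.g.: mongod --replSet rs0 --port 27017 ...
// Then, from mongosh connected to that node, initialize the replica set:
rs.initiate({
  _id: "rs0",                                    // replica set name (placeholder)
  members: [
    { _id: 0, host: "mongo1.example.com:27017" } // placeholder host; Nilus must connect using this exact host
  ]
})

// Confirm the node has become PRIMARY before configuring CDC:
rs.status().myState   // 1 indicates PRIMARY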
Enable oplog Access¶
- Nilus uses MongoDB’s oplog.rs to capture changes.
- Nilus requires a user with read access to business data and internal system databases in order to read the oplog. If such a user does not exist, create one in MongoDB using the following:

db.createUser({
  user: "debezium",
  pwd: "dbz",
  roles: [
    { role: "read", db: "your_app_db" },        // Read target database
    { role: "read", db: "local" },              // Read oplog
    { role: "read", db: "config" },             // Read cluster configuration
    { role: "readAnyDatabase", db: "admin" },   // Optional: discovery
    { role: "clusterMonitor", db: "admin" }     // Recommended: monitoring
  ]
})
Info
Grant only the roles required for your environment to follow the principle of least privilege.
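To verify the grants, a quick mongosh check along these lines can help. It assumes the user above was created in the admin database; adjust if your DBA created it elsewhere.

// Authenticate as the CDC user (created in admin in this example).
db.getSiblingDB("admin").auth("debezium", "dbz")

// List the roles the server actually granted to the current connection.
db.getSiblingDB("admin").runCommand({ connectionStatus: 1, showPrivileges: true }).authInfo.authenticatedUserRoles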
Pre-created MongoDB Depot¶
A Depot must exist in DataOS with read-write access. To check for the Depot, go to the Metis UI of DataOS or use the following command:
dataos-ctl resource get -t depot -a
#Expected Output
INFO[0000] 🔍 get...
INFO[0000] 🔍 get...complete
| NAME | VERSION | TYPE | STATUS | OWNER |
| ------------ | ------- | ----- | ------ | -------- |
| mongodbdepot | v2alpha | depot | active | usertest |
If the Depot is not created, use the following manifest configuration template to create the MongoDB Depot:
MongoDB Depot Manifest
name: ${{depot-name}}
version: v2alpha
type: depot
tags:
  - ${{tag1}}
  - ${{tag2}}
layer: user
depot:
  type: mongodb
  description: ${{description}}
  compute: ${{runnable-default}}
  mongodb:
    subprotocol: ${{"mongodb+srv"}}
    nodes: ${{["clusterabc.ezlggfy.mongodb.net"]}}
  external: ${{true}}
  secrets:
    - name: ${{instance-secret-name}}-r
      allkeys: ${{true}}
    - name: ${{instance-secret-name}}-rw
      allkeys: ${{true}}
Info
Update variables such as name, owner, compute, layer, etc., and contact the DataOS Administrator or Operator to obtain the appropriate secret name.
Sample Service Config¶
The following manifest configuration template can be used to apply CDC for MongoDB:
name: ${{service-name}}                              # Service identifier
version: v1                                          # Version of the service
type: service                                        # Defines the resource type
tags:                                                # Classification tags
  - ${{tag}}
  - ${{tag}}
description: Nilus CDC Service for MongoDB           # Description of the service
workspace: public                                    # Workspace where the service is deployed
service:                                             # Service specification block
  servicePort: 9010                                  # Service port
  replicas: 1                                        # Number of replicas
  logLevel: INFO                                     # Logging level
  compute: ${{query-default}}                        # Compute profile
  persistentVolume:                                  # Persistent volume configuration
    name: ${{ncdc-vo1-01}}                           # Volume name
    directory: ${{nilus_01}}                         # Target directory within the volume
  stack: ${{nilus:3.0}}                              # Nilus stack version
  stackSpec:                                         # Stack specification
    source:                                          # Source configuration block
      address: dataos://mongodbdepot                 # Source depot address/UDL
      options:                                       # Source-specific options
        engine: debezium                             # Required CDC engine; used for streaming changes
        collection.include.list: "retail.products"   # MongoDB collections to include
        topic.prefix: "cdc_changelog"                # Required topic prefix for CDC stream
        max-table-nesting: "0"                       # Optional; prevents unnesting of nested documents
        transforms.unwrap.array.encoding: array      # Optional; preserves arrays in sink as-is
    sink:                                            # Sink configuration for CDC output
      address: dataos://testinglh                    # Sink depot address
      options:                                       # Sink-specific options
        dest-table: mdb_test_001                     # Destination table name in the sink depot
        incremental-strategy: append                 # Append-only strategy for streaming writes
Info
Ensure that all placeholder values and required fields (e.g., connection addresses, slot names, and access credentials) are properly updated before applying the configuration to a DataOS workspace.
Deploy the manifest file using the following command:
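Assuming the standard dataos-ctl apply syntax (verify the flags against your DataOS CLI version), the deployment looks like:

dataos-ctl apply -f ${{path-to-manifest-file}} -w public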
Info
The MongoDB host used in the CDC service YAML must exactly match the host defined during replica set initialization.
Source Options¶
Nilus supports the following source options for MongoDB CDC:
Option | Default | Description |
---|---|---|
database.include.list | No Default | An optional comma-separated list of regular expressions or literals that match the names of MongoDB databases to be monitored. By default, the connector monitors all databases. When database.include.list is set, the connector monitors only the databases that the property specifies; other databases are excluded from monitoring. |
collection.include.list | No Default | An optional comma-separated list of regular expressions or literals that match fully-qualified namespaces for MongoDB collections to be monitored. By default, the connector monitors all collections except those in the local and admin databases. When collection.include.list is set, the connector monitors only the collections that the property specifies; other collections are excluded from monitoring. Collection identifiers are of the form databaseName.collectionName. |
snapshot.mode | initial | Specifies the behavior for snapshots when the connector starts. |
field.exclude.list | No Default | An optional comma-separated list of the fully-qualified names of fields that should be excluded from change event message values. Fully-qualified names for fields are of the form databaseName.collectionName.fieldName.nestedFieldName, where databaseName and collectionName may contain the wildcard (*), which matches any characters. |
topic.prefix | No Default | Topic prefix that provides a namespace for the particular MongoDB instance or cluster in which Nilus is capturing changes. The prefix should be unique across all other connectors. Only alphanumeric characters, hyphens, dots, and underscores may be used in the database server logical name. This option is mandatory. The prefix is also appended to the sink table. |
transforms.unwrap.array.encoding | No Default | Controls how array values are encoded when unwrapped by a Kafka Connect transform. Common options include "none" (default), "array", "json", or "string", which define how array elements are serialized into Kafka messages. |
max-table-nesting | No Default | Specifies the maximum allowed depth for nested tables or objects (commonly in JSON or relational mapping). It helps prevent excessively deep or complex structures that can impact performance or compatibility. |
Sink Options¶
Nilus supports the following sink options for MongoDB CDC:
Field | Description | Default |
---|---|---|
dest-table | Target table in the sink. | — |
incremental-strategy | Write mode (append recommended for CDC). | append |
Core Concepts¶
Nilus captures row-level changes from MongoDB using the replica set oplog. Below are the essential concepts for understanding how Nilus integrates with MongoDB.
- Replica Set
    - MongoDB must run as a replica set, even in single-node deployments.
    - Nilus connects to the primary replica and tails the oplog (local.oplog.rs).
    - Standalone MongoDB servers are not supported for CDC.
- The MongoDB Oplog
    - The oplog (oplog.rs) is a capped collection in the local database.
    - It records every insert, update, and delete applied to the primary.
    - Nilus reads this log to generate CDC events.
    - oplog entries roll off in FIFO (first-in-first-out) order once the allocated size is exhausted.
- Schema-Less Nature of MongoDB
    - MongoDB is schema-less, but Nilus dynamically infers schemas.
    - The sink table is created from the first document observed.
    - Schema evolution is tracked using a Schema Registry with Avro.
- oplog Retention & Disk Pressure
  Nilus maintains a cursor in the oplog. If it lags:
    - Older oplog entries may expire.
    - Expiration causes event loss and forces a new snapshot.
  Disk pressure:
    - The oplog grows continuously with write load.
    - High disk usage can cause write slowdowns, replication failures, and node crashes.
  Recovery:
    - If oplog retention is exceeded, Nilus enters a pending state.
    - Restarting the connector is not enough; a redeploy with a new snapshot is required.
- Error 286 (ChangeStreamHistoryLost)
  Error 286 means Nilus attempted to resume from an oplog entry that no longer exists:

      Command failed with error 286 (ChangeStreamHistoryLost): Resume of change stream was not possible, as the resume point may no longer be in the oplog

  Why It Happens
    - Connector lag exceeds the oplog window.
    - The oplog was resized/shrunk.
    - Filesystem pressure caused truncation.
    - High write spikes shortened the retention window.

  Recovery
    - Redeploy the CDC service with a new PVC directory.
    - OR delete offsets so Nilus re-snapshots.
    - OR change the connector name to force a fresh snapshot (from Nilus v0.0.13+).
  Prevention
    - Size the oplog for the worst-case lag.
    - Monitor with rs.printReplicationInfo().
    - Avoid long pauses beyond the oplog retention window.
    - Use replSetResizeOplog (MongoDB 4.4+) or minRetentionHours (MongoDB 6.0+) for stronger guarantees, as in the sketch below.
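A mongosh sketch of those checks is shown below. The 16000 MB size and 24-hour retention are illustrative values, not Nilus recommendations; size them against your own write load and expected connector lag.

// Check the current oplog size and the time window it covers (run on the primary).
rs.printReplicationInfo()

// Resize the oplog; size is specified in megabytes.
db.adminCommand({ replSetResizeOplog: 1, size: 16000 })

// Additionally enforce a minimum retention period, in hours.
db.adminCommand({ replSetResizeOplog: 1, minRetentionHours: 24 })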
Info
Restarting the service alone does not fix Error 286. Manual intervention is required.
- oplog Polling
  Nilus continuously tails the oplog:
    - Uses a cursor to track the last processed entry.
    - Parses each entry and emits structured CDC events.
    - Keeps streaming aligned with replication order.
- MongoDB System Databases & Access
  Nilus requires access to specific databases:
    - local: source of the oplog (local.oplog.rs); requires read permissions.
    - admin: used for server metadata, discovery, and auth; requires read on commands like replSetGetStatus, buildInfo, and listDatabases.
    - config: needed only in sharded clusters.
- Target Databases (Application Data)
    - Collections you want to capture.
    - Requires read permissions.
    - If snapshotting is enabled, Nilus reads all documents during startup.
- Recommended Source Options (Sample Configuration):

  source:
    address: dataos://mongodept
    options:
      engine: debezium
      collection.include.list: "shop.products"
      topic.prefix: "cdc_changelog"
      snapshot.mode: "when_needed"
      max.batch.size: 250
      max.queue.size: 2000
      max.queue.size.in.bytes: "134217728"
      heartbeat.interval.ms: 6000
      offset.flush.interval.ms: 15000
  sink:
    address: dataos://testawslh
    options:
      dest-table: mongodb_test
      incremental-strategy: append
Option Reference:

Property | Purpose | Suggested Value |
---|---|---|
snapshot.mode | Controls behavior if offsets are missing | when_needed (default), initial, or always |
offset.flush.interval.ms | Frequency of committing offsets | 15000 ms |
heartbeat.interval.ms | Emit heartbeat events when idle | 5000–10000 ms |
max.batch.size | Max records in a batch | 250 |
max.queue.size | Max records in memory | 2000 |
max.queue.size.in.bytes | Max memory buffer size | 128 MB (adjustable) |
- Operational Playbook

Phase | Checklist |
---|---|
Daily | Monitor oplog window & connector lag. Alert if lag > 80% of window. |
Before Maintenance | Estimate pause time. If > oplog window, temporarily resize the oplog. |
After Outage | If Error 286 occurs, redeploy with a fresh snapshot or clean offsets. |
After Recovery | Validate sizing assumptions; adjust oplog size or Nilus throughput configs. |
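As a starting point for the daily check, a small mongosh sketch like the one below can flag a shrinking safety margin. The maxExpectedLagHours threshold is a hypothetical value; derive it from your own connector monitoring.

// Compare the oplog window on the primary against an assumed worst-case connector lag.
const info = db.getReplicationInfo();      // mongosh helper: reports oplog size and time span
const windowHours = info.timeDiffHours;    // hours of history currently retained in the oplog
const maxExpectedLagHours = 6;             // assumption: worst-case Nilus lag for your workload
if (maxExpectedLagHours > 0.8 * windowHours) {
  print("WARNING: expected lag (" + maxExpectedLagHours + "h) is above 80% of the oplog window (" + windowHours + "h)");
} else {
  print("OK: oplog window is " + windowHours + "h");
}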