ClickPipes for MongoDB FAQ
Can I query for individual fields in the JSON datatype?
Yes, ClickHouse provides powerful JSON querying capabilities.
- Dot Notation: Direct field access
- Nested Objects: Use the
^
operator for sub-objects
How do I flatten the nested MongoDB documents in ClickHouse?
MongoDB documents are replicated as JSON type in ClickHouse by default, preserving the nested structure. You have several options to flatten this data. If you want to flatten the data to columns, you can use normal views, materialized views, or query-time access.
- Normal Views: Use normal views to encapsulate flattening logic.
- Materialized Views: For smaller datasets, you can use refreshable materialized with FINAL to periodically flatten and deduplicate data. For larger datasets, we recommend using incremental materialized views without FINAL to flatten the data in real-time, and then deduplicate data at query time.
- Query-time Access: Instead of flattening, use dot notation to access nested fields directly in queries.
For detailed examples, see our Working with JSON guide.
Can I connect MongoDB databases that don't have a public IP or are in private networks?
We support AWS PrivateLink for connecting to MongoDB databases that don't have a public IP or are in private networks.
What happens if I delete a database/table from my MongoDB database?
When you delete a database/table from MongoDB, ClickPipes will continue running but the dropped database/table will stop replicating changes. The corresponding tables in ClickHouse is preserved.
How does MongoDB CDC Connector handle transactions?
Each document change within a transaction is processed individually to ClickHouse. Changes are applied in the order they appear in the oplog; and only committed changes are replicated to ClickHouse. If a MongoDB transaction is rolled back, those changes won't appear in the change stream.
For more examples, see our Working with JSON guide.
How do I handle resume of change stream was not possible, as the resume point may no longer be in the oplog.
error?
This error typically occurs when the oplog is truncated and ClickPipe is unable to resume the change stream at the expected point. To resolve this issue, resync the ClickPipe. To avoid this issue from recurring, we recommend increasing the oplog retention period (or here if you are on a self-managed MongoDB).
How is replication managed?
We use MongoDB's native Change Streams API to track changes in the database. Change Streams API provides a resumable stream of database changes by leveraging MongoDB's oplog (operations log). ClickPipe uses MongoDB's resume tokens to track the position in the oplog and ensure every change is replicated to ClickHouse.
Which read preference should I use?
Which read preference to use depends on your specific use case. If you want to minimize the load on your primary node, we recommend using secondaryPreferred
read preference. If you want to optimize ingestion latency, we recommend using primaryPreferred
read preference. For more details, see MongoDB documentation.
Does the MongoDB ClickPipe support Sharded Cluster?
Yes, the MongoDB ClickPipe supports both Replica Set and Sharded Cluster.