@Julien-Ben Julien-Ben commented Nov 4, 2025

Multi-Cluster Replica Set Support in MongoDB CRD

CLOUDP-235689 - unifying single-cluster and multi-cluster replica set configuration into the MongoDB CRD.

Context

Currently, users must choose between two CRDs:

  • MongoDB (type: ReplicaSet) - for single-cluster deployments
  • MongoDBMultiCluster - for multi-cluster deployments

This creates UX confusion, code duplication, and migration barriers. Sharded clusters already solved this with a topology field. This PR applies the same pattern to replica sets.

Summary

Extends the MongoDB CRD to support multi-cluster replica sets via topology: MultiCluster. Core functionality:

  • Deploys one StatefulSet per member cluster with stable naming
  • Uses ClusterMapping (cluster name → index) for consistent resource naming across reconciliations
  • Tracks per-cluster replica counts (LastAppliedMemberSpec)
  • Generates unified Ops Manager automation config across all clusters
  • Replicates agent keys and CA ConfigMaps to member clusters
  • Maintains backward compatibility for existing single-cluster deployments (legacy mode)

The implementation follows the helper pattern from the sharded cluster controller. State is persisted in annotations (it will migrate to a ConfigMap later).
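To illustrate the ClusterMapping idea (cluster name → stable index), here is a minimal sketch. The type and method names are illustrative, not the operator's actual API; the point is that previously assigned indices are reused across reconciliations so StatefulSet names like `multi-replica-set-1` never shift when clusters are added or removed:

```go
package main

import "fmt"

// ClusterMapping assigns each member cluster name a stable numeric index.
// The operator persists this mapping in the deployment state so resource
// names stay stable across reconciliations, even if the spec is re-ordered.
type ClusterMapping map[string]int

// AssignIndexes gives every cluster in specList an index, reusing any
// previously assigned index and allocating the next free one for new clusters.
func (m ClusterMapping) AssignIndexes(specList []string) {
	next := 0
	for _, idx := range m {
		if idx >= next {
			next = idx + 1
		}
	}
	for _, name := range specList {
		if _, ok := m[name]; !ok {
			m[name] = next
			next++
		}
	}
}

// StatefulSetName derives the per-cluster StatefulSet name,
// e.g. "multi-replica-set-1" for cluster index 1.
func StatefulSetName(resource string, clusterIndex int) string {
	return fmt.Sprintf("%s-%d", resource, clusterIndex)
}

func main() {
	mapping := ClusterMapping{}
	mapping.AssignIndexes([]string{"kind-e2e-cluster-1", "kind-e2e-cluster-2"})
	// A later reconciliation adds a new cluster; existing indices are kept.
	mapping.AssignIndexes([]string{"kind-e2e-cluster-2", "kind-e2e-cluster-3"})
	fmt.Println(StatefulSetName("multi-replica-set", mapping["kind-e2e-cluster-3"]))
}
```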


Demo: Multi-Cluster Replica Set in Action

Deploying a replica set across 3 Kubernetes clusters with a single MongoDB resource:

kubectl apply -f - <<EOF
apiVersion: mongodb.com/v1
kind: MongoDB
metadata:
  name: multi-replica-set
  namespace: mongodb-test
spec:
  type: ReplicaSet
  topology: MultiCluster
  version: 7.0.18
  clusterSpecList:
    - clusterName: kind-e2e-cluster-1
      members: 1
    - clusterName: kind-e2e-cluster-2
      members: 1
    - clusterName: kind-e2e-cluster-3
      members: 1
  opsManager:
    configMapRef:
      name: my-project
  credentials: my-credentials
EOF

After reconciliation:

$ kubectl get mongodb multi-replica-set -n mongodb-test
NAME                 PHASE     VERSION   AGE
multi-replica-set    Running   7.0.18    9m

Pods running across all 3 clusters:

$ kubectl --context kind-e2e-cluster-1 get pods -n mongodb-test
NAME                   READY   STATUS    RESTARTS   AGE
multi-replica-set-0-0  2/2     Running   0          4m

$ kubectl --context kind-e2e-cluster-2 get pods -n mongodb-test
NAME                   READY   STATUS    RESTARTS   AGE
multi-replica-set-1-0  2/2     Running   0          6m

$ kubectl --context kind-e2e-cluster-3 get pods -n mongodb-test
NAME                   READY   STATUS    RESTARTS   AGE
multi-replica-set-2-0  2/2     Running   0          8m
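The pod names above follow the `<resource>-<clusterIndex>-<podIndex>` pattern. As a hedged sketch of how the unified automation config could enumerate member hostnames from that pattern (the `-svc.<namespace>.svc.cluster.local` service suffix here is an assumption for illustration, not necessarily the operator's exact naming):

```go
package main

import "fmt"

// memberHostnames derives one hostname per member across all clusters,
// following the <resource>-<clusterIndex>-<podIndex> naming seen in the
// demo pods. The service suffix is illustrative.
func memberHostnames(resource, namespace string, memberCounts []int) []string {
	var hostnames []string
	for clusterIdx, members := range memberCounts {
		for podIdx := 0; podIdx < members; podIdx++ {
			hostnames = append(hostnames,
				fmt.Sprintf("%s-%d-%d-svc.%s.svc.cluster.local",
					resource, clusterIdx, podIdx, namespace))
		}
	}
	return hostnames
}

func main() {
	// The 1/1/1 deployment from the demo yields three hostnames.
	for _, h := range memberHostnames("multi-replica-set", "mongodb-test", []int{1, 1, 1}) {
		fmt.Println(h)
	}
}
```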

Next steps

Bug:

  • Scaling from 1/1/1 to 2/1/2 is currently flaky. The controller sometimes tries to scale by two replicas at once, and the update is rejected.
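A minimal sketch of one-member-at-a-time scaling that would avoid this rejection, assuming the controller can compare LastAppliedMemberSpec against the target spec (function and parameter names here are illustrative, not the operator's actual API):

```go
package main

import "fmt"

// nextMemberCounts moves at most one member per reconciliation toward the
// target per-cluster counts, so MongoDB never sees a jump of two members in
// a single automation-config update.
func nextMemberCounts(lastApplied, target []int) []int {
	next := append([]int(nil), lastApplied...)
	for i := range next {
		if next[i] < target[i] {
			next[i]++ // scale up one member in the first cluster that needs it
			return next
		}
		if next[i] > target[i] {
			next[i]-- // likewise, scale down one member at a time
			return next
		}
	}
	return next // already at target
}

func main() {
	counts := []int{1, 1, 1}
	target := []int{2, 1, 2}
	// Two reconciliations are needed to go from 1/1/1 to 2/1/2.
	for i := 0; i < 3; i++ {
		counts = nextMemberCounts(counts, target)
		fmt.Println(counts)
	}
}
```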

Code cleanup:

  • There is room for improvement in how code is shared between the sharded controller, the legacy multi-cluster controller, and this one, for example when building the Ops Manager hostname list and waiting on it.

Missing robustness features:

  • Cross-cluster StatefulSet watches
    • No drift detection when StatefulSets are manually modified in member clusters
  • Member cluster health monitoring
    • No automatic reconciliation when member clusters become unavailable
    • Overall health management needs review (Jira ticket opened)

Incomplete validations:

  • Multi-cluster validation rules not fully adapted
  • Need blockNonEmptyClusterSpecItemRemoval protection
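As a sketch of what a blockNonEmptyClusterSpecItemRemoval check could look like (types and names are assumptions for illustration): reject a spec update that drops a cluster which still has running members, forcing users to scale it to zero first.

```go
package main

import "fmt"

// validateClusterRemoval rejects removing a cluster from clusterSpecList
// while it still has members, per the last applied per-cluster counts.
// Signature and semantics are illustrative, not the operator's actual webhook.
func validateClusterRemoval(lastAppliedMembers map[string]int, newSpecClusters []string) error {
	inNewSpec := map[string]bool{}
	for _, c := range newSpecClusters {
		inNewSpec[c] = true
	}
	for cluster, members := range lastAppliedMembers {
		if members > 0 && !inNewSpec[cluster] {
			return fmt.Errorf("cannot remove cluster %q with %d member(s); scale it to 0 first",
				cluster, members)
		}
	}
	return nil
}

func main() {
	last := map[string]int{"kind-e2e-cluster-1": 1, "kind-e2e-cluster-2": 0}
	// Dropping cluster-1 (still 1 member) is rejected; dropping the empty
	// cluster-2 would be allowed.
	err := validateClusterRemoval(last, []string{"kind-e2e-cluster-2"})
	fmt.Println(err != nil)
}
```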

State storage:

  • Currently using annotations (should migrate to ConfigMap like sharded clusters)
  • Requires backwards compatibility planning

Limited test coverage:

  • E2E test only covers: deployment (1,1,1) → scale up by 2
  • Not tested: scale down, cluster addition/removal, complex scaling scenarios
  • Missing unit tests for createMemberClusterListFromClusterSpecList

Important Note on State Storage

There is an ongoing discussion within the epic team about whether to migrate to ConfigMaps immediately for state persistence. Given the uncertainty and the desire to move quickly, state is serialized to annotations for this PR.

However, the ultimate goal is to migrate to ConfigMap (like sharded clusters and AppDB do). This will provide:

  • Better scalability for large state
  • Cleaner separation of concerns
  • Consistency across all MongoDB controller types

The structured ReplicaSetDeploymentState makes this migration straightforward: we only need to change the serialization target, not the reconciliation logic.
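To make the "swap the serialization target" point concrete, here is a sketch under assumed names (the struct fields, interface, and annotation key are illustrative, not the operator's actual definitions): the reconciler writes through a small interface, and the annotation-backed writer could later be replaced by a ConfigMap-backed one without touching reconciliation logic.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ReplicaSetDeploymentState is an illustrative stand-in for the structured
// state persisted between reconciliations.
type ReplicaSetDeploymentState struct {
	ClusterMapping        map[string]int `json:"clusterMapping"`
	LastAppliedMemberSpec map[string]int `json:"lastAppliedMemberSpec"`
}

// StateWriter abstracts the persistence target, so reconciliation code does
// not care whether state lands in an annotation or a ConfigMap.
type StateWriter interface {
	Write(state ReplicaSetDeploymentState) error
}

// AnnotationWriter models today's approach: JSON stored in an annotation.
// A ConfigMapWriter implementing the same interface would be the migration.
type AnnotationWriter struct{ Annotations map[string]string }

func (w *AnnotationWriter) Write(s ReplicaSetDeploymentState) error {
	b, err := json.Marshal(s)
	if err != nil {
		return err
	}
	w.Annotations["mongodb.com/v1.stateStore"] = string(b) // key is illustrative
	return nil
}

func main() {
	w := &AnnotationWriter{Annotations: map[string]string{}}
	_ = w.Write(ReplicaSetDeploymentState{
		ClusterMapping:        map[string]int{"kind-e2e-cluster-1": 0},
		LastAppliedMemberSpec: map[string]int{"kind-e2e-cluster-1": 1},
	})
	fmt.Println(len(w.Annotations))
}
```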

Related tickets (for Jira to link them)

CLOUDP-353893, CLOUDP-353896, CLOUDP-353897


Testing

A new E2E test, e2e_multi_cluster_new_replica_set_scale_up, was added to assert that we can deploy a replica set and reach the Running phase.
New unit test file: mongodbreplicaset_controller_multi_test.go


Checklist

  • Have you linked a jira ticket and/or is the ticket in the title?
  • Have you checked whether your jira ticket required DOCSP changes?
  • Have you added changelog file?


github-actions bot commented Nov 4, 2025

⚠️ (this preview might not be accurate if the PR is not rebased on current master branch)

MCK 1.6.0 Release Notes

New Features

  • MongoDBCommunity: Added support for configuring a custom cluster domain via the newly introduced spec.clusterDomain resource field. If spec.clusterDomain is not set, the environment variable CLUSTER_DOMAIN is used as the cluster domain. If CLUSTER_DOMAIN is also not set, the operator falls back to cluster.local as the default cluster domain.
  • Helm Chart: Introduced two new helm fields operator.podSecurityContext and operator.securityContext that can be used to configure securityContext for Operator deployment through Helm Chart.
  • MongoDBSearch: Switch to gRPC and mTLS for internal communication
    Since MCK 1.4 the mongod and mongot processess communicated using the MongoDB Wire Protocol and used keyfile authentication. This release switches that to gRPC with mTLS authentication. gRPC will allow for load-balancing search queries against multiple mongot processes in the future, and mTLS decouples the internal cluster authentication mode and credentials among mongod processes from the connection to the mongot process. The Operator will automatically enable gRPC for existing and new workloads, and will enable mTLS authentication if both Database Server and MongoDBSearch resource are configured for TLS.

Bug Fixes

  • Fixed parsing of the customEnvVars Helm value when values contain = characters.
  • ReplicaSet: Blocked disabling TLS and changing member count simultaneously. These operations must now be applied separately to prevent configuration inconsistencies.

Other Changes

  • Simplified MongoDB Search setup: Removed the custom Search Coordinator polyfill (a piece of compatibility code previously needed to add the required permissions), as MongoDB 8.2.0 and later now include the necessary permissions via the built-in searchCoordinator role.
  • kubectl-mongodb plugin: cosign, the signing tool used to sign kubectl-mongodb plugin binaries, has been updated to version 3.0.2. With this change, released binaries will be bundled with .bundle files containing both signature and certificate information. For more information on how to verify signatures with the new cosign version, please refer to https://github.com/sigstore/cosign/blob/v3.0.2/doc/cosign_verify-blob.md
