@Julien-Ben Julien-Ben commented Nov 4, 2025

Multi-Cluster Replica Set Support in MongoDB CRD

CLOUDP-235689 - unifying single-cluster and multi-cluster replica set configuration into the MongoDB CRD.

Context

Currently, users must choose between two CRDs:

  • MongoDB (type: ReplicaSet) - for single-cluster deployments
  • MongoDBMultiCluster - for multi-cluster deployments

This creates UX confusion, code duplication, and migration barriers. Sharded clusters already solved this with a topology field. This PR applies the same pattern to replica sets.

Summary

Extends the MongoDB CRD to support multi-cluster replica sets via topology: MultiCluster. Core functionality:

  • Deploys one StatefulSet per member cluster with stable naming
  • Uses ClusterMapping (cluster name → index) for consistent resource naming across reconciliations
  • Tracks per-cluster replica counts (LastAppliedMemberSpec)
  • Generates unified Ops Manager automation config across all clusters
  • Replicates agent keys and CA ConfigMaps to member clusters
  • Maintains backward compatibility for existing single-cluster deployments (legacy mode)

The implementation follows the helper pattern from the sharded cluster controller. State is persisted in annotations (it will migrate to a ConfigMap later).
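To illustrate the ClusterMapping idea (cluster name → stable index), here is a minimal sketch. The type and method names are illustrative, not the operator's actual API; the point is that previously assigned indices are reused across reconciliations so StatefulSet names like `multi-replica-set-1` never shift when clusters are added or removed:

```go
package main

import "fmt"

// ClusterMapping assigns each member cluster name a stable numeric index.
// The operator persists this mapping in the deployment state so resource
// names stay stable across reconciliations, even if the spec is re-ordered.
type ClusterMapping map[string]int

// AssignIndexes gives every cluster in specList an index, reusing any
// previously assigned index and allocating the next free one for new clusters.
func (m ClusterMapping) AssignIndexes(specList []string) {
	next := 0
	for _, idx := range m {
		if idx >= next {
			next = idx + 1
		}
	}
	for _, name := range specList {
		if _, ok := m[name]; !ok {
			m[name] = next
			next++
		}
	}
}

// StatefulSetName derives the per-cluster StatefulSet name,
// e.g. "multi-replica-set-1" for cluster index 1.
func StatefulSetName(resource string, clusterIndex int) string {
	return fmt.Sprintf("%s-%d", resource, clusterIndex)
}

func main() {
	mapping := ClusterMapping{}
	mapping.AssignIndexes([]string{"kind-e2e-cluster-1", "kind-e2e-cluster-2"})
	// A later reconciliation adds a new cluster; existing indices are kept.
	mapping.AssignIndexes([]string{"kind-e2e-cluster-2", "kind-e2e-cluster-3"})
	fmt.Println(StatefulSetName("multi-replica-set", mapping["kind-e2e-cluster-3"]))
}
```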


Demo: Multi-Cluster Replica Set in Action

Deploying a replica set across 3 Kubernetes clusters with a single MongoDB resource:

kubectl apply -f - <<EOF
apiVersion: mongodb.com/v1
kind: MongoDB
metadata:
  name: multi-replica-set
  namespace: mongodb-test
spec:
  type: ReplicaSet
  topology: MultiCluster
  version: 7.0.18
  clusterSpecList:
    - clusterName: kind-e2e-cluster-1
      members: 1
    - clusterName: kind-e2e-cluster-2
      members: 1
    - clusterName: kind-e2e-cluster-3
      members: 1
  opsManager:
    configMapRef:
      name: my-project
  credentials: my-credentials
EOF

After reconciliation:

$ kubectl get mongodb multi-replica-set -n mongodb-test
NAME                 PHASE     VERSION   AGE
multi-replica-set    Running   7.0.18    9m

Pods running across all 3 clusters:

$ kubectl --context kind-e2e-cluster-1 get pods -n mongodb-test
NAME                   READY   STATUS    RESTARTS   AGE
multi-replica-set-0-0  2/2     Running   0          4m

$ kubectl --context kind-e2e-cluster-2 get pods -n mongodb-test
NAME                   READY   STATUS    RESTARTS   AGE
multi-replica-set-1-0  2/2     Running   0          6m

$ kubectl --context kind-e2e-cluster-3 get pods -n mongodb-test
NAME                   READY   STATUS    RESTARTS   AGE
multi-replica-set-2-0  2/2     Running   0          8m
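The pod names above follow the `<resource>-<clusterIndex>-<podIndex>` pattern. As a hedged sketch of how the unified automation config could enumerate member hostnames from that pattern (the `-svc.<namespace>.svc.cluster.local` service suffix here is an assumption for illustration, not necessarily the operator's exact naming):

```go
package main

import "fmt"

// memberHostnames derives one hostname per member across all clusters,
// following the <resource>-<clusterIndex>-<podIndex> naming seen in the
// demo pods. The service suffix is illustrative.
func memberHostnames(resource, namespace string, memberCounts []int) []string {
	var hostnames []string
	for clusterIdx, members := range memberCounts {
		for podIdx := 0; podIdx < members; podIdx++ {
			hostnames = append(hostnames,
				fmt.Sprintf("%s-%d-%d-svc.%s.svc.cluster.local",
					resource, clusterIdx, podIdx, namespace))
		}
	}
	return hostnames
}

func main() {
	// The 1/1/1 deployment from the demo yields three hostnames.
	for _, h := range memberHostnames("multi-replica-set", "mongodb-test", []int{1, 1, 1}) {
		fmt.Println(h)
	}
}
```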

Next steps

Bug:

  • Scaling from 1/1/1 to 2/1/2 is currently flaky. The controller sometimes tries to scale by two replicas at once, and the update is rejected.
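A minimal sketch of one-member-at-a-time scaling that would avoid this rejection, assuming the controller can compare LastAppliedMemberSpec against the target spec (function and parameter names here are illustrative, not the operator's actual API):

```go
package main

import "fmt"

// nextMemberCounts moves at most one member per reconciliation toward the
// target per-cluster counts, so MongoDB never sees a jump of two members in
// a single automation-config update.
func nextMemberCounts(lastApplied, target []int) []int {
	next := append([]int(nil), lastApplied...)
	for i := range next {
		if next[i] < target[i] {
			next[i]++ // scale up one member in the first cluster that needs it
			return next
		}
		if next[i] > target[i] {
			next[i]-- // likewise, scale down one member at a time
			return next
		}
	}
	return next // already at target
}

func main() {
	counts := []int{1, 1, 1}
	target := []int{2, 1, 2}
	// Two reconciliations are needed to go from 1/1/1 to 2/1/2.
	for i := 0; i < 3; i++ {
		counts = nextMemberCounts(counts, target)
		fmt.Println(counts)
	}
}
```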

Code cleanup:

  • There is room for improvement in how code is shared between the sharded controller, the legacy multi-cluster controller, and this one, for example when building the Ops Manager hostname list and waiting on it.

Missing robustness features:

  • Cross-cluster StatefulSet watches
    • No drift detection when StatefulSets are manually modified in member clusters
  • Member cluster health monitoring
    • No automatic reconciliation when member clusters become unavailable
    • Overall health management needs review (Jira ticket opened)

Incomplete validations:

  • Multi-cluster validation rules not fully adapted
  • Need blockNonEmptyClusterSpecItemRemoval protection
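As a sketch of what a blockNonEmptyClusterSpecItemRemoval check could look like (types and names are assumptions for illustration): reject a spec update that drops a cluster which still has running members, forcing users to scale it to zero first.

```go
package main

import "fmt"

// validateClusterRemoval rejects removing a cluster from clusterSpecList
// while it still has members, per the last applied per-cluster counts.
// Signature and semantics are illustrative, not the operator's actual webhook.
func validateClusterRemoval(lastAppliedMembers map[string]int, newSpecClusters []string) error {
	inNewSpec := map[string]bool{}
	for _, c := range newSpecClusters {
		inNewSpec[c] = true
	}
	for cluster, members := range lastAppliedMembers {
		if members > 0 && !inNewSpec[cluster] {
			return fmt.Errorf("cannot remove cluster %q with %d member(s); scale it to 0 first",
				cluster, members)
		}
	}
	return nil
}

func main() {
	last := map[string]int{"kind-e2e-cluster-1": 1, "kind-e2e-cluster-2": 0}
	// Dropping cluster-1 (still 1 member) is rejected; dropping the empty
	// cluster-2 would be allowed.
	err := validateClusterRemoval(last, []string{"kind-e2e-cluster-2"})
	fmt.Println(err != nil)
}
```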

State storage:

  • Currently using annotations (should migrate to ConfigMap like sharded clusters)
  • Requires backwards compatibility planning

Limited test coverage:

  • E2E test only covers: deployment (1,1,1) → scale up by 2
  • Not tested: scale down, cluster addition/removal, complex scaling scenarios
  • Missing unit tests for createMemberClusterListFromClusterSpecList

Important Note on State Storage

There is an ongoing discussion within the epic team about whether to migrate to ConfigMaps immediately for state persistence. Given the uncertainty and the desire to move quickly, state is serialized to annotations for this PR.

However, the ultimate goal is to migrate to ConfigMap (like sharded clusters and AppDB do). This will provide:

  • Better scalability for large state
  • Cleaner separation of concerns
  • Consistency across all MongoDB controller types

The structured ReplicaSetDeploymentState makes this migration straightforward: we only need to change the serialization target, not the reconciliation logic.
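To make the "swap the serialization target" point concrete, here is a sketch under assumed names (the struct fields, interface, and annotation key are illustrative, not the operator's actual definitions): the reconciler writes through a small interface, and the annotation-backed writer could later be replaced by a ConfigMap-backed one without touching reconciliation logic.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ReplicaSetDeploymentState is an illustrative stand-in for the structured
// state persisted between reconciliations.
type ReplicaSetDeploymentState struct {
	ClusterMapping        map[string]int `json:"clusterMapping"`
	LastAppliedMemberSpec map[string]int `json:"lastAppliedMemberSpec"`
}

// StateWriter abstracts the persistence target, so reconciliation code does
// not care whether state lands in an annotation or a ConfigMap.
type StateWriter interface {
	Write(state ReplicaSetDeploymentState) error
}

// AnnotationWriter models today's approach: JSON stored in an annotation.
// A ConfigMapWriter implementing the same interface would be the migration.
type AnnotationWriter struct{ Annotations map[string]string }

func (w *AnnotationWriter) Write(s ReplicaSetDeploymentState) error {
	b, err := json.Marshal(s)
	if err != nil {
		return err
	}
	w.Annotations["mongodb.com/v1.stateStore"] = string(b) // key is illustrative
	return nil
}

func main() {
	w := &AnnotationWriter{Annotations: map[string]string{}}
	_ = w.Write(ReplicaSetDeploymentState{
		ClusterMapping:        map[string]int{"kind-e2e-cluster-1": 0},
		LastAppliedMemberSpec: map[string]int{"kind-e2e-cluster-1": 1},
	})
	fmt.Println(len(w.Annotations))
}
```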

Related tickets (for Jira to link them)

CLOUDP-353893, CLOUDP-353896, CLOUDP-353897


Testing

A new E2E test, e2e_multi_cluster_new_replica_set_scale_up, was added to assert that we can deploy a replica set and reach the Running phase.
New unit test file: mongodbreplicaset_controller_multi_test.go


Checklist

  • Have you linked a jira ticket and/or is the ticket in the title?
  • Have you checked whether your jira ticket required DOCSP changes?
  • Have you added changelog file?


github-actions bot commented Nov 4, 2025

⚠️ (this preview might not be accurate if the PR is not rebased on current master branch)

MCK 1.6.0 Release Notes

New Features

  • MongoDBCommunity: Added support for configuring a custom cluster domain via the newly introduced spec.clusterDomain resource field. If spec.clusterDomain is not set, the environment variable CLUSTER_DOMAIN is used as the cluster domain. If CLUSTER_DOMAIN is also not set, the operator falls back to cluster.local as the default cluster domain.
  • Helm Chart: Introduced two new helm fields operator.podSecurityContext and operator.securityContext that can be used to configure securityContext for Operator deployment through Helm Chart.
  • MongoDBSearch: Switch to gRPC and mTLS for internal communication
    Since MCK 1.4 the mongod and mongot processess communicated using the MongoDB Wire Protocol and used keyfile authentication. This release switches that to gRPC with mTLS authentication. gRPC will allow for load-balancing search queries against multiple mongot processes in the future, and mTLS decouples the internal cluster authentication mode and credentials among mongod processes from the connection to the mongot process. The Operator will automatically enable gRPC for existing and new workloads, and will enable mTLS authentication if both Database Server and MongoDBSearch resource are configured for TLS.

Bug Fixes

  • Fixed parsing of the customEnvVars Helm value when values contain = characters.
  • ReplicaSet: Blocked disabling TLS and changing member count simultaneously. These operations must now be applied separately to prevent configuration inconsistencies.

Other Changes

  • Simplified MongoDB Search setup: Removed the custom Search Coordinator polyfill (a piece of compatibility code previously needed to add the required permissions), as MongoDB 8.2.0 and later now include the necessary permissions via the built-in searchCoordinator role.
  • kubectl-mongodb plugin: cosign, the signing tool used to sign kubectl-mongodb plugin binaries, has been updated to version 3.0.2. With this change, released binaries will be bundled with .bundle files containing both signature and certificate information. For more information on how to verify signatures with the new cosign version, please refer to https://github.com/sigstore/cosign/blob/v3.0.2/doc/cosign_verify-blob.md
