Replica Set Elections
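Replica sets use elections to determine which set member becomes primary. An election can be triggered by events such as:
- adding a new node to the replica set,
- initiating a replica set,
- performing certain replica set maintenance operations, and
- the secondary members losing connectivity to the primary for more than the configured timeout (10 seconds by default).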
In the following diagram, the primary node has been unavailable for longer than the configured timeout, which triggers the automatic failover process. One of the remaining secondaries calls for an election to select a new primary, after which the replica set automatically resumes normal operations.
The replica set cannot process write operations until the election completes successfully. The replica set can continue to serve read queries if such queries are configured to run on secondaries.
The median time before a cluster elects a new primary should not typically exceed 12 seconds, assuming default replica set configuration settings. This includes the time required to mark the primary as unavailable and to call and complete an election. You can tune this time period by modifying the settings.electionTimeoutMillis replication configuration option. Factors such as network latency may extend the time required for replica set elections to complete, which in turn affects how long your cluster may operate without a primary. These factors depend on your particular cluster architecture.
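As a minimal sketch of tuning this option, the Python function below takes a replica set configuration as a plain dict (for example, the `config` document returned by the `replSetGetConfig` command) and returns a copy with `settings.electionTimeoutMillis` changed. The function name `with_election_timeout` and the host and set names are hypothetical, and incrementing `version` reflects the assumption that a reconfiguration must carry a higher config version. Actually applying the result to a running set is not shown.

```python
import copy


def with_election_timeout(config: dict, timeout_ms: int) -> dict:
    """Return a copy of a replica set config with a new election timeout (hypothetical helper)."""
    if timeout_ms <= 0:
        raise ValueError("electionTimeoutMillis must be positive")
    new_config = copy.deepcopy(config)  # never mutate the caller's document
    new_config.setdefault("settings", {})["electionTimeoutMillis"] = timeout_ms
    # Assumption: a reconfiguration must carry a higher config version.
    new_config["version"] = config.get("version", 0) + 1
    return new_config


current = {
    "_id": "rs0",  # hypothetical replica set name
    "version": 3,
    "protocolVersion": 1,
    "members": [{"_id": 0, "host": "db0.example.net:27017"}],  # hypothetical host
    "settings": {"electionTimeoutMillis": 10000},  # the 10-second default
}
updated = with_election_timeout(current, 5000)
print(updated["settings"]["electionTimeoutMillis"], updated["version"])  # 5000 4
```

Lowering the timeout lets the set notice a failed primary sooner, at the cost of making elections more likely to be triggered by transient network latency.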
Your application connection logic should include tolerance for automatic failovers and the subsequent elections.
New in version 3.6: MongoDB 3.6+ drivers can detect the loss of the primary and automatically retry certain write operations a single time, providing additional built-in handling of automatic failovers and elections.
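Driver-level retryable writes retry once; application code may still want its own tolerance for a longer election. The sketch below is a generic, driver-agnostic retry helper. `NotPrimaryError` is a hypothetical stand-in for whatever exception your driver raises when no primary is available, and the attempt count and backoff values are illustrative assumptions, not MongoDB defaults. Retrying like this is only safe for idempotent operations.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class NotPrimaryError(Exception):
    """Hypothetical stand-in for the error a driver raises when no primary is available."""


def retry_during_failover(op: Callable[[], T], attempts: int = 7,
                          initial_delay: float = 0.5, max_delay: float = 4.0,
                          sleep: Callable[[float], None] = time.sleep) -> T:
    """Run `op`, retrying with capped exponential backoff while an election is in progress."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        try:
            return op()
        except NotPrimaryError:
            if attempt == attempts:
                raise  # retry budget exhausted: surface the error to the caller
            sleep(delay)  # give the replica set time to elect a new primary
            delay = min(delay * 2, max_delay)
    raise AssertionError("unreachable")


calls = {"n": 0}


def flaky_insert() -> str:
    calls["n"] += 1
    if calls["n"] < 3:  # the first two attempts land during the election
        raise NotPrimaryError("election in progress")
    return "inserted"


delays: list = []
result = retry_during_failover(flaky_insert, sleep=delays.append)
print(result, delays)  # inserted [0.5, 1.0]
```

With the defaults, the helper sleeps for up to 15.5 seconds in total (0.5 + 1 + 2 + 4 + 4 + 4), which comfortably covers the typical 12-second election window described earlier. The `sleep` parameter exists so the backoff can be observed or tested without real waiting.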
New in version 3.2: MongoDB introduces version 1 of the replication protocol (protocolVersion: 1) to reduce replica set failover time and accelerate the detection of multiple simultaneous primaries. New replica sets use protocolVersion: 1 by default. Previous versions of MongoDB use version 0 of the protocol. See replication election enhancements for details.
Replica set members send heartbeats (pings) to each other every two seconds. If a heartbeat does not return within 10 seconds, the other members mark the delinquent member as inaccessible.
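The detection rule can be sketched as a small tracker: a member records when each peer last answered a heartbeat and treats any peer silent for more than 10 seconds as inaccessible. The `HeartbeatTracker` class and its methods are hypothetical; timestamps are passed in explicitly (in seconds) so the logic stays deterministic.

```python
from __future__ import annotations

HEARTBEAT_INTERVAL_S = 2.0  # members ping each other every two seconds
HEARTBEAT_TIMEOUT_S = 10.0  # no reply within 10 seconds -> inaccessible


class HeartbeatTracker:
    """Hypothetical per-member view of when each peer last answered a heartbeat."""

    def __init__(self, peers: list[str], now: float) -> None:
        # Assume every peer counts as just heard from when tracking starts.
        self.last_seen = {peer: now for peer in peers}

    def record(self, peer: str, now: float) -> None:
        """Record a heartbeat response from `peer` at time `now` (seconds)."""
        self.last_seen[peer] = now

    def inaccessible(self, now: float) -> set[str]:
        """Peers whose last response is more than the timeout in the past."""
        return {peer for peer, seen in self.last_seen.items()
                if now - seen > HEARTBEAT_TIMEOUT_S}


tracker = HeartbeatTracker(["db1", "db2"], now=0.0)
t = 0.0
while t < 10.0:
    t += HEARTBEAT_INTERVAL_S
    tracker.record("db1", t)  # db1 keeps answering; db2 never does
print(tracker.inaccessible(now=t))  # set(): db2 has been silent for exactly 10 s
t += HEARTBEAT_INTERVAL_S
tracker.record("db1", t)
print(tracker.inaccessible(now=t))  # {'db2'}: silent for 12 s
```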
After a replica set has a stable primary, the election algorithm makes a “best-effort” attempt to have the highest-priority available secondary call an election. Member priority affects both the timing and the outcome of elections: secondaries with higher priority call elections sooner than secondaries with lower priority, and are also more likely to win. However, a lower-priority member can be elected primary for brief periods, even if a higher-priority secondary is available. Replica set members continue to call elections until the highest-priority available member becomes primary.
Members with a priority value of 0 cannot become primary and do not seek election. For details, see Priority 0 Replica Set Members.
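The steady state described above, in which elections continue until the highest-priority available member is primary and priority 0 members never are, can be expressed as a selection function. This is a simplification: it ignores election timing, optime freshness, and the brief periods in which a lower-priority member may hold the primary role. The function name `expected_primary` is hypothetical.

```python
from __future__ import annotations

from typing import Optional


def expected_primary(priorities: dict[str, float],
                     available: set[str]) -> Optional[str]:
    """Return the member the set should converge on as primary, or None."""
    candidates = [(priority, name) for name, priority in priorities.items()
                  if name in available and priority > 0]  # priority 0 never seeks election
    if not candidates:
        return None  # no electable member is available
    # Highest priority wins. Ties are broken by name only to keep this sketch
    # deterministic; real tie outcomes depend on election timing.
    return max(candidates)[1]


priorities = {"a": 1, "b": 2, "c": 0}
print(expected_primary(priorities, {"a", "b", "c"}))  # b
print(expected_primary(priorities, {"a", "c"}))       # a
print(expected_primary(priorities, {"c"}))            # None
```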
With a distributed replica set, the loss of a data center may affect the ability of the remaining members in the other data center or data centers to elect a primary.
If possible, distribute the replica set members across data centers to maximize the likelihood that, even with the loss of a data center, one of the remaining replica set members can become the new primary.
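One way to check a proposed layout against this advice is to ask, for each data center, whether the voting members outside it still form a strict majority of all votes. The sketch below assumes one vote per voting member; the function name `survives_dc_loss` and the data center names are hypothetical.

```python
from __future__ import annotations

from collections import Counter


def survives_dc_loss(member_dcs: list[str]) -> dict[str, bool]:
    """Map each data center to whether a primary can still be elected without it.

    member_dcs lists the data center of each voting member (one vote each).
    """
    votes = Counter(member_dcs)
    total = sum(votes.values())
    # The survivors need a strict majority of all votes in the set.
    return {dc: total - lost > total // 2 for dc, lost in votes.items()}


print(survives_dc_loss(["east", "east", "west"]))
# {'east': False, 'west': True}
print(survives_dc_loss(["east", "east", "west", "west", "central"]))
# {'east': True, 'west': True, 'central': True}
```

In the first layout, losing the data center that holds two of the three members leaves no majority. Spreading five members across three data centers, as in the second call, lets the set tolerate the loss of any one of them.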
A network partition may segregate a primary into a partition with a minority of nodes. When the primary detects that it can only see a minority of nodes in the replica set, the primary steps down as primary and becomes a secondary. Independently, a member in the partition that can communicate with a majority of the nodes (including itself) holds an election to become the new primary.
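The partition behavior follows from the same strict-majority rule: only a side of the partition that holds more than half of the votes can hold a primary, and a primary on any other side steps down. The sketch below applies that rule to a hypothetical five-member set; it assumes every member has one vote, and `partition_outcome` is a hypothetical name.

```python
from __future__ import annotations

from typing import Optional


def partition_outcome(groups: list[set[str]],
                      primary: str) -> tuple[bool, Optional[set[str]]]:
    """Return (old_primary_steps_down, group_able_to_hold_a_primary)."""
    total = sum(len(g) for g in groups)  # assumes one vote per member
    # At most one group can hold a strict majority of the votes.
    majority = next((g for g in groups if len(g) > total // 2), None)
    steps_down = majority is None or primary not in majority
    return steps_down, majority


left, right = {"db0", "db1"}, {"db2", "db3", "db4"}
steps_down, group = partition_outcome([left, right], primary="db0")
print(steps_down, sorted(group))  # True ['db2', 'db3', 'db4']
```

If the old primary is already on the majority side, it keeps its role; if no side has a majority (for example, a three-way split of 2, 2, and 1 members), no primary can be elected anywhere.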
A member vetoes an election in any of the following cases:
- If the member seeking an election is not a member of the voter’s set.
- If the current primary has more recent operations (i.e. a higher optime) than the member seeking election, from the perspective of another voting member.
- If the current primary has the same or more recent operations (i.e. a higher or equal optime) than the member seeking election.
- If a priority 0 member is the most current member at the time of the election. In this case, another eligible member of the set will catch up to the state of the priority 0 member and then attempt to become primary.
- If the member seeking an election has a lower priority than another member in the set that is also eligible for election.
Hidden and delayed members imply a priority 0 configuration.
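The veto conditions can be sketched as a function that returns every reason a voter would refuse a candidate. This is a simplified model under stated assumptions: `Member` and `veto_reasons` are hypothetical names, an integer stands in for an optime, and "eligible" is approximated as "priority greater than 0".

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Member:
    name: str
    priority: float
    optime: int  # hypothetical stand-in for an optime: larger means more recent


def veto_reasons(candidate: Member, voter_set: list[Member],
                 primary: Optional[Member]) -> list[str]:
    """Return the reasons a voter would veto `candidate` (empty list: no veto)."""
    reasons = []
    if candidate.name not in {m.name for m in voter_set}:
        reasons.append("candidate is not in the voter's set")
    if primary is not None and primary.optime >= candidate.optime:
        # Covers both optime conditions: primary strictly newer, or equal.
        reasons.append("current primary is at least as current as the candidate")
    most_current = max(voter_set, key=lambda m: m.optime)
    if most_current.priority == 0 and most_current.optime > candidate.optime:
        reasons.append("a priority 0 member is the most current member")
    if any(m.priority > candidate.priority
           for m in voter_set if m.name != candidate.name and m.priority > 0):
        reasons.append("a higher-priority eligible member exists")
    return reasons


a, b, c = Member("a", 1, 10), Member("b", 2, 10), Member("c", 0, 12)
print(veto_reasons(a, [a, b, c], primary=None))  # two reasons: c is most current, b outranks a
print(veto_reasons(b, [a, b], primary=None))     # []
```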
The replica set member configuration setting members[n].votes and member state determine whether a member votes in an election.
All replica set members that have their members[n].votes setting equal to 1 vote in elections. To exclude a member from voting in an election, change the value of the member’s members[n].votes configuration to 0.
Only voting members in the following states are eligible to vote:
Although non-voting members do not vote in elections, these members hold copies of the replica set’s data and can accept read operations from client applications.
Non-voting members must have a priority of 0.
For instance, the following nine-member replica set has seven voting members and two non-voting members.
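As a sketch of what such a configuration could look like (not the original example), the Python function below builds a nine-member `members` array in which the last two entries have `votes: 0` and `priority: 0`. The host names and set name are hypothetical, and the choice of which members are non-voting is arbitrary.

```python
def nine_member_config(set_name: str = "rs0") -> dict:
    """Build a hypothetical nine-member config: seven voting, two non-voting."""
    members = []
    for i in range(9):
        voting = i < 7  # members 0-6 vote; members 7 and 8 do not
        members.append({
            "_id": i,
            "host": f"mongodb{i}.example.net:27017",  # hypothetical hosts
            "votes": 1 if voting else 0,
            "priority": 1 if voting else 0,  # non-voting members must have priority 0
        })
    return {"_id": set_name, "members": members}


config = nine_member_config()
voting = sum(m["votes"] for m in config["members"])
print(voting, len(config["members"]) - voting)  # 7 2
```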
Do not alter the number of votes to control which members will become primary. Instead, modify the members[n].priority option. Only alter the number of votes in exceptional cases, for example, to permit a replica set with more than seven members.
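The voting rules in this section (votes of 0 or 1, priority 0 for non-voting members, and at most seven voting members) can be checked before applying a configuration. The `voting_problems` function is a hypothetical pre-flight check, not part of MongoDB; the server performs its own validation. It relies on `votes` and `priority` each defaulting to 1 when omitted.

```python
from __future__ import annotations

MAX_VOTING_MEMBERS = 7


def voting_problems(members: list[dict]) -> list[str]:
    """Return human-readable problems with the members' voting settings."""
    problems = []
    for m in members:
        votes = m.get("votes", 1)        # votes defaults to 1
        priority = m.get("priority", 1)  # priority defaults to 1
        if votes not in (0, 1):
            problems.append(f"member {m['_id']}: votes must be 0 or 1")
        elif votes == 0 and priority != 0:
            problems.append(f"member {m['_id']}: non-voting members must have priority 0")
    voting = sum(1 for m in members if m.get("votes", 1) == 1)
    if voting > MAX_VOTING_MEMBERS:
        problems.append(f"{voting} voting members exceeds the limit of {MAX_VOTING_MEMBERS}")
    return problems


eight = [{"_id": i} for i in range(8)]
print(voting_problems(eight))  # ['8 voting members exceeds the limit of 7']
eight[7].update(votes=0, priority=0)  # make the eighth member non-voting
print(voting_problems(eight))  # []
```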
To configure a non-voting member, see Configure Non-Voting Replica Set Member.