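Over the past couple of days I ran into several MongoDB cluster failures. In one of them, a node stayed in the RECOVERING state for a long time and could not catch up with the primary on its own, so manual intervention was needed.
First, check the instance state with rs.status(). It showed that some nodes were in the RECOVERING state.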

The log of the affected node shows the following errors:

2018-07-17T19:04:27.343+0800 I REPL     [ReplicationExecutor] syncing from: 10.204.11.48:9303
2018-07-17T19:04:27.347+0800 W REPL [rsBackgroundSync] we are too stale to use 10.204.11.48:9303 as a sync source
2018-07-17T19:04:27.347+0800 I REPL [ReplicationExecutor] could not find member to sync from
2018-07-17T19:04:27.347+0800 E REPL [rsBackgroundSync] too stale to catch up -- entering maintenance mode
2018-07-17T19:04:27.347+0800 I REPL [rsBackgroundSync] our last optime : (term: -1, timestamp: Jul 16 11:55:17:103bb)
2018-07-17T19:04:27.347+0800 I REPL [rsBackgroundSync] oldest available is (term: -1, timestamp: Jul 17 12:49:36:9ffb)
2018-07-17T19:04:27.347+0800 I REPL [rsBackgroundSync] See http://dochub.mongodb.org/core/resyncingaverystalereplicasetmember
2018-07-17T19:04:27.347+0800 I REPL [ReplicationExecutor] going into maintenance mode with 1856 other maintenance mode tasks in progress
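The "too stale" condition in this log comes down to comparing two timestamps: the node's last optime and the oldest entry still available on the sync source. The following is a minimal sketch of that comparison using the two values printed in the log. The helper names `isTooStale` and `stalenessGapSeconds` are hypothetical and are not part of MongoDB.

```javascript
// Sketch only: isTooStale and stalenessGapSeconds are hypothetical helpers,
// not MongoDB APIs. Timestamps are passed as seconds since the epoch.

// A member cannot resume replication if its last applied operation is older
// than the oldest entry still kept in the sync source's oplog.
function isTooStale(lastOptimeSecs, oldestAvailableSecs) {
  return lastOptimeSecs < oldestAvailableSecs;
}

// Length of the window of operations that the sync source no longer holds.
function stalenessGapSeconds(lastOptimeSecs, oldestAvailableSecs) {
  return Math.max(0, oldestAvailableSecs - lastOptimeSecs);
}

// Values from the log above. Only the difference matters here, so the time
// zone used to build the dates is irrelevant as long as both use the same one.
const lastOptime = Date.UTC(2018, 6, 16, 11, 55, 17) / 1000;      // Jul 16 11:55:17
const oldestAvailable = Date.UTC(2018, 6, 17, 12, 49, 36) / 1000; // Jul 17 12:49:36

console.log(isTooStale(lastOptime, oldestAvailable));          // true
console.log(stalenessGapSeconds(lastOptime, oldestAvailable)); // 89659
```

In this case the node is missing roughly 25 hours of operations that have already been dropped from the sync source's oplog.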

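Clearly the node has been out of the cluster for too long: its last optime is older than the oldest oplog entry still available on the sync source, so it can no longer catch up through normal replication. There are two ways to bring such a node back into the cluster.
Method 1: Automatically Sync a Member
 This approach is fairly simple: shut down the node, empty its data directory, and restart it. The node then resyncs automatically.
 Steps: a. Shut down the node: db.shutdownServer() (run against the admin database)
    b. Empty the data directory: mv data data_old ; mkdir data
    c. Start the node: mongod -f /etc/mongodb9303.cnf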
The log below shows the sync starting after the restart.

2018-07-17T21:38:01.131+0800 I REPL     [ReplicationExecutor] This node is 10.204.11.50:9303 in the config
2018-07-17T21:38:01.131+0800 I REPL [ReplicationExecutor] transition to STARTUP2
2018-07-17T21:38:01.131+0800 I ASIO [NetworkInterfaceASIO-Replication-0] Connecting to 10.204.11.48:9303
2018-07-17T21:38:01.132+0800 I REPL [ReplicationExecutor] Member 10.204.11.70:9303 is now in state ARBITER
2018-07-17T21:38:01.134+0800 I ASIO [NetworkInterfaceASIO-Replication-0] Successfully connected to 10.204.11.48:9303, took 3ms (1 connections now open to 10.204.11.48:9303)
2018-07-17T21:38:01.134+0800 I REPL [ReplicationExecutor] Member 10.204.11.48:9303 is now in state PRIMARY
2018-07-17T21:38:01.572+0800 I NETWORK [initandlisten] connection accepted from 10.204.11.48:59332 #2 (2 connections now open)
2018-07-17T21:38:01.587+0800 I ACCESS [conn2] Successfully authenticated as principal __system on local
2018-07-17T21:38:02.131+0800 I REPL [rsSync] ******
2018-07-17T21:38:02.131+0800 I REPL [rsSync] creating replication oplog of size: 20480MB...
2018-07-17T21:38:02.133+0800 I STORAGE [rsSync] Starting WiredTigerRecordStoreThread local.oplog.rs
2018-07-17T21:38:02.133+0800 I STORAGE [rsSync] The size storer reports that the oplog contains 0 records totaling to 0 bytes
2018-07-17T21:38:02.133+0800 I STORAGE [rsSync] Scanning the oplog to determine where to place markers for truncation
2018-07-17T21:38:02.137+0800 I REPL [rsSync] ******
2018-07-17T21:38:02.137+0800 I REPL [rsSync] initial sync pending
2018-07-17T21:38:02.140+0800 I REPL [rsSync] no valid sync sources found in current replset to do an initial sync
2018-07-17T21:38:02.968+0800 I NETWORK [initandlisten] connection accepted from 10.204.11.50:54890 #3 (3 connections now open)
2018-07-17T21:38:03.140+0800 I REPL [rsSync] initial sync pending
2018-07-17T21:38:03.140+0800 I REPL [ReplicationExecutor] syncing from: 10.204.11.48:9303
2018-07-17T21:38:03.144+0800 I REPL [rsSync] initial sync drop all databases
2018-07-17T21:38:03.144+0800 I STORAGE [rsSync] dropAllDatabasesExceptLocal 1
2018-07-17T21:38:03.144+0800 I REPL [rsSync] initial sync clone all databases
2018-07-17T21:38:03.304+0800 I REPL [rsSync] fetching and creating collections for admin
2018-07-17T21:38:03.306+0800 I REPL [rsSync] fetching and creating collections for dmp_edata_leju_com
2018-07-17T21:38:05.698+0800 I REPL [rsSync] fetching and creating collections for test
2018-07-17T21:38:05.699+0800 I REPL [rsSync] initial sync cloning db: admin
2018-07-17T21:38:05.707+0800 I INDEX [rsSync] build index on: admin.system.users properties: { v: 1, key: { _id: 1 }, name: "_id_", ns: "admin.system.users" }

rs.status() now reports the node's state as STARTUP2, and the data directory keeps growing.

{
"_id" : 3,
"name" : "10.204.11.50:9303",
"health" : 1,
"state" : 5,
"stateStr" : "STARTUP2",
"uptime" : 1586,
"optime" : Timestamp(0, 0),
"optimeDate" : ISODate("1970-01-01T00:00:00Z"),
"syncingTo" : "10.204.11.48:9303",
"configVersion" : 88886,
"self" : true
},
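To avoid reading the full rs.status() output by eye, the member list can be filtered by state. The sketch below assumes only the `members`, `name` and `stateStr` fields shown in the output above. `membersInState` is a hypothetical helper, and the sample document is a trimmed-down stand-in built from the member names and states that appear in the logs.

```javascript
// Sketch only: membersInState is a hypothetical helper, not a MongoDB API.
// It expects a document shaped like the result of rs.status().
function membersInState(status, stateStr) {
  return (status.members || [])
    .filter((m) => m.stateStr === stateStr)
    .map((m) => m.name);
}

// Trimmed-down sample; a real rs.status() result carries many more fields.
const sampleStatus = {
  members: [
    { name: "10.204.11.48:9303", stateStr: "PRIMARY" },
    { name: "10.204.11.70:9303", stateStr: "ARBITER" },
    { name: "10.204.11.50:9303", stateStr: "STARTUP2" },
  ],
};

console.log(membersInState(sampleStatus, "STARTUP2"));   // the member being resynced
console.log(membersInState(sampleStatus, "RECOVERING")); // empty: nothing is stuck
```

In the mongo shell the same function could be called as membersInState(rs.status(), "RECOVERING") to list nodes that may need a manual resync.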

Sync progress can also be checked with db.printSlaveReplicationInfo() (renamed to rs.printSecondaryReplicationInfo() in newer MongoDB releases).

PRIMARY> db.printSlaveReplicationInfo( )
source: 10.204.11.50:9303
syncedTo: Thu Jan 01 1970 08:00:00 GMT+0800 (CST)
1531815275 secs (425504.24 hrs) behind the primary
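The huge lag figure is not real replication lag. During the initial sync the member's optime is still Timestamp(0, 0), so the difference is measured from the Unix epoch. The sketch below reproduces the arithmetic, on the assumption that the primary's latest optime was 1531815275 seconds since the epoch (inferred from the output above, not stated in it). `describeLag` is a hypothetical helper that only mimics the format of the shell output.

```javascript
// Sketch only: describeLag is a hypothetical helper that mimics the last line
// printed by db.printSlaveReplicationInfo(); it is not a MongoDB API.
function describeLag(primaryOptimeSecs, memberOptimeSecs) {
  const secs = primaryOptimeSecs - memberOptimeSecs;
  const hrs = (secs / 3600).toFixed(2);
  return `${secs} secs (${hrs} hrs) behind the primary`;
}

// Assumed primary optime (inferred from the output) vs. the member's Timestamp(0, 0).
console.log(describeLag(1531815275, 0));
// 1531815275 secs (425504.24 hrs) behind the primary
```

Once the initial sync finishes and the member has a real optime, the reported lag should drop to a meaningful value.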

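Method 2: Sync by Copying Data Files from Another Member
This method restores the node by copying data files from another member. It requires a spare secondary in the cluster: stop that secondary, transfer its data directory to the node being repaired, then start the repaired node to complete the sync. This method is faster than the first one.
For a detailed explanation of resyncing replica set members, see the official documentation: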
https://docs.mongodb.com/manual/tutorial/resync-replica-set-member/?spm=a2c4e.11153940.blogcont426357.5.6c78424fIPyht1