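Background

After an abnormal power loss, the instance may fail to start with errors like the ones below. This is file-level corruption, which the usual repair procedures cannot fix. In this situation the wt utility can be used to salvage the data.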

2019-07-10T21:06:05.725-0500 E STORAGE  [initandlisten] WiredTiger (0) [1454119565:724960][1745:0x7f2ac9534bc0], file:WiredTiger.wt, cursor.next: read checksum error for 4096B block at offset 6799360: block header checksum of 1769173605 doesnt match expected checksum of 4176084783
2019-07-10T21:06:05.725-0500 E STORAGE  [initandlisten] WiredTiger (0) [1454119565:725067][1745:0x7f2ac9534bc0], file:WiredTiger.wt, cursor.next: WiredTiger.wt: encountered an illegal file format or internal value
2019-07-10T21:06:05.725-0500 E STORAGE  [initandlisten] WiredTiger (-31804) [1454119565:725088][1745:0x7f2ac9534bc0], file:WiredTiger.wt, cursor.next: the process must exit and restart: WT_PANIC: WiredTiger library panic
2019-07-10T21:06:05.725-0500 I -        [initandlisten] Fatal Assertion 28558
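As a quick way to confirm that a log shows this kind of corruption, the two tell-tale strings from the excerpt above can be searched for. This is a minimal sketch; the function name `has_wt_corruption` is hypothetical and the log path in the usage note is an assumption.

```shell
# Return 0 if the given mongod log contains the WiredTiger corruption
# signature quoted above (WT_PANIC or a read checksum error), 1 otherwise.
# has_wt_corruption is a hypothetical helper name.
has_wt_corruption() {
    log_file="$1"
    grep -Eq 'WT_PANIC|read checksum error' "$log_file"
}
```

Usage, assuming the log lives at /var/log/mongodb/mongod.log: `has_wt_corruption /var/log/mongodb/mongod.log && echo "salvage may be needed"`.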

Software preparation

Installing the required components

[centos]
yum install  snappy-devel  make gcc gcc-c++ kernel-devel

[ubuntu]
apt-get install libsnappy-dev build-essential 

Installing wt

Official site: http://source.wiredtiger.com

wget http://source.wiredtiger.com/releases/wiredtiger-3.2.0.tar.bz2
tar xvf wiredtiger-3.2.0.tar.bz2 
cd wiredtiger-3.2.0
./configure --enable-snappy
make
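The later commands call ./wt and load the snappy extension by a relative path, so it can be worth checking that the build produced both before going further. A small sketch; `check_wt_build` is a hypothetical helper, and the paths are the ones used by the commands in this article.

```shell
# Check that the build produced the wt binary and the snappy extension
# at the paths the salvage commands in this article rely on.
# check_wt_build is a hypothetical helper; pass the source directory.
check_wt_build() {
    src_dir="${1:-.}"
    [ -x "$src_dir/wt" ] || { echo "missing: $src_dir/wt" >&2; return 1; }
    ext="$src_dir/ext/compressors/snappy/.libs/libwiredtiger_snappy.so"
    [ -f "$ext" ] || { echo "missing: $ext" >&2; return 1; }
    echo "ok"
}
```

Usage from inside the source tree: `check_wt_build .`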

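Data salvage procedure

Create a recovery directory

Never operate on the damaged original directory. Create a recovery path under your working directory and copy the damaged dbpath into it.
For example, copy it to /opt/5113_wechatworkpre_bak/ . In theory the necessary .wt files plus the damaged collection file itself are sufficient, but to be safe you can copy the entire dbpath.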
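The copy step can be sketched as a small function that refuses to reuse an existing destination, so the working copy is always a fresh one. `copy_dbpath` is a hypothetical name, and the source path in the usage note is an assumption.

```shell
# Copy a damaged dbpath into a fresh recovery directory.
# Refuses to run if the destination already exists, so an earlier
# working copy is never silently mixed with a new one.
copy_dbpath() {
    src="$1"
    dst="$2"
    [ -d "$src" ] || { echo "no such dbpath: $src" >&2; return 1; }
    [ ! -e "$dst" ] || { echo "destination exists: $dst" >&2; return 1; }
    # "src/." copies the directory contents, including hidden files.
    mkdir -p "$dst" && cp -a "$src"/. "$dst"/
}
```

Usage, assuming the damaged dbpath is /data/mongodb: `copy_dbpath /data/mongodb /opt/5113_wechatworkpre_bak`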

Find the file name of the damaged collection

Suppose the damaged collection named in the log is called demo.
Run db.demo.stats() to locate the uri and obtain the actual file name: wechatworkpre_leju_com/collection-26--7067895897049507085
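The uri reported by stats() carries a prefix in front of the object name. Assuming it has the `statistics:table:` form shown later in this article, the prefix can be stripped with plain parameter expansion. `uri_to_object` is a hypothetical helper name.

```shell
# Strip the "statistics:table:" prefix from the uri reported by
# db.<collection>.stats(), leaving the WiredTiger object name
# (no .wt suffix). Input without that prefix is returned unchanged.
uri_to_object() {
    uri="$1"
    echo "${uri#statistics:table:}"
}
```

For example, `uri_to_object "statistics:table:collection-7--5182884231633924913"` prints `collection-7--5182884231633924913`.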

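Salvage

Run the following command to perform the salvage. The salvaged file overwrites the original file. Note that the file name here includes the .wt suffix.

./wt -v -h /opt/5113_wechatworkpre_bak/ -C "extensions=[./ext/compressors/snappy/.libs/libwiredtiger_snappy.so]" -R salvage wechatworkpre_leju_com/collection-26--7067895897049507085.wt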

dump

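At this point the newly generated file cannot be copied straight back into the data path. Run a dump first to produce the raw data of the collection we want, written here to demo.dump. Note that the object name here does not include the .wt suffix.

./wt -v -h /opt/5113_wechatworkpre_bak/ -C "extensions=[./ext/compressors/snappy/.libs/libwiredtiger_snappy.so]" -R dump -f demo.dump wechatworkpre_leju_com/collection-26--7067895897049507085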
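Because salvage takes the name with the .wt suffix and dump takes it without, it is easy to mix the two up. The sketch below only prints both command lines from a single object name so they can be reviewed before running; `print_wt_commands` and its parameters are hypothetical.

```shell
# Print (not run) the salvage and dump commands for one object, so the
# ".wt" suffix is applied to salvage only.
# print_wt_commands is a hypothetical helper name.
print_wt_commands() {
    home="$1"       # recovery copy of the dbpath
    object="$2"     # object name without the .wt suffix
    dump_file="$3"  # output file for wt dump
    ext='extensions=[./ext/compressors/snappy/.libs/libwiredtiger_snappy.so]'
    echo "./wt -v -h $home -C \"$ext\" -R salvage $object.wt"
    echo "./wt -v -h $home -C \"$ext\" -R dump -f $dump_file $object"
}
```

Usage: `print_wt_commands /opt/5113_wechatworkpre_bak/ wechatworkpre_leju_com/collection-26--7067895897049507085 demo.dump`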

Create a new backup instance

This backup instance exists to receive the data produced by wt dump, in preparation for the steps that follow.

mkdir -p /opt/mongo-recovery/
mongod --dbpath /opt/mongo-recovery --storageEngine wiredTiger --nojournal
mongo --port 27017
use recovery
db.demo.insert({test: 1})
db.demo.remove({})
db.demo.stats()

load

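From the stats output, take the data file that backs the collection, "uri" : "statistics:table:collection-7--5182884231633924913", then load the dump file generated by wt above into the new backup instance (the backup instance must be shut down first).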

./wt -v -h  /opt/mongo-recovery/  -C "extensions=[./ext/compressors/snappy/.libs/libwiredtiger_snappy.so]" -R load -f demo.dump -r collection-7--5182884231633924913
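Since the load has to run while the backup instance is shut down, a rough pre-check can help. The sketch below relies on the assumption that mongod leaves mongod.lock empty after a clean shutdown; it is only a heuristic, and `instance_looks_stopped` is a hypothetical name.

```shell
# Heuristic: treat the instance as stopped when mongod.lock is absent
# or empty. A non-empty lock file can also be left behind by an unclean
# shutdown, so a failure here means "check by hand", not "it is running".
instance_looks_stopped() {
    lock_file="$1/mongod.lock"
    [ ! -s "$lock_file" ]
}
```

Usage: `instance_looks_stopped /opt/mongo-recovery && echo "ok to run wt load"`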


Restart the backup instance

> use recovery
switched to db recovery
> db.demo.find()
{ "_id" : ObjectId("5d270709f43c5aedc1aae7c8"), "age" : 1 }

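At this point the salvaged data has been imported into the backup instance successfully.
If problems remain, one round of mongodump followed by mongorestore --drop is enough.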
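The mongodump and mongorestore --drop round trip might look like the sketch below. The port, database, collection, and dump directory are assumptions taken from the example in this article, `redump_collection` is a hypothetical name, and newer tool versions may prefer --nsInclude over --db/--collection on restore.

```shell
# Round-trip one collection through mongodump and mongorestore --drop.
# All four arguments are assumptions about your setup.
redump_collection() {
    port="$1"; db="$2"; coll="$3"; out_dir="$4"
    # mongodump writes the collection to <out_dir>/<db>/<coll>.bson
    mongodump --port "$port" --db "$db" --collection "$coll" --out "$out_dir" || return 1
    # --drop removes the existing collection before restoring it
    mongorestore --port "$port" --drop --db "$db" --collection "$coll" \
        "$out_dir/$db/$coll.bson"
}
```

Usage: `redump_collection 27017 recovery demo /opt/mongo-dump`. To restore into a different instance, run the mongorestore line by hand against that instance's port.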

wt help

global options:
        -C      wiredtiger_open configuration
        -h      database directory
        -L      turn logging off for debug-mode
        -R      run recovery if configured
        -V      display library version and exit
        -v      verbose
commands:
        alter     alter an object
        backup    database backup
        compact   compact an object
        copyright copyright information
        create    create an object
        downgrade downgrade a database
        drop      drop an object
        dump      dump an object
        list      list database objects
        load      load an object
        loadtext  load an object from a text file
        printlog  display the database log
        read      read values from an object
        rebalance rebalance an object
        rename    rename an object
        salvage   salvage a file
        stat      display statistics for an object
        truncate  truncate an object, removing all content
        upgrade   upgrade an object
        verify    verify an object
        write     write values to an object