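When the Linux kernel crashes, a mechanism such as kdump can capture the contents of memory at the moment of the crash and write them to a dump file, vmcore. crash is a widely used tool for analyzing such kernel crash dumps.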

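Kdump is a kexec-based memory dump mechanism. It has been merged into the mainline kernel, and as a result it is supported by the vast majority of Linux distributions. Unlike traditional dump mechanisms, a Kdump-based system works with two kernels: the system kernel, which runs during normal operation, and the capture kernel, which is booted when the system kernel crashes and performs the memory dump.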

#### Preparing the tools

uname -r 
2.6.32-696.el6.x86_64
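With the kernel release in hand, download the matching packages from the link below. The debuginfo version has to match the kernel that produced the vmcore.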
http://debuginfo.centos.org/6/x86_64/

kernel-debuginfo-2.6.32-696.el6.x86_64.rpm
kernel-debuginfo-common-x86_64-2.6.32-696.el6.x86_64.rpm

rpm -ivh kernel-debuginfo-common-x86_64-2.6.32-696.el6.x86_64.rpm kernel-debuginfo-2.6.32-696.el6.x86_64.rpm
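
The two package names follow directly from the uname -r output. Below is a minimal Python sketch of that mapping. It assumes the naming pattern seen in the two file names above also holds for other releases, and the helper name debuginfo_packages is hypothetical.

```python
import platform


def debuginfo_packages(release):
    """Return the debuginfo RPM file names for a kernel release string.

    Assumption: the naming pattern is inferred from the two file names
    shown in the text; check it against the repository listing before use.
    """
    # The architecture is the last dot-separated field, e.g. "x86_64".
    arch = release.rsplit(".", 1)[-1]
    return [
        # Common package first, then the kernel-specific one.
        f"kernel-debuginfo-common-{arch}-{release}.rpm",
        f"kernel-debuginfo-{release}.rpm",
    ]


if __name__ == "__main__":
    # platform.release() reports the same string as `uname -r`.
    for name in debuginfo_packages(platform.release()):
        print(name)
```

When analyzing a dump taken on another machine, pass that machine's release string instead of the local one.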

#### Overview of crash built-in commands

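    bt - backtrace
        bt shows the kernel stack backtrace at the time of the crash. It is one of the most frequently used and most useful commands when debugging a crash.
    log - dump system message buffer
        log prints the kernel message buffer, which may contain clues about the crash.
    ps - display process status information
        ps shows the state of processes; (as in the figure) a leading > marks an active task.
    dis - disassembling instruction
        dis disassembles the instructions at a given address.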
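
These commands can also be run in one batch instead of being typed at the crash> prompt. Below is a minimal Python sketch. It assumes crash accepts a command file through its -i option and that ending the file with exit terminates the session. The helper name and the paths in the usage part are placeholders, not part of the original setup.

```python
import os
import tempfile


def build_crash_batch(vmlinux, vmcore, commands):
    """Write crash commands to a temporary file and return (argv, script_path).

    Assumption: `crash -i FILE` runs the commands in FILE at startup.
    Nothing is executed here; the caller decides whether to run argv.
    """
    # Append "exit" so the session does not wait for interactive input.
    script = "\n".join(list(commands) + ["exit"]) + "\n"
    fd, script_path = tempfile.mkstemp(prefix="crash-cmds-", suffix=".txt")
    with os.fdopen(fd, "w") as handle:
        handle.write(script)
    argv = ["crash", "-i", script_path, vmlinux, vmcore]
    return argv, script_path


if __name__ == "__main__":
    argv, path = build_crash_batch(
        "/path/to/vmlinux",  # placeholder: vmlinux from kernel-debuginfo
        "/path/to/vmcore",   # placeholder: the dump file
        ["bt", "log", "ps"],
    )
    print(" ".join(argv))
    os.remove(path)
```

The returned argv can be handed to subprocess.run on a machine where crash is installed, with the output redirected to a file for later reading.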

#### Analysis

Open the vmcore with the crash command:

# crash  /usr/lib/debug/lib/modules/2.6.32-431.el6.x86_64/vmlinux /var/crash/127.0.0.1-2019-07-25-23\:49\:55/vmcore
crash 7.1.0-8.el6
Copyright (C) 2002-2014  Red Hat, Inc.
Copyright (C) 2004, 2005, 2006, 2010  IBM Corporation
Copyright (C) 1999-2006  Hewlett-Packard Co
Copyright (C) 2005, 2006, 2011, 2012  Fujitsu Limited
Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
Copyright (C) 2005, 2011  NEC Corporation
Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions.  Enter "help copying" to see the conditions.
This program has absolutely no warranty.  Enter "help warranty" for details.
 
GNU gdb (GDB) 7.6
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...

      KERNEL: /usr/lib/debug/lib/modules/2.6.32-431.el6.x86_64/vmlinux
    DUMPFILE: /var/crash/127.0.0.1-2019-07-25-23:49:55/vmcore  [PARTIAL DUMP]
        CPUS: 12
        DATE: Fri Jul 26 00:08:08 2019
      UPTIME: 1 days, 23:57:36
LOAD AVERAGE: 30.42, 13.07, 4.99
       TASKS: 392
    NODENAME: d16050205.grid.*.*.com.cn
     RELEASE: 2.6.32-431.el6.x86_64
     VERSION: #1 SMP Fri Nov 22 03:15:09 UTC 2013
     MACHINE: x86_64  (1900 Mhz)
      MEMORY: 47.9 GB
       PANIC: "BUG: unable to handle kernel paging request at fffffffffffffff1"
         PID: 2194
     COMMAND: "zabbix_agentd"
        TASK: ffff880c6c6f7500  [THREAD_INFO: ffff880c6aa1e000]
         CPU: 3
       STATE: TASK_RUNNING (PANIC)

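KERNEL: the kernel image (vmlinux) that was running when the system crashed
DUMPFILE: the kernel dump file
CPUS: number of CPUs on the machine
DATE: time of the crash
TASKS: number of tasks in memory at the time of the crash
NODENAME: hostname of the crashed system
RELEASE and VERSION: kernel release and build version
MACHINE: CPU architecture
MEMORY: physical memory of the crashed host
PANIC: type of crash. Common types include:
    SysRq (System Request): a crash triggered through the magic SysRq key combination, normally used for testing. Running echo c > /proc/sysrq-trigger triggers a crash.
    oops: can be thought of as a kernel-level segmentation fault. When an application makes an illegal memory access or executes an illegal instruction, it receives a signal such as SIGSEGV; the default action is to dump core, though the application may catch the signal and handle it itself. When the kernel itself makes this kind of mistake, it prints an oops message.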
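
When many dumps have to be triaged, the fields described above can be pulled out of saved crash output automatically. Below is a minimal Python sketch. It assumes the banner keeps the "KEY: value" layout shown in this session, and the function name is hypothetical.

```python
def parse_crash_banner(text):
    """Parse "KEY: value" lines from the crash startup banner into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(": ")
        key = key.strip()
        # Keep only lines whose key is all upper case, like the banner fields.
        if sep and key and key.replace(" ", "").isupper():
            fields[key] = value.strip().strip('"')
    return fields


# A few lines copied from the session in this article.
SAMPLE = """\
        CPUS: 12
LOAD AVERAGE: 30.42, 13.07, 4.99
     RELEASE: 2.6.32-431.el6.x86_64
       PANIC: "BUG: unable to handle kernel paging request at fffffffffffffff1"
         PID: 2194
     COMMAND: "zabbix_agentd"
"""

if __name__ == "__main__":
    banner = parse_crash_banner(SAMPLE)
    print(banner["PANIC"])
    print(banner["COMMAND"], banner["PID"])
```

Splitting on the first ": " keeps values that contain colons themselves, such as the PANIC message, intact.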

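At this point the panic can already be tied to the zabbix_agentd process (PID 2194), which was running on the CPU that crashed. Next, the commonly used crash commands are walked through as usual.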

crash> bt 
PID: 2194   TASK: ffff880c6c6f7500  CPU: 3   COMMAND: "zabbix_agentd"
 #0 [ffff880c6aa1ecf0] machine_kexec at ffffffff81038f3b
 #1 [ffff880c6aa1ed50] crash_kexec at ffffffff810c5d92
 #2 [ffff880c6aa1ee20] oops_end at ffffffff8152b510
 #3 [ffff880c6aa1ee50] no_context at ffffffff8104a00b
 #4 [ffff880c6aa1eea0] __bad_area_nosemaphore at ffffffff8104a295
 #5 [ffff880c6aa1eef0] bad_area_nosemaphore at ffffffff8104a363
 #6 [ffff880c6aa1ef00] __do_page_fault at ffffffff8104aabf
 #7 [ffff880c6aa1f020] do_page_fault at ffffffff8152d45e
 #8 [ffff880c6aa1f050] page_fault at ffffffff8152a815
    [exception RIP: xfs_trans_buf_item_match+48]
    RIP: ffffffffa01e4660  RSP: ffff880c6aa1f108  RFLAGS: 00010292
    RAX: 0000000000000001  RBX: ffff880c6a623ea0  RCX: 0000000000001000
    RDX: 0000000f00014cb8  RSI: ffff880c7240b180  RDI: ffff880c6a623f80
    RBP: ffff880c6aa1f108   R8: fffffffffffffff1   R9: 0000000000000000
    R10: ffff880c7240b180  R11: 0000000000000002  R12: 0000000000000008
    R13: ffff880c7625e000  R14: 0000000000004004  R15: 0000000f00014cb8
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #9 [ffff880c6aa1f110] xfs_trans_read_buf at ffffffffa01e4a8d [xfs]
#10 [ffff880c6aa1f160] xfs_btree_read_buf_block at ffffffffa01b19ce [xfs]
#11 [ffff880c6aa1f1c0] xfs_btree_lshift at ffffffffa01b1aee [xfs]
#12 [ffff880c6aa1f260] xfs_btree_delrec at ffffffffa01b3d88 [xfs]
#13 [ffff880c6aa1f370] xfs_btree_delete at ffffffffa01b4188 [xfs]
#14 [ffff880c6aa1f3b0] xfs_free_ag_extent at ffffffffa019a78b [xfs]
#15 [ffff880c6aa1f450] xfs_free_extent at ffffffffa019c7b1 [xfs]
#16 [ffff880c6aa1f500] xfs_bmap_finish at ffffffffa01a64ad [xfs]
#17 [ffff880c6aa1f550] xfs_itruncate_finish at ffffffffa01cc4ef [xfs]
#18 [ffff880c6aa1f600] xfs_inactive at ffffffffa01e85b4 [xfs]
#19 [ffff880c6aa1f650] xfs_fs_clear_inode at ffffffffa01f5bc0 [xfs]
#20 [ffff880c6aa1f670] clear_inode at ffffffff811a5bdc
#21 [ffff880c6aa1f690] generic_delete_inode at ffffffff811a6396
#22 [ffff880c6aa1f6c0] generic_drop_inode at ffffffff811a6435
#23 [ffff880c6aa1f6e0] iput at ffffffff811a5282
#24 [ffff880c6aa1f700] dentry_iput at ffffffff811a1e40
#25 [ffff880c6aa1f720] d_kill at ffffffff811a1fa1
#26 [ffff880c6aa1f740] __shrink_dcache_sb at ffffffff811a2336
#27 [ffff880c6aa1f7e0] shrink_dcache_memory at ffffffff811a24b9
#28 [ffff880c6aa1f840] shrink_slab at ffffffff81138ada
#29 [ffff880c6aa1f8a0] zone_reclaim at ffffffff8113b6de
#30 [ffff880c6aa1f9c0] get_page_from_freelist at ffffffff8112d83c
#31 [ffff880c6aa1faf0] __alloc_pages_nodemask at ffffffff8112f3a3
#32 [ffff880c6aa1fc30] kmem_getpages at ffffffff8116e482
#33 [ffff880c6aa1fc60] cache_grow at ffffffff8116eaef
#34 [ffff880c6aa1fcd0] cache_alloc_refill at ffffffff8116ed42
#35 [ffff880c6aa1fd40] kmem_cache_alloc at ffffffff8116fddf
#36 [ffff880c6aa1fd80] getname at ffffffff81197007
#37 [ffff880c6aa1fdc0] user_path_at at ffffffff8119b181
#38 [ffff880c6aa1fe90] user_statfs at ffffffff811bd0f8
#39 [ffff880c6aa1fee0] sys_statfs at ffffffff811bd20a
#40 [ffff880c6aa1ff80] system_call_fastpath at ffffffff8100b072
    RIP: 0000003123edb257  RSP: 00007fff250c4c38  RFLAGS: 00010246
    RAX: 0000000000000089  RBX: ffffffff8100b072  RCX: 0000000000000000
    RDX: 00000000011d59a0  RSI: 00007fff250c5360  RDI: 00000000011db0c0
    RBP: 00007fff250c5360   R8: 000000312418fee8   R9: 0000000000000001
    R10: 000000000000000d  R11: 0000000000000202  R12: 00007fff250c5410
    R13: 00000000011db0c0  R14: 0000000000000000  R15: 0000000000000000
    ORIG_RAX: 0000000000000089  CS: 0033  SS: 002b

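Lines beginning with "#" followed by a number are stack frames, that is, the chain of kernel functions called before the crash; from them one can quickly infer where the kernel crashed.
The last exception before the crash was raised at the instruction pointed to by the RIP register, and the dis command shows the disassembly at that address.
Focus on these two lines: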

 [exception RIP: xfs_trans_buf_item_match+48]
    RIP: ffffffffa01e4660  RSP: ffff880c6aa1f108  RFLAGS: 00010292
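
The two lines above can also be picked out of saved bt output by a script. Below is a minimal Python sketch. It assumes the offset after the + sign is decimal, which matches this dump (48 is 0x30), and the function name is hypothetical.

```python
import re

# Matches the "[exception RIP: symbol+offset]" line and the RIP value after it.
EXCEPTION_RE = re.compile(
    r"\[exception RIP: (?P<symbol>[\w.]+)\+(?P<offset>\d+)\]\s+"
    r"RIP: (?P<rip>[0-9a-f]{16})"
)


def find_exception_rip(bt_output):
    """Return (symbol, offset, rip) for the first exception frame, or None."""
    match = EXCEPTION_RE.search(bt_output)
    if match is None:
        return None
    return match["symbol"], int(match["offset"]), int(match["rip"], 16)


# The two lines quoted in the article.
BT_LINES = """\
    [exception RIP: xfs_trans_buf_item_match+48]
    RIP: ffffffffa01e4660  RSP: ffff880c6aa1f108  RFLAGS: 00010292
"""

if __name__ == "__main__":
    symbol, offset, rip = find_exception_rip(BT_LINES)
    # Subtracting the offset gives the address where the function starts.
    print(f"{symbol}+{offset} rip={rip:#x} function_start={rip - offset:#x}")
```

Either the symbol plus offset or the raw RIP value can then be given to dis.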

Either dis -l ffffffffa01e4660 or dis xfs_trans_buf_item_match+48 will disassemble that instruction.

crash> dis -l ffffffffa01e4660        
0xffffffffa01e4660 <xfs_trans_buf_item_match+48>:       mov    (%r8),%rax
crash> dis -l xfs_trans_buf_item_match+48
0xffffffffa01e4660 <xfs_trans_buf_item_match+48>:       mov    (%r8),%rax
crash> dis -l xfs_trans_buf_item_match
0xffffffffa01e4630 <xfs_trans_buf_item_match>:  push   %rbp
0xffffffffa01e4631 <xfs_trans_buf_item_match+1>:        mov    %rsp,%rbp
0xffffffffa01e4634 <xfs_trans_buf_item_match+4>:        nopl   0x0(%rax,%rax,1)
0xffffffffa01e4639 <xfs_trans_buf_item_match+9>:        mov    0xe0(%rdi),%rax
0xffffffffa01e4640 <xfs_trans_buf_item_match+16>:       add    $0xe0,%rdi
0xffffffffa01e4647 <xfs_trans_buf_item_match+23>:       shl    $0x9,%ecx
0xffffffffa01e464a <xfs_trans_buf_item_match+26>:       cmp    %rax,%rdi
0xffffffffa01e464d <xfs_trans_buf_item_match+29>:       lea    -0x10(%rax),%r8
0xffffffffa01e4651 <xfs_trans_buf_item_match+33>:       je     0xffffffffa01e4698
0xffffffffa01e4653 <xfs_trans_buf_item_match+35>:       movslq %ecx,%rcx
0xffffffffa01e4656 <xfs_trans_buf_item_match+38>:       nopw   %cs:0x0(%rax,%rax,1)
0xffffffffa01e4660 <xfs_trans_buf_item_match+48>:       mov    (%r8),%rax
0xffffffffa01e4663 <xfs_trans_buf_item_match+51>:       cmpl   $0x123c,0x30(%rax)
0xffffffffa01e466a <xfs_trans_buf_item_match+58>:       jne    0xffffffffa01e468b
0xffffffffa01e466c <xfs_trans_buf_item_match+60>:       mov    0x70(%rax),%rax
0xffffffffa01e4670 <xfs_trans_buf_item_match+64>:       cmp    %rsi,0x98(%rax)
0xffffffffa01e4677 <xfs_trans_buf_item_match+71>:       jne    0xffffffffa01e468b
0xffffffffa01e4679 <xfs_trans_buf_item_match+73>:       cmp    %rdx,0xa0(%rax)
0xffffffffa01e4680 <xfs_trans_buf_item_match+80>:       jne    0xffffffffa01e468b
0xffffffffa01e4682 <xfs_trans_buf_item_match+82>:       cmp    %rcx,0xa8(%rax)
0xffffffffa01e4689 <xfs_trans_buf_item_match+89>:       je     0xffffffffa01e469a
0xffffffffa01e468b <xfs_trans_buf_item_match+91>:       mov    0x10(%r8),%rax
0xffffffffa01e468f <xfs_trans_buf_item_match+95>:       cmp    %rax,%rdi
0xffffffffa01e4692 <xfs_trans_buf_item_match+98>:       lea    -0x10(%rax),%r8
0xffffffffa01e4696 <xfs_trans_buf_item_match+102>:      jne    0xffffffffa01e4660
0xffffffffa01e4698 <xfs_trans_buf_item_match+104>:      xor    %eax,%eax
0xffffffffa01e469a <xfs_trans_buf_item_match+106>:      leaveq 
0xffffffffa01e469b <xfs_trans_buf_item_match+107>:      retq   
0xffffffffa01e469c <xfs_trans_buf_item_match+108>:      nopl   0x0(%rax)
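
The register dump and the disassembly can be cross-checked with a little arithmetic. The faulting instruction at +48, mov (%r8),%rax, dereferences R8, and R8 is produced by lea -0x10(%rax),%r8 (at +29 and again at +98). The Python sketch below redoes that 64-bit subtraction with the values from the bt output. Reading the result as a bad pointer value of 0x1 picked up while walking a list is an interpretation, not something the dump states directly.

```python
MASK64 = (1 << 64) - 1


def lea_minus_0x10(rax):
    """Emulate `lea -0x10(%rax),%r8` with 64-bit wraparound."""
    return (rax - 0x10) & MASK64


def to_signed64(value):
    """Interpret a 64-bit value as a signed integer."""
    return value - (1 << 64) if value >> 63 else value


# Values copied from the exception frame in the bt output.
RAX = 0x0000000000000001
R8 = 0xFFFFFFFFFFFFFFF1
# Address reported on the PANIC line of the banner.
PANIC_ADDRESS = 0xFFFFFFFFFFFFFFF1

if __name__ == "__main__":
    print(hex(lea_minus_0x10(RAX)))  # address the mov tried to read
    print(lea_minus_0x10(RAX) == R8 == PANIC_ADDRESS)
    print(to_signed64(R8))
```

All three values agree: the read that faulted was at RAX - 0x10 with RAX holding 1, which is not a valid kernel address.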

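Analyzing the memory dump makes it possible to pinpoint the problem precisely, which is especially helpful when the application-level logs contain no useful information.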