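When the Linux kernel crashes, mechanisms such as kdump can capture the contents of memory from before the crash and write them to a dump file, vmcore. crash is a widely used tool for analyzing kernel crash dump files.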
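Kdump is a kexec-based memory dump tool. It has been merged into the mainline kernel and is now part of it, which is why the vast majority of Linux distributions support it. Unlike traditional memory dump mechanisms, a Kdump-based system needs two kernels: the system kernel, which runs during normal operation, and the capture kernel, which takes over when the system kernel crashes and performs the memory dump.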
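The capture kernel needs memory set aside for it at boot time, which is normally done with the `crashkernel=` kernel parameter. That parameter is not mentioned in the text above, so treat the following as background: a minimal Python sketch that checks whether a kernel command line reserves such memory. The sample command line is made up for illustration; on a live system the string would come from `/proc/cmdline`.

```python
import re


def crashkernel_reservation(cmdline):
    """Return the value of the crashkernel= parameter, or None if absent."""
    match = re.search(r"(?:^|\s)crashkernel=(\S+)", cmdline)
    return match.group(1) if match else None


# Hypothetical command line, for illustration only.
sample_cmdline = "ro root=/dev/sda1 crashkernel=auto rhgb quiet"
print(crashkernel_reservation(sample_cmdline))
```

If the function returns `None`, no memory is reserved for a capture kernel and kdump would most likely be unable to produce a vmcore.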
#### Preparing the tools
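Run `uname -r` to get the kernel version (`2.6.32-696.el6.x86_64` here), then download the matching packages from http://debuginfo.centos.org/6/x86_64/. The debuginfo packages must match the kernel that produced the vmcore, which is not necessarily the kernel of the machine on which crash is run.

    kernel-debuginfo-2.6.32-696.el6.x86_64.rpm
    kernel-debuginfo-common-x86_64-2.6.32-696.el6.x86_64.rpm

`kernel-debuginfo` depends on `kernel-debuginfo-common`, so install both in a single transaction:

    rpm -ivh kernel-debuginfo-common-x86_64-2.6.32-696.el6.x86_64.rpm kernel-debuginfo-2.6.32-696.el6.x86_64.rpm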
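The package names follow directly from the kernel release string. As a rough sketch, assuming the naming pattern seen in the two file names above holds for other releases as well, the names and download URLs can be derived like this (`debuginfo_packages` and `debuginfo_urls` are hypothetical helper names):

```python
from typing import List

# Base URL taken from the text above.
BASE_URL = "http://debuginfo.centos.org/6/x86_64/"


def debuginfo_packages(release: str) -> List[str]:
    """Build the two debuginfo RPM names for a kernel release string.

    The architecture is taken from the last dot-separated component,
    e.g. "2.6.32-696.el6.x86_64" -> "x86_64".
    """
    arch = release.rsplit(".", 1)[1]
    return [
        # The common package is listed first because the other depends on it.
        f"kernel-debuginfo-common-{arch}-{release}.rpm",
        f"kernel-debuginfo-{release}.rpm",
    ]


def debuginfo_urls(release: str) -> List[str]:
    """Prefix each package name with the download base URL."""
    return [BASE_URL + name for name in debuginfo_packages(release)]


for url in debuginfo_urls("2.6.32-696.el6.x86_64"):
    print(url)
```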
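#### Overview of crash built-in commands

- `bt` (backtrace): shows the stack and related information from just before the crash. It is one of the most frequently used and most useful commands when debugging a system.
- `log` (dump system message buffer): prints the kernel message buffer, which may contain clues about the crash.
- `ps` (display process status information): shows the state of each process; (as in the figure) entries marked with `>` are the active tasks.
- `dis` (disassembling instruction): disassembles the contents of a given address.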
#### Analysis

Open the vmcore with the crash command:

    # crash /usr/lib/debug/lib/modules/2.6.32-431.el6.x86_64/vmlinux /var/crash/127.0.0.1-2019-07-25-23\:49\:55/vmcore

    crash 7.1.0-8.el6
    Copyright (C) 2002-2014  Red Hat, Inc.
    Copyright (C) 2004, 2005, 2006, 2010  IBM Corporation
    Copyright (C) 1999-2006  Hewlett-Packard Co
    Copyright (C) 2005, 2006, 2011, 2012  Fujitsu Limited
    Copyright (C) 2006, 2007  VA Linux Systems Japan K.K.
    Copyright (C) 2005, 2011  NEC Corporation
    Copyright (C) 1999, 2002, 2007  Silicon Graphics, Inc.
    Copyright (C) 1999, 2000, 2001, 2002  Mission Critical Linux, Inc.
    This program is free software, covered by the GNU General Public License,
    and you are welcome to change it and/or distribute copies of it under
    certain conditions.  Enter "help copying" to see the conditions.
    This program has absolutely no warranty.  Enter "help warranty" for details.

    GNU gdb (GDB) 7.6
    Copyright (C) 2013 Free Software Foundation, Inc.
    License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
    This is free software: you are free to change and redistribute it.
    There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
    and "show warranty" for details.
    This GDB was configured as "x86_64-unknown-linux-gnu"...

          KERNEL: /usr/lib/debug/lib/modules/2.6.32-431.el6.x86_64/vmlinux
        DUMPFILE: /var/crash/127.0.0.1-2019-07-25-23:49:55/vmcore  [PARTIAL DUMP]
            CPUS: 12
            DATE: Fri Jul 26 00:08:08 2019
          UPTIME: 1 days, 23:57:36
    LOAD AVERAGE: 30.42, 13.07, 4.99
           TASKS: 392
        NODENAME: d16050205.grid.*.*.com.cn
         RELEASE: 2.6.32-431.el6.x86_64
         VERSION: #1 SMP Fri Nov 22 03:15:09 UTC 2013
         MACHINE: x86_64  (1900 Mhz)
          MEMORY: 47.9 GB
           PANIC: "BUG: unable to handle kernel paging request at fffffffffffffff1"
             PID: 2194
         COMMAND: "zabbix_agentd"
            TASK: ffff880c6c6f7500  [THREAD_INFO: ffff880c6aa1e000]
             CPU: 3
           STATE: TASK_RUNNING (PANIC)

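- KERNEL: the kernel image that was running when the system crashed
- DUMPFILE: the kernel dump file
- CPUS: the number of CPUs in the machine
- DATE: the time of the crash
- TASKS: the number of tasks in memory at the time of the crash
- NODENAME: the hostname of the crashed system
- RELEASE and VERSION: the kernel version
- MACHINE: the CPU architecture
- MEMORY: the physical memory of the crashed host
- PANIC: the type of crash. Common types include:
  - SysRq (System Request): a crash triggered through the magic key combination, usually for testing. Running `echo c > /proc/sysrq-trigger` triggers such a crash.
  - oops: can be thought of as a kernel-level segmentation fault. When an application performs an illegal memory access or executes an illegal instruction, it receives a signal such as SIGSEGV; the default behavior is a core dump, although the application can also catch the signal and handle it itself. When the kernel itself makes this kind of mistake, it prints an oops message.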
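When many dumps have to be triaged, the header fields described above are convenient to collect programmatically. The following is a minimal Python sketch, assuming the header keeps the `KEY: value` layout shown in the session above; `parse_crash_header` is a hypothetical helper and not part of crash itself.

```python
import re

# An upper-case key (letters, spaces, underscores), a colon, then the value.
HEADER_LINE = re.compile(r"^\s*([A-Z][A-Z _]*[A-Z]):\s+(.*\S)\s*$")


def parse_crash_header(text):
    """Collect 'KEY: value' lines from the crash banner into a dict."""
    fields = {}
    for line in text.splitlines():
        match = HEADER_LINE.match(line)
        if match:
            fields[match.group(1)] = match.group(2)
    return fields


# A few lines copied from the session above.
SAMPLE_HEADER = '''
        CPUS: 12
LOAD AVERAGE: 30.42, 13.07, 4.99
     RELEASE: 2.6.32-431.el6.x86_64
       PANIC: "BUG: unable to handle kernel paging request at fffffffffffffff1"
         PID: 2194
     COMMAND: "zabbix_agentd"
'''

header = parse_crash_header(SAMPLE_HEADER)
print(header["PANIC"])
print(header["COMMAND"])
```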
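At this point we can already tell that the problem is related to zabbix: the panicking task is `zabbix_agentd`. As usual, the rest of this post walks through the commonly used crash commands.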

    crash> bt
    PID: 2194   TASK: ffff880c6c6f7500  CPU: 3   COMMAND: "zabbix_agentd"
     #0 [ffff880c6aa1ecf0] machine_kexec at ffffffff81038f3b
     #1 [ffff880c6aa1ed50] crash_kexec at ffffffff810c5d92
     #2 [ffff880c6aa1ee20] oops_end at ffffffff8152b510
     #3 [ffff880c6aa1ee50] no_context at ffffffff8104a00b
     #4 [ffff880c6aa1eea0] __bad_area_nosemaphore at ffffffff8104a295
     #5 [ffff880c6aa1eef0] bad_area_nosemaphore at ffffffff8104a363
     #6 [ffff880c6aa1ef00] __do_page_fault at ffffffff8104aabf
     #7 [ffff880c6aa1f020] do_page_fault at ffffffff8152d45e
     #8 [ffff880c6aa1f050] page_fault at ffffffff8152a815
        [exception RIP: xfs_trans_buf_item_match+48]
        RIP: ffffffffa01e4660  RSP: ffff880c6aa1f108  RFLAGS: 00010292
        RAX: 0000000000000001  RBX: ffff880c6a623ea0  RCX: 0000000000001000
        RDX: 0000000f00014cb8  RSI: ffff880c7240b180  RDI: ffff880c6a623f80
        RBP: ffff880c6aa1f108   R8: fffffffffffffff1   R9: 0000000000000000
        R10: ffff880c7240b180  R11: 0000000000000002  R12: 0000000000000008
        R13: ffff880c7625e000  R14: 0000000000004004  R15: 0000000f00014cb8
        ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
     #9 [ffff880c6aa1f110] xfs_trans_read_buf at ffffffffa01e4a8d [xfs]
    #10 [ffff880c6aa1f160] xfs_btree_read_buf_block at ffffffffa01b19ce [xfs]
    #11 [ffff880c6aa1f1c0] xfs_btree_lshift at ffffffffa01b1aee [xfs]
    #12 [ffff880c6aa1f260] xfs_btree_delrec at ffffffffa01b3d88 [xfs]
    #13 [ffff880c6aa1f370] xfs_btree_delete at ffffffffa01b4188 [xfs]
    #14 [ffff880c6aa1f3b0] xfs_free_ag_extent at ffffffffa019a78b [xfs]
    #15 [ffff880c6aa1f450] xfs_free_extent at ffffffffa019c7b1 [xfs]
    #16 [ffff880c6aa1f500] xfs_bmap_finish at ffffffffa01a64ad [xfs]
    #17 [ffff880c6aa1f550] xfs_itruncate_finish at ffffffffa01cc4ef [xfs]
    #18 [ffff880c6aa1f600] xfs_inactive at ffffffffa01e85b4 [xfs]
    #19 [ffff880c6aa1f650] xfs_fs_clear_inode at ffffffffa01f5bc0 [xfs]
    #20 [ffff880c6aa1f670] clear_inode at ffffffff811a5bdc
    #21 [ffff880c6aa1f690] generic_delete_inode at ffffffff811a6396
    #22 [ffff880c6aa1f6c0] generic_drop_inode at ffffffff811a6435
    #23 [ffff880c6aa1f6e0] iput at ffffffff811a5282
    #24 [ffff880c6aa1f700] dentry_iput at ffffffff811a1e40
    #25 [ffff880c6aa1f720] d_kill at ffffffff811a1fa1
    #26 [ffff880c6aa1f740] __shrink_dcache_sb at ffffffff811a2336
    #27 [ffff880c6aa1f7e0] shrink_dcache_memory at ffffffff811a24b9
    #28 [ffff880c6aa1f840] shrink_slab at ffffffff81138ada
    #29 [ffff880c6aa1f8a0] zone_reclaim at ffffffff8113b6de
    #30 [ffff880c6aa1f9c0] get_page_from_freelist at ffffffff8112d83c
    #31 [ffff880c6aa1faf0] __alloc_pages_nodemask at ffffffff8112f3a3
    #32 [ffff880c6aa1fc30] kmem_getpages at ffffffff8116e482
    #33 [ffff880c6aa1fc60] cache_grow at ffffffff8116eaef
    #34 [ffff880c6aa1fcd0] cache_alloc_refill at ffffffff8116ed42
    #35 [ffff880c6aa1fd40] kmem_cache_alloc at ffffffff8116fddf
    #36 [ffff880c6aa1fd80] getname at ffffffff81197007
    #37 [ffff880c6aa1fdc0] user_path_at at ffffffff8119b181
    #38 [ffff880c6aa1fe90] user_statfs at ffffffff811bd0f8
    #39 [ffff880c6aa1fee0] sys_statfs at ffffffff811bd20a
    #40 [ffff880c6aa1ff80] system_call_fastpath at ffffffff8100b072
        RIP: 0000003123edb257  RSP: 00007fff250c4c38  RFLAGS: 00010246
        RAX: 0000000000000089  RBX: ffffffff8100b072  RCX: 0000000000000000
        RDX: 00000000011d59a0  RSI: 00007fff250c5360  RDI: 00000000011db0c0
        RBP: 00007fff250c5360   R8: 000000312418fee8   R9: 0000000000000001
        R10: 000000000000000d  R11: 0000000000000202  R12: 00007fff250c5410
        R13: 00000000011db0c0  R14: 0000000000000000  R15: 0000000000000000
        ORIG_RAX: 0000000000000089  CS: 0033  SS: 002b

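A backtrace like this one is regular enough to be processed by a script, for example to list which kernel modules appear on the stack. The sketch below assumes the frame layout shown above (`#N [stack address] function at address [module]`); the helper names are hypothetical and the sample is a short excerpt of the trace.

```python
import re

# "#N [stack address] function at text address", optionally "[module]".
FRAME = re.compile(
    r"^\s*#\s*(\d+)\s+\[([0-9a-f]+)\]\s+(\S+)\s+at\s+([0-9a-f]+)(?:\s+\[(\w+)\])?"
)
EXCEPTION = re.compile(r"\[exception RIP: (\w+)\+(\d+)\]")


def parse_frames(bt_text):
    """Return one dict per stack frame found in `bt` output."""
    frames = []
    for line in bt_text.splitlines():
        match = FRAME.match(line)
        if match:
            num, _stack, func, addr, module = match.groups()
            frames.append(
                {"num": int(num), "func": func, "addr": int(addr, 16), "module": module}
            )
    return frames


def faulting_location(bt_text):
    """Return (symbol, decimal offset) from the exception line, or None."""
    match = EXCEPTION.search(bt_text)
    return (match.group(1), int(match.group(2))) if match else None


# Excerpt of the backtrace above.
SAMPLE_BT = """
 #8 [ffff880c6aa1f050] page_fault at ffffffff8152a815
    [exception RIP: xfs_trans_buf_item_match+48]
    RIP: ffffffffa01e4660  RSP: ffff880c6aa1f108  RFLAGS: 00010292
 #9 [ffff880c6aa1f110] xfs_trans_read_buf at ffffffffa01e4a8d [xfs]
#10 [ffff880c6aa1f160] xfs_btree_read_buf_block at ffffffffa01b19ce [xfs]
"""

frames = parse_frames(SAMPLE_BT)
print([f["func"] for f in frames if f["module"] == "xfs"])
print(faulting_location(SAMPLE_BT))
```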
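Lines that begin with `#` followed by a number make up the call stack, that is, the chain of functions the kernel called before the crash. It lets you quickly work out where the kernel crashed. The trace shows that the exception just before the crash was raised at the address held in the instruction pointer register, RIP, so the next step is to look at the disassembly of that address with the `dis` command. Focus on these two lines: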

    [exception RIP: xfs_trans_buf_item_match+48]
    RIP: ffffffffa01e4660  RSP: ffff880c6aa1f108  RFLAGS: 00010292

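The `+48` in the exception line is a decimal byte offset from the start of the function, so the function's entry address can be recovered from RIP by subtraction. A small Python check of that arithmetic, using only the values from the two lines above (`function_start` is a hypothetical helper name):

```python
def function_start(rip, symbol_with_offset):
    """Return (symbol, start address) given RIP and a 'symbol+offset' string.

    The offset is decimal, as printed in the exception line.
    """
    symbol, offset = symbol_with_offset.rsplit("+", 1)
    return symbol, rip - int(offset)


symbol, start = function_start(0xffffffffa01e4660, "xfs_trans_buf_item_match+48")
print(symbol, hex(start))
```

The result, `0xffffffffa01e4630`, is the address at which the disassembly of `xfs_trans_buf_item_match` begins.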
Either `dis -l ffffffffa01e4660` or `dis -l xfs_trans_buf_item_match+48` disassembles that instruction.

    crash> dis -l ffffffffa01e4660
    0xffffffffa01e4660 <xfs_trans_buf_item_match+48>:       mov    (%r8),%rax
    crash> dis -l xfs_trans_buf_item_match+48
    0xffffffffa01e4660 <xfs_trans_buf_item_match+48>:       mov    (%r8),%rax
    crash> dis -l xfs_trans_buf_item_match
    0xffffffffa01e4630 <xfs_trans_buf_item_match>:          push   %rbp
    0xffffffffa01e4631 <xfs_trans_buf_item_match+1>:        mov    %rsp,%rbp
    0xffffffffa01e4634 <xfs_trans_buf_item_match+4>:        nopl   0x0(%rax,%rax,1)
    0xffffffffa01e4639 <xfs_trans_buf_item_match+9>:        mov    0xe0(%rdi),%rax
    0xffffffffa01e4640 <xfs_trans_buf_item_match+16>:       add    $0xe0,%rdi
    0xffffffffa01e4647 <xfs_trans_buf_item_match+23>:       shl    $0x9,%ecx
    0xffffffffa01e464a <xfs_trans_buf_item_match+26>:       cmp    %rax,%rdi
    0xffffffffa01e464d <xfs_trans_buf_item_match+29>:       lea    -0x10(%rax),%r8
    0xffffffffa01e4651 <xfs_trans_buf_item_match+33>:       je     0xffffffffa01e4698
    0xffffffffa01e4653 <xfs_trans_buf_item_match+35>:       movslq %ecx,%rcx
    0xffffffffa01e4656 <xfs_trans_buf_item_match+38>:       nopw   %cs:0x0(%rax,%rax,1)
    0xffffffffa01e4660 <xfs_trans_buf_item_match+48>:       mov    (%r8),%rax
    0xffffffffa01e4663 <xfs_trans_buf_item_match+51>:       cmpl   $0x123c,0x30(%rax)
    0xffffffffa01e466a <xfs_trans_buf_item_match+58>:       jne    0xffffffffa01e468b
    0xffffffffa01e466c <xfs_trans_buf_item_match+60>:       mov    0x70(%rax),%rax
    0xffffffffa01e4670 <xfs_trans_buf_item_match+64>:       cmp    %rsi,0x98(%rax)
    0xffffffffa01e4677 <xfs_trans_buf_item_match+71>:       jne    0xffffffffa01e468b
    0xffffffffa01e4679 <xfs_trans_buf_item_match+73>:       cmp    %rdx,0xa0(%rax)
    0xffffffffa01e4680 <xfs_trans_buf_item_match+80>:       jne    0xffffffffa01e468b
    0xffffffffa01e4682 <xfs_trans_buf_item_match+82>:       cmp    %rcx,0xa8(%rax)
    0xffffffffa01e4689 <xfs_trans_buf_item_match+89>:       je     0xffffffffa01e469a
    0xffffffffa01e468b <xfs_trans_buf_item_match+91>:       mov    0x10(%r8),%rax
    0xffffffffa01e468f <xfs_trans_buf_item_match+95>:       cmp    %rax,%rdi
    0xffffffffa01e4692 <xfs_trans_buf_item_match+98>:       lea    -0x10(%rax),%r8
    0xffffffffa01e4696 <xfs_trans_buf_item_match+102>:      jne    0xffffffffa01e4660
    0xffffffffa01e4698 <xfs_trans_buf_item_match+104>:      xor    %eax,%eax
    0xffffffffa01e469a <xfs_trans_buf_item_match+106>:      leaveq
    0xffffffffa01e469b <xfs_trans_buf_item_match+107>:      retq
    0xffffffffa01e469c <xfs_trans_buf_item_match+108>:      nopl   0x0(%rax)

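The disassembly and the register dump from `bt` can be cross-checked with a little arithmetic. The faulting instruction `mov (%r8),%rax` reads through R8, and in both places where R8 is set it is computed by `lea -0x10(%rax),%r8`. The following Python sketch replays that step with 64-bit wraparound. Reading this as a list pointer that held the bogus value 1 is an interpretation of the dump, not something the original analysis states.

```python
MASK64 = (1 << 64) - 1


def lea_minus_0x10(rax):
    """Emulate `lea -0x10(%rax),%r8` with 64-bit wraparound."""
    return (rax - 0x10) & MASK64


def to_signed64(value):
    """Interpret a 64-bit register value as a signed integer."""
    return value - (1 << 64) if value & (1 << 63) else value


rax = 0x0000000000000001  # RAX from the exception frame in the bt output
r8 = lea_minus_0x10(rax)

print(hex(r8))
print(to_signed64(r8))
```

The computed value equals both R8 in the register dump and the address in the PANIC message (`fffffffffffffff1`), which is consistent with the loaded pointer being 1 rather than a valid kernel address. RAX still holds 1 because the faulting `mov` never completed.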
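Analyzing the memory dump file makes it possible to pinpoint a problem precisely, which is particularly helpful when the application-level logs contain no useful information.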