Debugging Docker memory and CPU
Published: 2019-06-28


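 I started an sshd container service locally, but the container kept getting restarted, which broke SSH connections. Running kubectl describe pod against the pod showed that the container had exited with code 137. By convention 137 = 128 + 9, meaning the process was killed with SIGKILL; this usually points to the environment rather than the application, and most often to the node running out of memory. The relevant output: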

Started:      Tue, 20 Nov 2018 12:14:42 +0800
Last State:   Terminated
Reason:       Error
Exit Code:    137
Started:      Mon, 19 Nov 2018 14:18:22 +0800
Finished:     Tue, 20 Nov 2018 12:14:16 +0800
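
As a rough illustration of how to read that exit code: by shell convention, a status above 128 means the process was terminated by signal (status - 128). The helper below, explain_exit_code, is a hypothetical name used only for this sketch; it is not part of kubectl or Docker.

```shell
#!/bin/sh
# Hypothetical helper (illustration only): explain a container exit code.
# Convention: status > 128 means "terminated by signal (status - 128)".
explain_exit_code() {
    code=$1
    if [ "$code" -gt 128 ]; then
        sig=$((code - 128))
        # kill -l <number> prints the signal name without the SIG prefix
        echo "exit code $code: killed by signal $sig (SIG$(kill -l "$sig"))"
    elif [ "$code" -eq 0 ]; then
        echo "exit code 0: success"
    else
        echo "exit code $code: the process exited on its own with an error"
    fi
}

explain_exit_code 137
```

An exit code of 137 alone does not prove an out-of-memory kill, since any SIGKILL produces it; the node's kernel log is what confirms the cause.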
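  • Log in to the node that runs the container and check the system log. The sshd page allocation stalled for a long time (more than 20 seconds), and in the Normal zone free is below min (51536kB vs 51740kB). In this state the kernel reclaims memory aggressively and may kill processes to do so, which can restart the sshd container or the containerd process. The min watermark is derived from vm.min_free_kbytes; the references at the end of this post explain the mechanism.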
Nov 20 04:02:36 ip-172-20-54-91 kernel: [91374.975004] sshd: page allocation stalls for 20388ms, order:0, mode:0x24200ca(GFP_HIGHUSER_MOVABLE)
Nov 20 04:02:36 ip-172-20-54-91 kernel: [91374.984454] CPU: 3 PID: 1257 Comm: sshd Not tainted 4.9.0-7-amd64 #1 Debian 4.9.110-3+deb9u2
Nov 20 04:02:36 ip-172-20-54-91 kernel: [91374.988477] Hardware name: Xen HVM domU, BIOS 4.2.amazon 08/24/2006
Nov 20 04:02:36 ip-172-20-54-91 kernel: [91374.995081]  0000000000000000 ffffffff90d30694 ffffffff91401218 ffffb76e46b5fb60
Nov 20 04:02:36 ip-172-20-54-91 kernel: [91375.004170]  ffffffff90b89d0a 024200ca00000006 ffffffff91401218 ffffb76e46b5fb00
Nov 20 04:02:36 ip-172-20-54-91 kernel: [91375.004170]  ffff8b5300000010 ffffb76e46b5fb70 ffffb76e46b5fb20 286e078452d92816
Nov 20 04:02:36 ip-172-20-54-91 kernel: [91375.004170] Call Trace:
Nov 20 04:02:36 ip-172-20-54-91 kernel: [91375.004170]  [<...>] ? dump_stack+0x5c/0x78
Nov 20 04:02:36 ip-172-20-54-91 kernel: [91375.026383] [<...>] ? warn_alloc+0x13a/0x160
Nov 20 04:02:36 ip-172-20-54-91 kernel: [91375.026383] [<...>] ? __alloc_pages_slowpath+0x995/0xbf0
Nov 20 04:02:36 ip-172-20-54-91 kernel: [91375.026383] [<...>] ? __schedule+0x241/0x6f0
Nov 20 04:02:36 ip-172-20-54-91 kernel: [91375.026383] [<...>] ? xen_clocksource_get_cycles+0x11/0x20
Nov 20 04:02:36 ip-172-20-54-91 kernel: [91375.026383] [<...>] ? ktime_get+0x3e/0xb0
Nov 20 04:02:36 ip-172-20-54-91 kernel: [91375.026383] [<...>] ? __alloc_pages_nodemask+0x201/0x260
Nov 20 04:02:36 ip-172-20-54-91 kernel: [91375.026383] [<...>] ? alloc_pages_vma+0xae/0x260
Nov 20 04:02:36 ip-172-20-54-91 kernel: [91375.026383] [<...>] ? wp_page_copy+0x89/0x700
Nov 20 04:02:36 ip-172-20-54-91 kernel: [91375.026383] [<...>] ? do_wp_page+0x161/0x7e0
Nov 20 04:02:36 ip-172-20-54-91 kernel: [91375.026383] [<...>] ? page_add_file_rmap+0x11/0x110
Nov 20 04:02:36 ip-172-20-54-91 kernel: [91375.026383] [<...>] ? alloc_set_pte+0x3c2/0x550
Nov 20 04:02:36 ip-172-20-54-91 kernel: [91375.026383] [<...>] ? handle_mm_fault+0x832/0x1280
Nov 20 04:02:36 ip-172-20-54-91 kernel: [91375.026383] [<...>] ? __do_page_fault+0x255/0x4f0
Nov 20 04:02:36 ip-172-20-54-91 kernel: [91375.026383] [<...>] ? page_fault+0x28/0x30
Nov 20 04:02:36 ip-172-20-54-91 kernel: [91375.110910] Mem-Info:
Nov 20 04:02:36 ip-172-20-54-91 kernel: [91375.113701] active_anon:3868989 inactive_anon:1176 isolated_anon:0
Nov 20 04:02:36 ip-172-20-54-91 kernel: [91375.113701] active_file:23607 inactive_file:21078 isolated_file:720
Nov 20 04:02:36 ip-172-20-54-91 kernel: [91375.113701] unevictable:0 dirty:8 writeback:0 unstable:0
Nov 20 04:02:36 ip-172-20-54-91 kernel: [91375.113701] slab_reclaimable:16746 slab_unreclaimable:57137
Nov 20 04:02:36 ip-172-20-54-91 kernel: [91375.113701] mapped:38107 shmem:3568 pagetables:20708 bounce:0
Nov 20 04:02:36 ip-172-20-54-91 kernel: [91375.113701] free:33020 free_pcp:370 free_cma:0
Nov 20 04:02:36 ip-172-20-54-91 kernel: [91375.143413] Node 0 active_anon:15475956kB inactive_anon:4704kB active_file:94428kB inactive_file:84312kB unevictable:0kB isolated(anon):0kB isolated(file):2880kB mapped:152428kB dirty:32kB writeback:0kB shmem:14272kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 135168kB writeback_tmp:0kB unstable:0kB pages_scanned:50 all_unreclaimable? no
Nov 20 04:02:36 ip-172-20-54-91 kernel: [91375.169765] Node 0 DMA free:15904kB min:64kB low:80kB high:96kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15988kB managed:15904kB mlocked:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
Nov 20 04:02:36 ip-172-20-54-91 kernel: [91375.195526] lowmem_reserve[]: 0 3741 16011 16011 16011
Nov 20 04:02:36 ip-172-20-54-91 kernel: [91375.201354] Node 0 DMA32 free:64640kB min:15776kB low:19720kB high:23664kB active_anon:3624584kB inactive_anon:144kB active_file:17908kB inactive_file:15952kB unevictable:0kB writepending:0kB present:3915776kB managed:3850208kB mlocked:0kB slab_reclaimable:11200kB slab_unreclaimable:45456kB kernel_stack:12508kB pagetables:16536kB bounce:0kB free_pcp:508kB local_pcp:0kB free_cma:0kB
Nov 20 04:02:36 ip-172-20-54-91 kernel: [91375.230119] lowmem_reserve[]: 0 0 12270 12270 12270
Nov 20 04:02:36 ip-172-20-54-91 kernel: [91375.235036] Node 0 Normal free:51536kB min:51740kB low:64672kB high:77604kB active_anon:11851568kB inactive_anon:4560kB active_file:76636kB inactive_file:68596kB unevictable:0kB writepending:32kB present:12845056kB managed:12569296kB mlocked:0kB slab_reclaimable:55784kB slab_unreclaimable:183092kB kernel_stack:47380kB pagetables:66296kB bounce:0kB free_pcp:1056kB local_pcp:0kB free_cma:0kB
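
The free-versus-min comparison in the log above can be automated. The following is a minimal sketch, assuming the kernel log uses the per-zone "Node N ZONE free:XkB min:YkB" format shown in this post. zones_below_min is a made-up helper name, and the sample lines are shortened copies of the DMA32 and Normal entries.

```shell
#!/bin/sh
# Hypothetical helper: read kernel log lines on stdin and report every
# memory zone whose free memory is below its min watermark.
zones_below_min() {
    awk '
    {
        zone = ""; free_kb = -1; min_kb = -1
        for (i = 1; i <= NF; i++) {
            if ($i == "Node" && zone == "") {
                zone = $(i + 2)              # "Node 0 Normal" -> "Normal"
            } else if ($i ~ /^free:[0-9]+kB$/) {
                v = $i; gsub(/[^0-9]/, "", v); free_kb = v + 0
            } else if ($i ~ /^min:[0-9]+kB$/) {
                v = $i; gsub(/[^0-9]/, "", v); min_kb = v + 0
            }
        }
        # Lines without both a free: and a min: field are skipped.
        if (zone != "" && free_kb >= 0 && min_kb >= 0 && free_kb < min_kb)
            printf "%s: free %d kB < min %d kB\n", zone, free_kb, min_kb
    }'
}

# Sample input: shortened lines from the log in this post.
zones_below_min <<'EOF'
kernel: [91375.201354] Node 0 DMA32 free:64640kB min:15776kB low:19720kB high:23664kB
kernel: [91375.235036] Node 0 Normal free:51536kB min:51740kB low:64672kB high:77604kB
EOF
```

On a live node the same function could be fed from the kernel log, for example dmesg | zones_below_min.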
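  • /proc/meminfo tells the same story: compare MemFree and MemAvailable against MemTotal and SwapTotal to confirm that memory is running out. /proc/buddyinfo shows how the remaining free memory is distributed across contiguous block sizes; if most free blocks are small (low order), memory is badly fragmented.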
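
To make the buddyinfo reading concrete, here is a small sketch. It assumes the usual /proc/buddyinfo layout, where each line is "Node N, zone NAME" followed by the number of free blocks of order 0, 1, 2 and so on, and a block of order n spans 2^n pages. buddy_summary and the "order <= 3" cutoff for "small" blocks are arbitrary choices made for this illustration.

```shell
#!/bin/sh
# Hypothetical helper: summarize free memory per zone from buddyinfo.
# Optional argument: a file in /proc/buddyinfo format (defaults to the real one).
buddy_summary() {
    awk '
    $1 == "Node" && $3 == "zone" {
        total = 0; small = 0
        for (i = 5; i <= NF; i++) {
            order = i - 5
            pages = $i * 2 ^ order          # blocks of this order, in pages
            total += pages
            if (order <= 3) small += pages  # cutoff chosen for illustration
        }
        pct = (total > 0) ? 100 * small / total : 0
        printf "%s %s zone %s: %d free pages, %.0f%% in blocks of order <= 3\n",
               $1, $2, $4, total, pct
    }' "${1:-/proc/buddyinfo}"
}

# Only touch /proc when it is available (Linux).
if [ -r /proc/buddyinfo ]; then
    buddy_summary
fi
if [ -r /proc/meminfo ]; then
    grep -E '^(MemTotal|MemFree|MemAvailable|SwapTotal):' /proc/meminfo
fi
```

A high percentage in low-order blocks means that free memory exists but is scattered, so larger contiguous allocations are more likely to stall or fail.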

Appendix: analyzing CPU usage with perf

  • In the code below, the function AA loops forever and is expected to consume a lot of CPU.
#include <stdio.h>

/* Spins forever to burn CPU in user space. */
void AA(void)
{
    unsigned int i = 0;
    while (1) {
        i++;
    }
}

void BB(void)
{
    printf("BB\n");
}

int main(void)
{
    BB();
    AA();
}
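  • First check CPU usage with top. User-space CPU usage is at 50% while kernel-space usage is low, so the time is being spent mainly in user mode, with few system calls involved.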
%Cpu(s): 50.0 us,  8.3 sy,  0.0 ni, 41.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
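
For scripted checks, the user/system split can be pulled out of that summary line. This is only a sketch, assuming the "%Cpu(s):" line format shown above (it differs between top versions and locales); cpu_split is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: extract the us (user) and sy (system) percentages
# from a top "%Cpu(s):" line on stdin and say which one dominates.
cpu_split() {
    awk '/^%Cpu/ {
        us = 0; sy = 0
        for (i = 2; i < NF; i++) {
            if ($(i + 1) ~ /^us/) us = $i   # the value precedes its label
            if ($(i + 1) ~ /^sy/) sy = $i
        }
        where = (us + 0 > sy + 0) ? "user space" : "kernel space"
        printf "user %.1f%%, system %.1f%%: mostly %s\n", us, sy, where
    }'
}

echo '%Cpu(s): 50.0 us,  8.3 sy,  0.0 ni, 41.7 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st' | cpu_split
```

On a live system the input could come from top in batch mode, for example top -b -n 1 | cpu_split.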
  • Run perf top. Part of the output is shown below: the Shared Object column shows a program named test, and the Symbol column shows AA, which is the function AA.
Samples: 699K of event 'cpu-clock', Event count (approx.): 8217702565
Overhead  Shared Object                  Symbol
  99.68%  test                           [.] AA
   0.12%  [kernel]                       [k] _raw_spin_unlock_irqrestore
   0.06%  [kernel]                       [k] __do_softirq
   0.02%  [kernel]                       [k] e1000_xmit_frame
   0.01%  libc-2.17.so                   [.] _int_malloc
   0.00%  [kernel]                       [k] clear_page
   0.00%  libvmtools.so.0.0.0            [.] Backdoor_InOut
   0.00%  [kernel]                       [k] kstat_irqs
  • Use perf record to sample CPU activity system-wide for 10 seconds, then use perf report to see the call stacks of the processes using the most CPU.
perf record -a -e cycles -o cycle.perf -g sleep 10
# perf report -i cycle.perf|more
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 22K of event 'cpu-clock'
# Event count (approx.): 5736750000
#
# Children      Self  Command          Shared Object        Symbol
# ........  ........  ...............  ...................  ..........................................................................
#
    88.97%     0.00%  test             libc-2.17.so         [.] __libc_start_main
            |
            ---__libc_start_main
               main
               AA

    88.97%     0.00%  test             test                 [.] main
            |
            ---main
               AA

    88.97%    88.88%  test             test                 [.] AA
            |
            --88.88%--__libc_start_main
                      main
                      AA

 

TIPS:

  • perf relies on the symbol and debug information in the ELF file. On a stripped binary it cannot resolve symbols and prints raw hexadecimal addresses instead.
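  • perf's default frame-pointer-based call-graph unwinding does not work for programs compiled with -fomit-frame-pointer. Build with -fno-omit-frame-pointer, or record with perf record --call-graph dwarf, to get usable stacks.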

References:

https://utcc.utoronto.ca/~cks/space/blog/linux/DecodingPageAllocFailures

https://www.cnblogs.com/004x/p/6651600.htm

http://tldp.org/LDP/Linux-Filesystem-Hierarchy/html/proc.html

https://utcc.utoronto.ca/~cks/space/blog/linux/KernelMemoryZones

https://blog.csdn.net/lickylin/article/details/50726847

http://www.10tiao.com/html/497/201606/2456160252/1.html

https://www.kernel.org/doc/Documentation/filesystems/proc.txt

Reposted from: https://www.cnblogs.com/charlieroro/p/10064022.html
