Understanding OOM killer logs

sergeyz asked:

I run some processes inside a Docker container, and I apply a memory limit to that container. Sometimes processes inside the container get killed by the OOM killer. I see this in the syslog file:

beam.smp invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=0
beam.smp cpuset=/ mems_allowed=0
CPU: 0 PID: 20908 Comm: beam.smp Not tainted 3.13.0-36-generic #63~precise1-Ubuntu
Hardware name: Xen HVM domU, BIOS 4.2.amazon 05/23/2014
 ffff880192ca6c00 ffff880117ebfbe8 ffffffff817557fe 0000000000000007
 ffff8800ea1e9800 ffff880117ebfc38 ffffffff8174b5b9 ffff880100000000
 000000d08137dd08 ffff880117ebfc38 ffff88010c05e000 0000000000000000
Call Trace:
 [<ffffffff817557fe>] dump_stack+0x46/0x58
 [<ffffffff8174b5b9>] dump_header+0x7e/0xbd
 [<ffffffff8174b64f>] oom_kill_process.part.5+0x57/0x2d4
 [<ffffffff81075295>] ? has_ns_capability_noaudit+0x15/0x20
 [<ffffffff8115b709>] ? oom_badness.part.4+0xa9/0x140
 [<ffffffff8115ba27>] oom_kill_process+0x47/0x50
 [<ffffffff811bee4c>] mem_cgroup_out_of_memory+0x28c/0x2b0
 [<ffffffff811c122b>] mem_cgroup_oom_synchronize+0x23b/0x270
 [<ffffffff811c0ac0>] ? memcg_charge_kmem+0xf0/0xf0
 [<ffffffff8115be08>] pagefault_out_of_memory+0x18/0x90
 [<ffffffff81747e91>] mm_fault_error+0xb9/0xd3
 [<ffffffff81766267>] ? __do_page_fault+0x317/0x570
 [<ffffffff81766495>] __do_page_fault+0x545/0x570
 [<ffffffff8101361d>] ? __switch_to+0x16d/0x4d0
 [<ffffffff810a5d3d>] ? set_next_entity+0xad/0xd0
 [<ffffffff8175df1e>] ? __schedule+0x38e/0x700
 [<ffffffff817664da>] do_page_fault+0x1a/0x70
 [<ffffffff81762648>] page_fault+0x28/0x30
Task in /docker/a4d47fb7bbc8a2bbc172bd26085c4509364b1b7eec61439669e08e281b181a0b killed as a result of limit of /docker/a4d47fb7bbc8a2bbc172bd26085c4509364b1b7eec61439669e08e281b181a0b
memory: usage 229600kB, limit 262144kB, failcnt 5148
memory+swap: usage 524288kB, limit 524288kB, failcnt 19118
kmem: usage 0kB, limit 18014398509481983kB, failcnt 0
Memory cgroup stats for /docker/a4d47fb7bbc8a2bbc172bd26085c4509364b1b7eec61439669e08e281b181a0b: cache:0KB rss:229600KB rss_huge:8192KB mapped_file:0KB writeback:3336KB swap:294688KB inactive_anon:114980KB active_anon:114620KB inactive_file:0KB active_file:0KB unevictable:0KB
[ pid ]   uid  tgid total_vm      rss nr_ptes swapents oom_score_adj name
[ 9537]     0  9537     8740      712      21     1041             0 my_init
[13097]     0 13097       48        3       3       16             0 runsvdir
[13098]     0 13098       42        4       3       19             0 runsv
[13100]     0 13100       42        4       3       38             0 runsv
[13101]     0 13101       42        4       3       17             0 runsv
[13102]     0 13102       42        4       3        4             0 runsv
[13103]     0 13103       42        4       3       39             0 runsv
[13104]     0 13104     4779      243      15       60             0 cron
[13105]     0 13105     8591      601      22     1129             0 ruby
[13107]     0 13107    20478      756      43      560             0 syslog-ng
[13108]     0 13108    11991      642      28     1422             0 ruby
[20826]     0 20826     4467      249      14       63             0 run
[20827]     0 20827     1101      144       8       29             0 huobi
[20878]     0 20878     3708      172      13       48             0 run_erl
[20879]     0 20879   249481    57945     321    72955             0 beam.smp
[20969]     0 20969     1846       83       9       27             0 inet_gethost
[20970]     0 20970     3431      173      12       33             0 inet_gethost
[20977]     0 20977     1101      127       8       25             0 sh
[20978]     0 20978     1074      125       8       23             0 memsup
[20979]     0 20979     1074       68       7       23             0 cpu_sup
[ 5446]     0  5446     8462      217      22       81             0 cron
[ 5451]     0  5451     1101      127       8       26             0 sh
[ 5453]     0  5453     1078       68       8       22             0 sleep
[10898]     0 10898     8462      217      22       81             0 cron
[10899]     0 10899     8462      216      22       80             0 cron
[10900]     0 10900     1101      127       7       26             0 sh
[10901]     0 10901     1101      127       8       25             0 sh
[10902]     0 10902     1078       68       7       22             0 sleep
[10903]     0 10903     1078       68       8       22             0 sleep
Memory cgroup out of memory: Kill process 20911 (beam.smp) score 1001 or sacrifice child
Killed process 20977 (sh) total-vm:4404kB, anon-rss:0kB, file-rss:508kB

I know that the beam.smp process consumes memory very aggressively, so the very first line of the log, beam.smp invoked oom-killer, makes sense.

But I’m confused by the last two lines of the log. They say Kill process 20911 (beam.smp), but a process with PID 20911 does not exist inside this cgroup (the list of processes was dumped to the log as well). And the last line says Killed process 20977 (sh) (and this PID is present in the cgroup).
It was about to kill beam.smp, but it actually killed sh. What does this mean?

My answer:


The OOM killer decided to kill another process.

The message did state:

Kill process 20911 .... or sacrifice child

It decided to sacrifice the child with PID 20977, an sh process that beam.smp had spawned. Before killing the selected victim, the OOM killer prefers to kill one of its children (one with a separate address space) instead, in the hope of freeing memory without destroying the parent's long-running work. As for why PID 20911 does not appear in the task dump: that table lists thread group leaders, so 20911 was most likely one of beam.smp's threads (the main beam.smp process, PID 20879, is in the list).
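You can see how the kernel currently ranks OOM candidates by reading each process's oom_score (the computed badness) and oom_score_adj (the user-settable bias) from /proc. A minimal sketch, readable without root on Linux:

```shell
# Print "score  adj  pid" for every process, highest badness first.
# A higher oom_score means the process is a more likely OOM victim.
for pid in /proc/[0-9]*; do
    score=$(cat "$pid/oom_score" 2>/dev/null) || continue
    adj=$(cat "$pid/oom_score_adj" 2>/dev/null) || continue
    printf '%s\t%s\t%s\n' "$score" "$adj" "${pid#/proc/}"
done | sort -rn | head
```

Writing -1000 to a process's /proc/PID/oom_score_adj (requires appropriate privileges) exempts it from OOM killing entirely, which explains scores like 1001 in the log: the memcg badness can exceed 1000 once the adjustment is folded in.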

If you want Linux to always kill the task which caused the out of memory condition, set the sysctl vm.oom_kill_allocating_task to 1.


From the kernel documentation:

This enables or disables killing the OOM-triggering task in
out-of-memory situations.

If this is set to zero, the OOM killer will scan through the entire
tasklist and select a task based on heuristics to kill. This normally
selects a rogue memory-hogging task that frees up a large amount of
memory when killed.

If this is set to non-zero, the OOM killer simply kills the task that
triggered the out-of-memory condition. This avoids the expensive
tasklist scan.

If panic_on_oom is selected, it takes precedence over whatever value
is used in oom_kill_allocating_task.

The default value is 0.
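Checking and changing the setting is straightforward; a sketch (reading the value needs no privileges, but writing it requires root):

```shell
# Current value: 0 = heuristic scan and victim selection (the default),
# 1 = always kill the task that triggered the OOM condition.
cat /proc/sys/vm/oom_kill_allocating_task

# To change it at runtime (as root):
#   sysctl -w vm.oom_kill_allocating_task=1
# To persist across reboots, add this line to /etc/sysctl.conf:
#   vm.oom_kill_allocating_task = 1
```

Note that with this set to 1, the killed task is whichever one happened to fault when the limit was hit, which is not necessarily the biggest memory consumer.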


View the full question and answer on Server Fault.

This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.