Nohang is one of many solutions to Linux's inadequate oom-killer. Its job is to cleanly terminate a runaway process before the system grinds to a halt, and the oom-killer eventually forcibly kills it anyway.
Homepage here: https://github.com/hakavlad/nohang
Their README also lists alternatives: https://github.com/hakavlad/nohang#solution
-
yum install nohang python3
-
The defaults are not bad (certainly better than nothing) but you may want to tweak them according to the needs of your system. See the Configuration section below.
-
Start the service, now and forever
systemctl enable --now nohang
Edit the config file in /etc/nohang/nohang.conf
This is optional, but I recommend disabling the print_statistics
option, because its output can be confusing.
nohang
will initially send a TERM
signal, but after 10 seconds, if memory is still low, it will send a KILL
signal. This can be problematic for sensitive services that cannot easily recover from being killed (e.g. blockchain nodes).
To reduce the need for the KILL
signal, we can give the process more time to terminate itself, or we can start the termination earlier:
-
Increase
max_soft_exit_time
from10
seconds to240
seconds (4 minutes) to give the process time to cleanly terminate itself. -
If it does not have enough memory to cleanly terminate (and is still getting killed), then increase
soft_threshold_min_mem
from5
to10
, to initiate the termination early, while there is still memory available.
If you want to automate the suggested configuration:
sed -i /etc/nohang/nohang.conf '
s/^print_statistics = .*/print_statistics = False/
s/^max_soft_exit_time = .*/max_soft_exit_time = 240/
s/^soft_threshold_min_mem = .*/soft_threshold_min_mem = 10 %/
'
To see the most recent times nohang took action:
journalctl -u nohang -n 2000 | grep -B20 'Sending .* victim'
This will ignore lines produced by print_statistics