Controlling fan speed the right way
Problem statement
I’ve been using Noctua fans for a while now, and so far I was reasonably happy with using the “low noise adapter”1 and setting the temperature/speed curve in the BIOS.
But on Sunday I discovered that one of the fans actually stalls, and this is the second time the “low noise adapter” caused a fan stall. The previous instance was in our NAS, and the temperature delta was +10°C. Ouch.
Unfortunately, the BIOS menu for fan speed is far from fine-grained (it allows control only in 12.5% increments for some fans).
So I set out to figure out what can be done on the Linux side.
Exploration with a couple of dead ends
I’m somewhat familiar with lm-sensors for temperature monitoring.
They’re far from perfect:
$ sensors | grep -e fan -e °
fan1: 0 RPM (min = 0 RPM)
fan2: 0 RPM (min = 0 RPM)
fan3: 347 RPM (min = 0 RPM)
fan4: 0 RPM (min = 0 RPM)
fan5: 616 RPM (min = 0 RPM)
SYSTIN: +34.0°C (high = +0.0°C, hyst = +0.0°C) ALARM sensor = CPU diode
CPUTIN: +36.5°C (high = +115.0°C, hyst = +120.0°C) sensor = thermistor
AUXTIN0: +41.5°C sensor = thermistor
AUXTIN1: +127.0°C sensor = thermistor
AUXTIN2: -128.0°C sensor = thermistor
AUXTIN3: -60.0°C sensor = thermal diode
PECI Agent 0: +34.5°C
PCH_CHIP_CPU_MAX_TEMP: +0.0°C
PCH_CHIP_TEMP: +0.0°C
PCH_CPU_TEMP: +0.0°C
temp1: +27.8°C (crit = +105.0°C)
temp2: +29.8°C (crit = +105.0°C)
Package id 0: +34.0°C (high = +80.0°C, crit = +100.0°C)
Core 0: +33.0°C (high = +80.0°C, crit = +100.0°C)
Core 1: +32.0°C (high = +80.0°C, crit = +100.0°C)
Core 2: +34.0°C (high = +80.0°C, crit = +100.0°C)
Core 3: +32.0°C (high = +80.0°C, crit = +100.0°C)
but at least with a short tweak of /etc/sensors.d/custom.conf:
# Datasheet: http://www.nuvoton.com/resource-files/NCT6796D_Datasheet_V0_6.pdf
# P59 describes voltages and indexes
# It's similar to: https://gist.github.com/aaronsb/347d62b63456ae131916c3affd212c05
chip "nct6792-*"
ignore fan1
label fan2 "CPU fan"
label fan3 "Front case fan"
ignore fan4
label fan5 "Rear case fan"
ignore temp3 # auxtin0
ignore temp4 # auxtin1
ignore temp5 # auxtin2
ignore temp6 # auxtin3
ignore temp8 # PCH_CHIP_CPU_MAX_TEMP
ignore temp9 # PCH_CHIP_TEMP
ignore temp10 # PCH_CPU_TEMP
ignore intrusion0
ignore intrusion1
label in0 "CPUVCORE" # direct
#compute in0 @*2, @/2
label in1 "VIN1" # apparently -12V
# the datasheet says it should be this formula, but it gives bogus values (double)
#compute in1 (@-2.048)/(10/242)+2.048, (@-2.048)*(10/242)+2.048
label in2 "AVSB" # 1/2
label in3 "3VCC" # 1/2
label in4 "VIN0" # v0* (10k / (56k + 10k))
label in5 "VIN8"
label in6 "VIN4"
label in7 "3VSB" # 1/2
label in8 "VBAT" # 1/2
label in9 "VTT"
label in10 "VIN5"
label in11 "VIN6"
label in12 "VIN2"
label in13 "VIN3"
label in14 "VIN7"
they make a bit more sense:
$ sensors | grep -e fan -e °
CPU fan: 0 RPM (min = 0 RPM)
Front case fan: 340 RPM (min = 0 RPM)
Rear case fan: 611 RPM (min = 0 RPM)
SYSTIN: +34.0°C (high = +0.0°C, hyst = +0.0°C) ALARM sensor = CPU diode
CPUTIN: +36.5°C (high = +115.0°C, hyst = +120.0°C) sensor = thermistor
PECI Agent 0: +34.5°C
temp1: +27.8°C (crit = +105.0°C)
temp2: +29.8°C (crit = +105.0°C)
Package id 0: +35.0°C (high = +80.0°C, crit = +100.0°C)
Core 0: +32.0°C (high = +80.0°C, crit = +100.0°C)
Core 1: +31.0°C (high = +80.0°C, crit = +100.0°C)
Core 2: +35.0°C (high = +80.0°C, crit = +100.0°C)
Core 3: +32.0°C (high = +80.0°C, crit = +100.0°C)
Unfortunately, they don’t get me close to controlling the fans.
Searching a bit (apt search fan.*speed), I discovered the fancontrol package.
The idea of fancontrol makes sense: we run a daemon that watches the temperature and adjusts the PWM setting (speed) of the fans.
I even managed to eke out a ~working config file:
$ cat /etc/fancontrol
# Configuration file generated by pwmconfig; changes will not be lost -- because
# I won't be re-running it anytime soon
INTERVAL=10
DEVPATH=hwmon0=devices/virtual/thermal/thermal_zone0 hwmon2=devices/platform/nct6775.2608
DEVNAME=hwmon0=acpitz hwmon2=nct6792
FCTEMPS=hwmon2/pwm5=hwmon2/temp1_input hwmon2/pwm2=hwmon2/temp2_input hwmon2/pwm3=hwmon2/temp1_input
FCFANS=hwmon2/pwm5=hwmon2/fan5_input hwmon2/pwm2=hwmon2/fan2_input hwmon2/pwm3=hwmon2/fan3_input
MINTEMP=hwmon2/pwm5=35 hwmon2/pwm2=35 hwmon2/pwm3=35
MAXTEMP=hwmon2/pwm5=50 hwmon2/pwm2=60 hwmon2/pwm3=50
MINSTART=hwmon2/pwm5=84 hwmon2/pwm2=22 hwmon2/pwm3=106
MINSTOP=hwmon2/pwm5=84 hwmon2/pwm2=22 hwmon2/pwm3=106
But there are three things bugging me about fancontrol:
- It’s a userspace daemon
- The config file format is downright dreadful2
- There are only two “points” (min, max) on the temperature/speed curve
Perhaps there is a better way?
The right way
After poking around the /sys directory, I realized that hwmon2 from fancontrol’s config isn’t some random string.
$ sensors | grep -e '^Adapt' -e '^[nac]'
nct6792-isa-0a30
Adapter: ISA adapter
acpitz-acpi-0
Adapter: ACPI interface
coretemp-isa-0000
Adapter: ISA adapter
$ cat /sys/class/hwmon/hwmon*/name
acpitz
coretemp
nct6792
It is actually a subdirectory in the /sys filesystem.
And the hwmon sysfs documentation, and more specifically the nct6775 driver documentation3, provides more than enough rope to hang myself.
Notable are the pwm* files:
$ cd /sys/class/hwmon/hwmon2/; ls pwm1*
pwm1 pwm1_auto_point3_temp pwm1_enable pwm1_stop_time
pwm1_auto_point1_pwm pwm1_auto_point4_pwm pwm1_floor pwm1_target_temp
pwm1_auto_point1_temp pwm1_auto_point4_temp pwm1_mode pwm1_temp_sel
pwm1_auto_point2_pwm pwm1_auto_point5_pwm pwm1_start pwm1_temp_tolerance
pwm1_auto_point2_temp pwm1_auto_point5_temp pwm1_step_down_time
pwm1_auto_point3_pwm pwm1_crit_temp_tolerance pwm1_step_up_time
These will be the stars of today’s show.
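For orientation, the pwmN_enable file selects the control mode. The sketch below maps its values to the mode names as I read them in the kernel’s nct6775 driver documentation (mode 4 is listed there as NCT6775F only). The helper names pwm_mode_name and show_pwm_modes are made up for illustration, and the default path is simply the one used in this post.

```shell
#!/bin/bash
# Translate a pwmN_enable value into the mode name from the nct6775 docs.
pwm_mode_name() {
    case "$1" in
        0) echo "fan control disabled (full speed)" ;;
        1) echo "manual" ;;
        2) echo "Thermal Cruise" ;;
        3) echo "Fan Speed Cruise" ;;
        4) echo "Smart Fan III" ;;
        5) echo "Smart Fan IV" ;;
        *) echo "unknown ($1)" ;;
    esac
}

# Print the current mode of every PWM channel under a hwmon directory.
# The default path is an assumption; the index can differ on your machine.
show_pwm_modes() {
    local dir=${1:-/sys/class/hwmon/hwmon2} f
    for f in "$dir"/pwm[0-9]_enable; do
        [ -r "$f" ] || continue   # also skips the unexpanded glob
        echo "$(basename "$f" _enable): $(pwm_mode_name "$(cat "$f")")"
    done
}
```

Calling show_pwm_modes without arguments reads the live values and needs no root, since it only reads.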
The mode I intend to use is mode 5 (“Smart Fan IV” in the driver documentation); its theory of operation is described in the NCT6796D datasheet, from p. 75 onward.
The list above says I have 5 points to configure. By default they’re:
$ cat pwm5_auto_point*_temp
40000
55000
70000
85000
101000
$ cat pwm5_auto_point*_pwm
127
127
178
255
255
…set to 50% @ 40°C, 50% @ 55°C, 70% @ 70°C, 100% @ 85°C, and 100% @ 101°C.
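The raw files hold temperatures in millidegrees Celsius and duty cycles on a 0..255 scale, so a small helper makes the curve easier to read. This is a minimal sketch: show_curve is a hypothetical name, and the loop assumes five auto points per channel, which is what this chip exposes.

```shell
#!/bin/bash
# Print the auto points of one PWM channel in human-readable units.
# Usage: show_curve <hwmon dir> <pwm index>
show_curve() {
    local dir=$1 fan=$2 i t p
    for i in 1 2 3 4 5; do
        t=$(cat "$dir/pwm${fan}_auto_point${i}_temp") || return 1
        p=$(cat "$dir/pwm${fan}_auto_point${i}_pwm") || return 1
        # t is in millidegrees Celsius; p is a duty cycle in 0..255,
        # converted to a percentage rounded to the nearest integer.
        echo "point $i: $(( t / 1000 ))°C -> $(( (p * 100 + 127) / 255 ))% ($p)"
    done
}
```

For example, show_curve /sys/class/hwmon/hwmon2 5 prints one line per point; with the defaults listed above, the third point comes out as 70°C at 70% (178).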
But since I’m not sure about the behavior below the first point, my first attempt4 is:
- 15000 = min_rpm
- 40000 = min_rpm
- 55000 = 50% (127)
- 85000 = 100% (255)
- 101000 = 100% (255)
I’m saying “min_rpm” on purpose because, as is visible in the fancontrol config, different fans have a different minimum duty cycle at which they still spin.
And my goal is to keep them spinning at that minimum (to lower the risk of stalling as they age) and only gradually ramp up with temperature.
What I find especially annoying is when the system reaches the first endpoint and then cycles between turning the fan on and off. A steady quiet hum beats periodic on/off cycling.
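To get a feel for what such a curve would do at a given temperature, here is a rough estimator. It assumes the chip interpolates linearly between neighbouring points and simply clamps outside the first and last one; that is an assumption of this sketch, not something verified on the hardware (the behavior below the first point in particular is unknown to me). The name curve_pwm and the "temp:pwm" notation are invented for the example.

```shell
#!/bin/bash
# Estimate the PWM value for a temperature (millidegrees Celsius) on a
# curve given as "temp:pwm" pairs sorted by rising temperature.
# ASSUMPTION: linear interpolation between points, clamping outside them.
curve_pwm() {
    local temp=$1 pt t p prev_t="" prev_p=""
    shift
    for pt in "$@"; do
        t=${pt%%:*} p=${pt##*:}
        if [ "$temp" -le "$t" ]; then
            if [ -z "$prev_t" ]; then
                echo "$p"    # at or below the first point: clamp
            else
                # integer math, so the result is truncated
                echo $(( prev_p + (p - prev_p) * (temp - prev_t) / (t - prev_t) ))
            fi
            return 0
        fi
        prev_t=$t prev_p=$p
    done
    echo "$prev_p"           # above the last point: clamp
}

# The planned curve for a fan whose minimum PWM is 84 (the rear case fan).
rear="15000:84 40000:84 55000:127 85000:255 101000:255"
for temp in 30000 47500 70000; do
    # $rear is left unquoted on purpose so it splits into separate points
    echo "$temp -> $(curve_pwm "$temp" $rear)"
done
```

Under those assumptions this prints 84 for 30°C, 105 for 47.5°C, and 191 for 70°C, which shows the flat stretch up to 40°C and the gradual ramp after it.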
And so /usr/local/sbin/set-fanspeed.sh:
#!/bin/bash
# TODO(wejn): Make the hwmon path variable
HWMON=/sys/class/hwmon/hwmon2
# Bail out unless this hwmon really belongs to the nct67xx chip
if ! grep -q nct67 "${HWMON}/name"; then
    echo "Error: can't find proper hwmon, exit." >&2
    exit 1
fi
# set_fan <pwm index> <minimum PWM at which the fan still spins>
set_fan() {
    # Mode
    echo 5 > "${HWMON}/pwm${1}_enable"
    # Temperatures (millidegrees Celsius)
    echo 15000 > "${HWMON}/pwm${1}_auto_point1_temp"
    echo 40000 > "${HWMON}/pwm${1}_auto_point2_temp"
    echo 55000 > "${HWMON}/pwm${1}_auto_point3_temp"
    echo 85000 > "${HWMON}/pwm${1}_auto_point4_temp"
    echo 101000 > "${HWMON}/pwm${1}_auto_point5_temp"
    # PWMs (0-255)
    echo "$2" > "${HWMON}/pwm${1}_auto_point1_pwm"
    echo "$2" > "${HWMON}/pwm${1}_auto_point2_pwm"
    echo 127 > "${HWMON}/pwm${1}_auto_point3_pwm"
    echo 255 > "${HWMON}/pwm${1}_auto_point4_pwm"
    echo 255 > "${HWMON}/pwm${1}_auto_point5_pwm"
}
set_fan 2 22
set_fan 3 106
set_fan 5 84
exit 0
Yes, that easy5.
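As for the TODO in the script: the hwmon index is not guaranteed to stay the same across boots or kernel changes, so looking the directory up by its name file is one way to avoid hardcoding it. A minimal sketch, where find_hwmon is a hypothetical helper and the optional second argument exists mainly so the function can be tried against a fake directory tree:

```shell
#!/bin/bash
# Print the first hwmon directory whose "name" file starts with the
# given prefix; return non-zero if there is none.
# Usage: find_hwmon <name prefix> [<directory to search>]
find_hwmon() {
    local prefix=$1 root=${2:-/sys/class/hwmon} d name
    for d in "$root"/hwmon*; do
        [ -r "$d/name" ] || continue
        read -r name < "$d/name"
        case "$name" in
            "$prefix"*) echo "$d"; return 0 ;;
        esac
    done
    return 1
}

# Possible use in set-fanspeed.sh, instead of the hardcoded path:
#   HWMON=$(find_hwmon nct67) || { echo "Error: no nct67xx hwmon" >&2; exit 1; }
```

The prefix match mirrors the grep for nct67 in the script above, so the rest of the script would not need to change.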
Closing words
And just like that… let there be quiet.
I’ll update the post if something changes. For now, it works well.
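Since the whole exercise started with a stalled fan, a periodic check of the tachometer readings might be a cheap safety net. The sketch below reads the standard hwmon fanN_input files (RPM) and complains about any fan reporting 0 RPM; check_stall is a made-up name, and treating exactly 0 RPM as “stalled” is my assumption.

```shell
#!/bin/bash
# Report fans that read 0 RPM.
# Usage: check_stall <hwmon dir> <fan index>...
# Returns non-zero if any listed fan looks stalled or cannot be read.
check_stall() {
    local dir=$1 fan rpm rc=0
    shift
    for fan in "$@"; do
        if ! rpm=$(cat "$dir/fan${fan}_input" 2>/dev/null); then
            echo "fan${fan}: cannot read" >&2
            rc=1
        elif [ "$rpm" -eq 0 ]; then
            echo "fan${fan}: stalled (0 RPM)" >&2
            rc=1
        else
            echo "fan${fan}: ${rpm} RPM"
        fi
    done
    return $rc
}
```

For the fans configured above, that would be check_stall /sys/class/hwmon/hwmon2 2 3 5, for example from cron, with the exit status deciding whether to send a notification.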
1. That’s NA-RC9 in Noctua speak.
2. For our sanity: when you’re writing a multi-value config, use sections (*.ini style); we’ve had the technology since the ’90s.
3. Confused yet? Turns out the nct6775 module supports the Nuvoton 6792D Super I/O chip installed in my desktop’s motherboard.
4. If it’s not working well, I’ll readjust later. Likely the value for 55°C, to flatten/steepen the curve. I mean… we’ve got 2 years’ worth of experience in the former, right?
5. I mean, there’s also test -x /usr/local/sbin/set-fanspeed.sh && /usr/local/sbin/set-fanspeed.sh in /etc/rc.local but that’s probably not even really worth mentioning.