Controlling fan speed the right way


Problem statement

I’ve been using Noctua fans for a while now, and so far I was reasonably happy with using the “low noise adapter” [1] and setting the temperature/speed curve in the BIOS.

But on Sunday I discovered that one of the fans had actually stalled, and this is the second time the “low noise adapter” has caused a fan stall. The previous instance was in our NAS, and the temperature delta was +10°C. Ouch.

Unfortunately, the BIOS menu for fan speed is far from fine-grained (for some fans it only allows control in 12.5% increments).

So I set out to figure out what can be done on the Linux side.

Exploration with a couple of dead ends

I’m somewhat familiar with lm-sensors for temperature monitoring.

The out-of-the-box readings are far from perfect:

$ sensors | grep -e fan -e °
fan1:                     0 RPM  (min =    0 RPM)
fan2:                     0 RPM  (min =    0 RPM)
fan3:                   347 RPM  (min =    0 RPM)
fan4:                     0 RPM  (min =    0 RPM)
fan5:                   616 RPM  (min =    0 RPM)
SYSTIN:                 +34.0°C  (high =  +0.0°C, hyst =  +0.0°C)  ALARM  sensor = CPU diode
CPUTIN:                 +36.5°C  (high = +115.0°C, hyst = +120.0°C)  sensor = thermistor
AUXTIN0:                +41.5°C    sensor = thermistor
AUXTIN1:               +127.0°C    sensor = thermistor
AUXTIN2:               -128.0°C    sensor = thermistor
AUXTIN3:                -60.0°C    sensor = thermal diode
PECI Agent 0:           +34.5°C  
PCH_CHIP_CPU_MAX_TEMP:   +0.0°C  
PCH_CHIP_TEMP:           +0.0°C  
PCH_CPU_TEMP:            +0.0°C  
temp1:        +27.8°C  (crit = +105.0°C)
temp2:        +29.8°C  (crit = +105.0°C)
Package id 0:  +34.0°C  (high = +80.0°C, crit = +100.0°C)
Core 0:        +33.0°C  (high = +80.0°C, crit = +100.0°C)
Core 1:        +32.0°C  (high = +80.0°C, crit = +100.0°C)
Core 2:        +34.0°C  (high = +80.0°C, crit = +100.0°C)
Core 3:        +32.0°C  (high = +80.0°C, crit = +100.0°C)

but at least with a short tweak of /etc/sensors.d/custom.conf:

# Datasheet: http://www.nuvoton.com/resource-files/NCT6796D_Datasheet_V0_6.pdf
# P59 describes voltages and indexes
# It's similar to: https://gist.github.com/aaronsb/347d62b63456ae131916c3affd212c05

chip "nct6792-*"

ignore fan1
label fan2 "CPU fan"
label fan3 "Front case fan"
ignore fan4
label fan5 "Rear case fan"
ignore temp3 # auxtin0
ignore temp4 # auxtin1
ignore temp5 # auxtin2
ignore temp6 # auxtin3
ignore temp8 # PCH_CHIP_CPU_MAX_TEMP
ignore temp9 # PCH_CHIP_TEMP
ignore temp10 # PCH_CPU_TEMP

ignore intrusion0
ignore intrusion1

label in0 "CPUVCORE" # direct
#compute in0 @*2, @/2
label in1 "VIN1" # apparently -12V
# the datasheet says it should be this formula, but it gives bogus values (double)
#compute in1 (@-2.048)/(10/242)+2.048, (@-2.048)*(10/242)+2.048
label in2 "AVSB" # 1/2
label in3 "3VCC" # 1/2
label in4 "VIN0" # v0* (10k / (56k + 10k))
label in5 "VIN8"
label in6 "VIN4"
label in7 "3VSB" # 1/2
label in8 "VBAT" # 1/2
label in9 "VTT"
label in10 "VIN5"
label in11 "VIN6"
label in12 "VIN2"
label in13 "VIN3"
label in14 "VIN7"

they make a bit more sense:

$ sensors | grep -e fan -e °
CPU fan:           0 RPM  (min =    0 RPM)
Front case fan:  340 RPM  (min =    0 RPM)
Rear case fan:   611 RPM  (min =    0 RPM)
SYSTIN:          +34.0°C  (high =  +0.0°C, hyst =  +0.0°C)  ALARM  sensor = CPU diode
CPUTIN:          +36.5°C  (high = +115.0°C, hyst = +120.0°C)  sensor = thermistor
PECI Agent 0:    +34.5°C  
temp1:        +27.8°C  (crit = +105.0°C)
temp2:        +29.8°C  (crit = +105.0°C)
Package id 0:  +35.0°C  (high = +80.0°C, crit = +100.0°C)
Core 0:        +32.0°C  (high = +80.0°C, crit = +100.0°C)
Core 1:        +31.0°C  (high = +80.0°C, crit = +100.0°C)
Core 2:        +35.0°C  (high = +80.0°C, crit = +100.0°C)
Core 3:        +32.0°C  (high = +80.0°C, crit = +100.0°C)

Unfortunately, they don’t get me close to controlling the fans.

Searching a bit (apt search fan.*speed), I discovered the fancontrol package.

The idea of fancontrol makes sense: we run a daemon that watches the temperature and adjusts the PWM setting (speed) of the fans.

I even managed to eke out a ~working config file:

$ cat /etc/fancontrol
# Configuration file generated by pwmconfig; changes will not be lost -- because
# I won't be re-running it anytime soon
INTERVAL=10
DEVPATH=hwmon0=devices/virtual/thermal/thermal_zone0 hwmon2=devices/platform/nct6775.2608
DEVNAME=hwmon0=acpitz hwmon2=nct6792
FCTEMPS=hwmon2/pwm5=hwmon2/temp1_input hwmon2/pwm2=hwmon2/temp2_input hwmon2/pwm3=hwmon2/temp1_input
FCFANS=hwmon2/pwm5=hwmon2/fan5_input hwmon2/pwm2=hwmon2/fan2_input hwmon2/pwm3=hwmon2/fan3_input
MINTEMP=hwmon2/pwm5=35 hwmon2/pwm2=35 hwmon2/pwm3=35
MAXTEMP=hwmon2/pwm5=50 hwmon2/pwm2=60 hwmon2/pwm3=50
MINSTART=hwmon2/pwm5=84 hwmon2/pwm2=22 hwmon2/pwm3=106
MINSTOP=hwmon2/pwm5=84 hwmon2/pwm2=22 hwmon2/pwm3=106

But there are three things bugging me about fancontrol:

  1. It’s a userspace daemon
  2. The config file format is downright dreadful [2]
  3. There are only two “points” (min, max) on the temperature/speed curve

Perhaps there is a better way?

The right way

After poking around the /sys directory, I realized that hwmon2 from fancontrol’s config isn’t some random string.

$ sensors | grep -e ^Adapt -e ^[nac]
nct6792-isa-0a30
Adapter: ISA adapter
acpitz-acpi-0
Adapter: ACPI interface
coretemp-isa-0000
Adapter: ISA adapter
$ cat /sys/class/hwmon/hwmon*/name
acpitz
coretemp
nct6792

It is actually a subdirectory in the /sys filesystem (under /sys/class/hwmon).
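Since the hwmonN number is assigned when the drivers load, it can differ between boots or kernels. A minimal sketch of looking the directory up by its name file instead; find_hwmon is a made-up helper name, and the optional second argument exists only so the function can be tried against a fake directory tree:

```shell
# find_hwmon PATTERN [BASE]
# Print the first hwmon directory whose "name" file matches PATTERN.
# BASE defaults to /sys/class/hwmon (overridable for testing).
find_hwmon() {
    local pattern="$1" base="${2:-/sys/class/hwmon}" dir
    for dir in "$base"/hwmon*; do
        # Skip entries without a readable name file
        # (this also covers the case where the glob matched nothing).
        [ -r "$dir/name" ] || continue
        if grep -q -- "$pattern" "$dir/name"; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1
}
```

On the machine above, HWMON=$(find_hwmon nct67) should then yield /sys/class/hwmon/hwmon2.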

And the hwmon sysfs documentation, together with the more specific nct6775 driver documentation [3], provides more than enough rope to hang myself.

Notable are the pwm* files:

$ cd /sys/class/hwmon/hwmon2/; ls pwm1*
pwm1                   pwm1_auto_point3_temp     pwm1_enable          pwm1_stop_time
pwm1_auto_point1_pwm   pwm1_auto_point4_pwm      pwm1_floor           pwm1_target_temp
pwm1_auto_point1_temp  pwm1_auto_point4_temp     pwm1_mode            pwm1_temp_sel
pwm1_auto_point2_pwm   pwm1_auto_point5_pwm      pwm1_start           pwm1_temp_tolerance
pwm1_auto_point2_temp  pwm1_auto_point5_temp     pwm1_step_down_time
pwm1_auto_point3_pwm   pwm1_crit_temp_tolerance  pwm1_step_up_time

These will be the stars of today’s show.

The theory of operation of the mode I intend to use is as follows (taken from the NCT6796D datasheet, from p. 75):

[Figure: Smart Fan IV]
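The mode is selected by writing a number to pwmN_enable. As far as I read the nct6775 driver documentation, the values map as sketched below; pwm_mode_name is a made-up helper, and the mapping is worth double-checking against the documentation for your kernel version:

```shell
# pwm_mode_name VALUE
# Translate a pwmN_enable value into a human-readable mode name
# (mapping as I understand it from the nct6775 driver documentation).
pwm_mode_name() {
    case "$1" in
        0) echo "Fan control disabled (full speed)" ;;
        1) echo "Manual" ;;
        2) echo "Thermal Cruise" ;;
        3) echo "Fan Speed Cruise" ;;
        4) echo "Smart Fan III" ;;
        5) echo "Smart Fan IV" ;;
        *) echo "Unknown ($1)"; return 1 ;;
    esac
}
```

For example, pwm_mode_name "$(cat /sys/class/hwmon/hwmon2/pwm5_enable)" prints the mode currently active for channel 5.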

The list above says I have 5 points to configure. By default they’re:

$ cat pwm5_auto_point*_temp
40000
55000
70000
85000
101000
$ cat pwm5_auto_point*_pwm
127
127
178
255
255

…set to 50% @ 40°C, 50% @ 55°C, 70% @ 70°C, 100% @ 85°C, and 100% @ 101°C.
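The raw files hold temperatures in millidegrees Celsius and duty cycles on a 0-255 scale, so reading them takes a bit of mental arithmetic. A small sketch that does the conversion; pwm_to_percent and show_curve are made-up helper names, and five points per channel is an assumption taken from the listing above:

```shell
# pwm_to_percent VALUE: convert a 0-255 duty cycle to a rounded percentage.
pwm_to_percent() {
    echo $(( ($1 * 100 + 127) / 255 ))
}

# show_curve DIR CHANNEL: print the five curve points of pwmCHANNEL
# found in hwmon directory DIR, one "PERCENT% @ TEMP°C" line per point.
show_curve() {
    local dir="$1" ch="$2" i temp pwm
    for i in 1 2 3 4 5; do
        temp=$(cat "$dir/pwm${ch}_auto_point${i}_temp") || return 1
        pwm=$(cat "$dir/pwm${ch}_auto_point${i}_pwm") || return 1
        # Temperatures are stored in millidegrees Celsius.
        printf '%d%% @ %d°C\n' "$(pwm_to_percent "$pwm")" "$(( temp / 1000 ))"
    done
}
```

Usage would be something like show_curve /sys/class/hwmon/hwmon2 5.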

But since I’m not sure about the behavior below the first point, my first attempt [4] is:

min_rpm @ 15°C, min_rpm @ 40°C, 50% @ 55°C, 100% @ 85°C, and 100% @ 101°C (these are the values used in the script below).

I’m saying “min_rpm” on purpose because, as is visible in the fancontrol config, different fans have different minimum duty cycles at which they still spin.

And my goal is to keep them spinning at the minimum (to lower the risk of stalling as they age) and only gradually ramp up with temperature.

Because what I find especially annoying is when the system reaches the first endpoint and then cycles between turning the fan on and off. A steady quiet hum beats periodic on/off cycling.

And so /usr/local/sbin/set-fanspeed.sh:

#!/bin/bash

# TODO(wejn): Make the hwmon path variable
HWMON=/sys/class/hwmon/hwmon2

if ! grep -q nct67 "${HWMON}/name"; then
    echo "Error: can't find proper hwmon, exit." >&2
    exit 1
fi

# set_fan CHANNEL MIN_PWM: program the five-point curve for pwmCHANNEL,
# holding MIN_PWM (0-255) up to 40°C.
set_fan(){
    # Mode: 5 selects Smart Fan IV
    echo 5 > "${HWMON}/pwm${1}_enable"
    # Temperatures, in millidegrees Celsius
    echo 15000 > "${HWMON}/pwm${1}_auto_point1_temp"
    echo 40000 > "${HWMON}/pwm${1}_auto_point2_temp"
    echo 55000 > "${HWMON}/pwm${1}_auto_point3_temp"
    echo 85000 > "${HWMON}/pwm${1}_auto_point4_temp"
    echo 101000 > "${HWMON}/pwm${1}_auto_point5_temp"
    # PWMs, 0-255 (127 is ~50%, 255 is 100%)
    echo "$2" > "${HWMON}/pwm${1}_auto_point1_pwm"
    echo "$2" > "${HWMON}/pwm${1}_auto_point2_pwm"
    echo 127 > "${HWMON}/pwm${1}_auto_point3_pwm"
    echo 255 > "${HWMON}/pwm${1}_auto_point4_pwm"
    echo 255 > "${HWMON}/pwm${1}_auto_point5_pwm"
}

set_fan 2 22
set_fan 3 106
set_fan 5 84

exit 0

Yes, it’s that easy [5].
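To sanity-check what duty cycle the curve in the script above should yield at a given temperature, here is a rough estimator. It assumes the chip interpolates linearly between neighbouring points and clamps outside of them; that is my reading of the Smart Fan IV diagram, not something I verified, and the real chip also applies tolerances and step times. curve_pwm is a made-up helper name:

```shell
# curve_pwm TEMP_MC T1:P1 T2:P2 ...
# Estimate the PWM value at TEMP_MC (millidegrees Celsius) for a curve
# given as TEMP:PWM points in ascending temperature order.
# Assumption: linear interpolation between points, clamping outside them.
curve_pwm() {
    local temp="$1" prev_t="" prev_p="" t p point
    shift
    for point in "$@"; do
        t=${point%%:*}
        p=${point##*:}
        if [ "$temp" -le "$t" ]; then
            if [ -z "$prev_t" ]; then
                # At or below the first point: assume its PWM is held.
                echo "$p"
            else
                echo $(( $prev_p + ($p - $prev_p) * ($temp - $prev_t) / ($t - $prev_t) ))
            fi
            return 0
        fi
        prev_t=$t
        prev_p=$p
    done
    # Above the last point: hold the last PWM value.
    echo "$prev_p"
}
```

For the rear case fan (minimum 84), curve_pwm 47500 15000:84 40000:84 55000:127 85000:255 101000:255 gives the estimate for 47.5°C, halfway between the second and third point.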

Closing words

And just like that… let there be quiet.

I’ll update the post if something changes. For now, it works well.
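Since this whole exercise started with a stalled fan, a periodic check on the fanN_input files might be a cheap safety net. A minimal sketch; check_fans is a made-up helper name, and treating a reading of 0 RPM (or an unreadable file) as "stalled" is my assumption:

```shell
# check_fans DIR FAN...
# Warn on stderr for every listed fan whose fanN_input in hwmon
# directory DIR reads 0 RPM or cannot be read.
# Returns 1 if any fan looks stalled, 0 otherwise.
check_fans() {
    local dir="$1" fan rpm status=0
    shift
    for fan in "$@"; do
        rpm=$(cat "$dir/fan${fan}_input" 2>/dev/null) || rpm=""
        if [ -z "$rpm" ] || [ "$rpm" -eq 0 ]; then
            echo "Warning: fan${fan} reads ${rpm:-nothing}, possibly stalled" >&2
            status=1
        fi
    done
    return $status
}
```

Something like check_fans /sys/class/hwmon/hwmon2 2 3 5 could then run from cron, with the non-zero exit status used to trigger a notification.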

  1. That’s NA-RC9 in Noctua speak.

  2. For our sanity: when you’re writing a multi-value config, use sections (*.ini style); we’ve had the technology since the ’90s.

  3. Confused yet? Turns out the nct6775 module supports the Nuvoton 6792D Super I/O chip installed in my desktop’s motherboard.

  4. If it’s not working well, I’ll readjust later. Likely the value for 55°C, to flatten/steepen the curve. I mean… we’ve got 2 years’ worth of experience in the former, right?

  5. I mean, there’s also test -x /usr/local/sbin/set-fanspeed.sh && /usr/local/sbin/set-fanspeed.sh in /etc/rc.local but that’s probably not even really worth mentioning.