Proper ZFS setup for peace of mind
Problem statement
I use ZFS on my storage boxes. And while it is very powerful, I think that just plopping your data onto ZFS and leaving it there without any additional setup leaves a lot on the table.
This article discusses a slightly more elaborate setup that provides me with additional peace of mind – roughly in the sense that silent corruption won’t fly by undetected, there’s some health monitoring, and fat-fingering rm -rf something has its blast radius attenuated¹.
In other words:
- I’m quickly notified of failures
- ZFS is kept squeaky clean
- There’s reliable auto-snapshotting going on
Solution
The solution is going to be three-pronged, just like the problem statement.
I’m quickly notified of failures
The ZFS Event Daemon is a beauty.
It doesn’t come enabled by default, but it’s simply amazing: if there’s some noteworthy event (failure, scrubbing/resilvering finished, etc.) going on in the life of your ZFS array, ZED will run a bunch of preconfigured scripts called ZEDLETs and dispatch notifications (where it makes sense).
And you can also write your own scripts² if that floats your boat…
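For illustration, here’s a minimal sketch of a custom ZEDLET, assuming the stock conventions: the filename prefix before the first dash selects the event subclass to react to, and ZED passes the details in ZEVENT_* environment variables. The script name and log tag below are made up:

#!/bin/sh
# /etc/zfs/zed.d/scrub_finish-log-custom.sh -- hypothetical ZEDLET;
# ZED exports ZEVENT_* variables (ZEVENT_POOL, ZEVENT_SUBCLASS, ...)
# describing the event it's dispatching
[ -n "${ZEVENT_POOL}" ] || exit 0
logger -t zed-custom "scrub finished on pool ${ZEVENT_POOL}"

Drop it in /etc/zfs/zed.d/, chmod a+x it, and restart ZED so it gets picked up.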
The best part is, it’s ridiculously easy to have it push to Pushover or ntfy, in addition to root@’s email³, which is the default.
So, enable ZED, and maybe make it go verbose by default (at first):
sed -i 's,^#*ZED_NOTIFY_VERBOSE=.*,ZED_NOTIFY_VERBOSE=1,' \
    /etc/zfs/zed.d/zed.rc
/etc/init.d/zfs-zed start
rc-update add zfs-zed default
Also, configuring Pushover is literally a two-liner:
sed -i 's,^#*ZED_PUSHOVER_TOKEN=.*,ZED_PUSHOVER_TOKEN=tokenhere,' \
    /etc/zfs/zed.d/zed.rc
sed -i 's,^#*ZED_PUSHOVER_USER=.*,ZED_PUSHOVER_USER=userhere,' \
    /etc/zfs/zed.d/zed.rc
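To test the whole pipeline end to end, a manual scrub makes for a harmless trigger – once it finishes, a scrub_finish notification should arrive. The pool name tank is just a stand-in here:

zpool scrub tank
zpool events -v | tail -20    # raw events that ZED reacts to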
ZFS is kept squeaky clean
One is supposed to run the zpool scrub operation on the pools regularly, to make sure all data still checksums correctly.
I’ve seen weekly scrubbing recommended, which is super easy, barely an inconvenience:
cat - > /etc/periodic/weekly/zfs-scrub <<'EOF'
#!/bin/sh
for pool in $(zpool list -Ho name); do
    zpool scrub "$pool"
done
EOF
chmod a+x /etc/periodic/weekly/zfs-scrub
Of course, you could complicate this a lot more… but I don’t mind running scrubs in parallel, at the default⁴ schedule.
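If the parallelism ever does become a problem, a sequential sketch is a one-flag change – zpool scrub -w (OpenZFS 2.x) blocks until the scrub completes:

#!/bin/sh
# sequential variant: -w waits for each scrub to finish
# before starting the next pool
for pool in $(zpool list -Ho name); do
    zpool scrub -w "$pool"
done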
There’s reliable auto-snapshotting going on
I’m super spoiled when it comes to online snapshots with ZFS. The thought of not having to worry about screw-ups even with fileops outside of a git repository is downright pleasant.
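To make that concrete: each dataset exposes its snapshots read-only under a hidden .zfs directory, so undoing a screw-up is just a copy. The paths and snapshot name below are made up for illustration:

# assuming a dataset mounted at /tank/data
ls /tank/data/.zfs/snapshot/
cp -a /tank/data/.zfs/snapshot/hourly_1700000000/oops.txt /tank/data/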
For that, I’ve adopted Dave Eddy’s Automatic ZFS Snapshots and Backups with a slight twist on it: since I run podman containers with ZFS backing, I think having the tens of containers/.images/<hash> datasets also snapshotted is a bit of overkill. And I like them snapshots all at the ~same point-in-time…

So for snapshotting, I adopted a slightly different strategy. Here’s my zfs-snapshot-all:
#!/bin/sh
if [ $# -ne 1 ]; then
    echo "Usage: $0 <name>"
    exit 1
fi

name="$1"
now=$(date +%s)
code=0
for pool in $(zpool list -Ho name); do
    zfs snapshot -r "${pool}@${name}_${now}" || code=1
    # special case -- murder image snapshots
    zfs list -t snapshot "${pool}/containers/.images@${name}_${now}" >/dev/null 2>&1 &&
        zfs destroy -r "${pool}/containers/.images@${name}_${now}"
done
exit $code
In other words, I run a recursive snapshot on each pool, and then selectively purge anything under ${pool}/containers/.images, if present.
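An ad-hoc run looks like this (the name adhoc is arbitrary):

zfs-snapshot-all adhoc
zfs list -t snapshot | grep '@adhoc_'    # one snapshot per dataset, same epoch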
For snapshot pruning I also took a much simpler approach, gutting Dave’s zfs-prune-snapshots. Here’s my zfs-snapshots-prune⁵:
#!/bin/bash
if [[ $# -lt 2 ]]; then
    echo "Usage: $0 <snapshot_prefix> <timespec>" >&2
    exit 1
fi

prefix="$1"
timespec="$2"

time_re='^([0-9]+)([smhdwMy])$'
seconds=
if [[ $timespec =~ $time_re ]]; then
    # ex: "21d" becomes num=21 spec=d
    num=${BASH_REMATCH[1]}
    spec=${BASH_REMATCH[2]}
    case "$spec" in
        s) seconds=$((num));;
        m) seconds=$((num * 60));;
        h) seconds=$((num * 60 * 60));;
        d) seconds=$((num * 60 * 60 * 24));;
        w) seconds=$((num * 60 * 60 * 24 * 7));;
        M) seconds=$((num * 60 * 60 * 24 * 30));;
        y) seconds=$((num * 60 * 60 * 24 * 365));;
        *) echo "error: unknown spec '$spec'" >&2; exit 1;;
    esac
elif [[ -z $timespec ]]; then
    echo 'error: empty timespec' >&2
    exit 2
else
    echo "error: failed to parse timespec '$timespec'" >&2
    exit 2
fi

code=0
now=$(date +%s)
# process substitution instead of a pipe, so code= survives the loop
while read -r snap creation; do
    # keep only snapshots named <prefix>_<epoch>
    [[ "$snap" == *"@${prefix}_"* ]] || continue
    delta=$((now - creation))
    if ((delta > seconds)); then
        echo "Removing $snap, creation: $creation, now: $now, ts: $timespec"
        zfs destroy -r "$snap" || code=3
    else
        #echo "Not ripe yet: $snap, creation: $creation, now: $now, ts: $timespec"
        true
    fi
done < <(zfs list -Hpo name,creation -t snapshot $(zpool list -Ho name))
exit $code
Unlike Dave’s much more generic solution, I opted for a simple decision on the snapshot at the pool level, followed by a recursive zfs destroy -r. Because I don’t really care about du stats or other niceties.
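Usage mirrors the snapshot script; e.g., destroying the ad-hoc snapshots from earlier once they’re older than two days:

zfs-snapshots-prune adhoc 2d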
To run this without silent failures, I’m pushing failures of hourly snapshotting to Healthchecks.io, on the assumption that if the hourly snapping ain’t failing, neither are the weekly/monthly/yearly ones⁶:
cat - > /etc/periodic/hourly/zfs-snapshot <<'EOF'
#!/bin/sh
exec >> /var/log/zfs-snapshots.log 2>&1
code=0
if zfs-snapshot-all hourly; then
    zfs-snapshots-prune hourly 25h || code=2
else
    code=1
fi
# FIXME: change the UUID
curl -fsS -m 10 --retry 5 -o /dev/null \
    https://hc-ping.com/FIXME-uuid-here/$code
exit $code
EOF
cat - > /etc/periodic/daily/zfs-snapshot <<'EOF'
#!/bin/sh
exec >> /var/log/zfs-snapshots.log 2>&1
set -e
zfs-snapshot-all daily
zfs-snapshots-prune daily 8d
EOF
cat - > /etc/periodic/weekly/zfs-snapshot <<'EOF'
#!/bin/sh
exec >> /var/log/zfs-snapshots.log 2>&1
set -e
zfs-snapshot-all weekly
zfs-snapshots-prune weekly 5w
EOF
cat - > /etc/periodic/monthly/zfs-snapshot <<'EOF'
#!/bin/sh
exec >> /var/log/zfs-snapshots.log 2>&1
set -e
zfs-snapshot-all monthly
zfs-snapshots-prune monthly 13M
# also handles yearly
if [ "$(date +%m)" -eq 1 ]; then
    zfs-snapshot-all yearly
    zfs-snapshots-prune yearly 10y
fi
EOF
chmod a+x /etc/periodic/*/zfs-snapshot
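After a day or two it’s worth sanity-checking that the rotation actually rotates – newest snapshots at the bottom, old ones showing up as removed in the log:

zfs list -t snapshot -o name,creation -s creation | tail
tail -n 5 /var/log/zfs-snapshots.log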
Closing words
There you have it, a somewhat automatic ZFS setup⁷ for increased peace of mind, in three easy pieces.
1. Compared to conventional systems. Yes, you surely have proper backups. But those can be, depending on schedule, strictly worse. Definitely more painful to handle than just cd .zfs/snapshot/$name/, no?
2. Say, to monitor that scrubbing finishes regularly without silent fail…
3. Linked is my take on painless send-only email setup.
4. 3am on Saturday
5. apk add bash if you wanna run it on Alpine
6. Time will tell how smart of a shortcut this one was.
7. And without any trace of backups. You shall have them; 3-2-1 or whatever. I’m a side note, not a cop.