How to call renameat2 syscall in Ruby
Problem statement
If your knowledge of Linux internals isn’t 100% current, you might be surprised
(like I was a while back) that there’s now an easy solution to the age-old problem
of atomically renaming (swapping) directories. It’s called renameat2.
The trouble is that even though the renameat2 syscall isn’t exactly new (it was
introduced in Linux kernel 3.15 and added to glibc 2.28),
there’s no command-line support for it (yet).
In this post I’ll explore what it takes to make it accessible in Ruby[1].
Background
But first, a bit of background: what is one of the many problems that
renameat2 solves?
Imagine you have a directory tree. For example a website root. And your website is really popular. Constantly accessed by hundreds of visitors.
How do you perform an update of a bunch of articles without the poor visitors getting partial files back? Especially if the HTML and CSS/JS assets go together.
In more technical terms: how do you atomically update a directory tree?
For files themselves it’s quite easy. The rename syscall[2] – rename(old, new) – will atomically replace the new path with old if new already exists[3].
But for directories rename doesn’t work, unless the new path is an empty directory.
And thus the tried and true trick is to add a level of indirection[4]: instead of using a directory as your (website) root, you use a symlink to some directory. Then the atomic switch can be performed by renaming a temp symlink, which points to the new directory, over the existing symlink.
Allow me to demonstrate:
#!/usr/bin/env ruby
# Cleanup at exit
trap('EXIT') do
  %w[tmp webroot v1 v2].each do |e|
    begin
      File.lstat(e).directory? ? Dir.rmdir(e) : File.unlink(e)
    rescue SystemCallError
      # already gone (or never created); nothing to clean up
    end
  end
end
# Setup
Dir.mkdir('v1')
Dir.mkdir('v2')
File.symlink('v1', 'webroot')
# We're at v1:
File.readlink('webroot') # => "v1"
puts "Pre:"
system('ls -l')
puts
# Switch to v2:
File.symlink('v2', 'tmp')
File.rename('tmp', 'webroot')
# We're at v2:
File.readlink('webroot') # => "v2"
puts "Post:"
system('ls -l')
which, when run, prints:
$ ruby rename.rb
Pre:
total 12
-rw-r--r-- 1 wejn wejn 499 Dec 10 19:14 rename.rb
drwxr-xr-x 2 wejn wejn 4096 Dec 10 19:14 v1
drwxr-xr-x 2 wejn wejn 4096 Dec 10 19:14 v2
lrwxrwxrwx 1 wejn wejn 2 Dec 10 19:14 webroot -> v1
Post:
total 12
-rw-r--r-- 1 wejn wejn 499 Dec 10 19:14 rename.rb
drwxr-xr-x 2 wejn wejn 4096 Dec 10 19:14 v1
drwxr-xr-x 2 wejn wejn 4096 Dec 10 19:14 v2
lrwxrwxrwx 1 wejn wejn 2 Dec 10 19:14 webroot -> v2
But I find this indirection unwelcome, even though it’s used far and wide[5].
And that’s where renameat2 comes into play. With the RENAME_EXCHANGE flag
it can swap two directories atomically.
Solution
Looking at the man page[6], renameat2 takes the form:
int renameat2(int olddirfd, const char *oldpath,
int newdirfd, const char *newpath, unsigned int flags);
And yes, we can quite easily do the syscall dance in Ruby:
d = Dir.open('/path/to/base')
NR_renameat2 = 316 # from /usr/include/x86_64-linux-gnu/bits/syscall.h
RENAME_EXCHANGE = 1<<1 # from /usr/include/linux/fs.h
syscall(NR_renameat2, d.fileno, 'a', d.fileno, 'b', RENAME_EXCHANGE)
But if this gets shipped to production, there’s a painful surprise waiting for us down the road (e.g. when we switch between platforms), because syscall numbers differ from one architecture to the next.
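To make that concrete, here is a rough sketch of what a portable version of the raw syscall approach would have to carry around: a per-architecture lookup table. Only the x86_64 number comes from the snippet above; the other values are what I believe the kernel syscall tables say and should be checked against the headers on the target system. The constant RENAMEAT2_NR and the helper renameat2_nr are made-up names for this illustration.

```ruby
require 'rbconfig'

# Syscall numbers for renameat2, keyed by CPU family.
# Assumption: values taken from the kernel syscall tables; verify them
# against the headers on your system before relying on them.
RENAMEAT2_NR = {
  'x86_64'  => 316,
  'i386'    => 353,
  'arm'     => 382,
  'aarch64' => 276,
}.freeze

# Hypothetical helper: map RbConfig's host_cpu string to a syscall number.
def renameat2_nr(cpu = RbConfig::CONFIG['host_cpu'])
  family = case cpu
           when /\Ai[3-6]86\z/ then 'i386'    # i386, i486, i586, i686
           when /\Aarm(?!64)/  then 'arm'     # 32-bit ARM variants (armv7l, ...)
           when 'arm64'        then 'aarch64'
           else cpu
           end
  RENAMEAT2_NR.fetch(family) do
    raise NotImplementedError, "no renameat2 syscall number known for #{cpu}"
  end
end
```

Every new platform means another entry in that table, which is exactly the kind of maintenance burden one would rather avoid.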
Fortunately Ruby 2.5 comes with a convenient wrapper for calling native functions: Fiddle, which is built on libffi.
Which brings me to a partial[7] solution for the renameat2 problem in Ruby:
#!/usr/bin/env ruby
require 'fiddle'
# Cleanup at exit
require 'fileutils'
trap('EXIT') do
  %w[v0 v1 v2 webroot].each do |e|
    begin
      FileUtils.rm_rf(e)
    rescue StandardError
      # best-effort cleanup; ignore failures
    end
  end
end
# This is where the magic is defined
libc = Fiddle.dlopen('/lib/x86_64-linux-gnu/libc.so.6')
# TODO(wejn): Figure out how not to hardcode libc path...
renameat2 = Fiddle::Function.new(
  libc['renameat2'],
  [
    Fiddle::TYPE_INT,   # olddirfd
    Fiddle::TYPE_VOIDP, # oldpath
    Fiddle::TYPE_INT,   # newdirfd
    Fiddle::TYPE_VOIDP, # newpath
    Fiddle::TYPE_INT,   # flags
  ],
  Fiddle::TYPE_INT)
RENAME_EXCHANGE = 1<<1 # from /usr/include/linux/fs.h, less likely to change
# TODO(wejn): Figure out how not to hardcode the constant...
# Setup
%w[v1 v2 webroot].each { |d| Dir.mkdir(d) }
File.write('webroot/version.txt', "initial")
File.write('v1/version.txt', "first")
File.write('v2/version.txt', "second")
d = Dir.open('.') # technically a path to base
show = lambda do |label|
  puts "#{label}:"
  Dir["**/version.txt"].sort.each do |e|
    puts "#{e}: #{File.read(e)}"
  end
  puts
end
# Initial state:
show["Initial"]
# Upgrade to v1:
renameat2.call(d.fileno, 'v1', d.fileno, 'webroot', RENAME_EXCHANGE)
File.rename('v1', 'v0') # careful: v1 and webroot were switched...
show["Switch to v1"]
# Upgrade to v2:
renameat2.call(d.fileno, 'v2', d.fileno, 'webroot', RENAME_EXCHANGE)
File.rename('v2', 'v1') # careful: v2 and webroot were switched...
show["Switch to v2"]
# Final state:
show["Final"]
which, when executed, ends up looking like this:
$ ruby renameat2.rb
Initial:
v1/version.txt: first
v2/version.txt: second
webroot/version.txt: initial
Switch to v1:
v0/version.txt: initial
v2/version.txt: second
webroot/version.txt: first
Switch to v2:
v0/version.txt: initial
v1/version.txt: first
webroot/version.txt: second
Final:
v0/version.txt: initial
v1/version.txt: first
webroot/version.txt: second
Of course the above isn’t very convincing when it comes to atomicity guarantees. But I have a hard time coming up with a good way to verify that. So the man page will have to do for now. :-)
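The demo script also hardcodes the libc path and never checks the return value. A sketch of how both might be tackled follows. It rests on assumptions: that Fiddle::Handle::DEFAULT can resolve renameat2 from the libraries already loaded into the Ruby process (so glibc 2.28 or newer), and that AT_FDCWD is -100, as it is on Linux. RENAME_EXCHANGE is still hardcoded, and the exchange helper is a name invented here.

```ruby
require 'fiddle'

# Resolve renameat2 from the libraries already loaded into this process
# instead of opening libc by an absolute path.
# Assumption: the process is linked against glibc >= 2.28.
RENAMEAT2 = Fiddle::Function.new(
  Fiddle::Handle::DEFAULT['renameat2'],
  [
    Fiddle::TYPE_INT,   # olddirfd
    Fiddle::TYPE_VOIDP, # oldpath
    Fiddle::TYPE_INT,   # newdirfd
    Fiddle::TYPE_VOIDP, # newpath
    Fiddle::TYPE_INT,   # flags
  ],
  Fiddle::TYPE_INT)

AT_FDCWD        = -100   # assumption: Linux value, from linux/fcntl.h
RENAME_EXCHANGE = 1 << 1 # from /usr/include/linux/fs.h

# Hypothetical helper: atomically swap paths a and b (relative paths are
# resolved against the current working directory).
# Raises the matching Errno::* exception if the call fails.
def exchange(a, b)
  rc = RENAMEAT2.call(AT_FDCWD, a, AT_FDCWD, b, RENAME_EXCHANGE)
  unless rc.zero?
    raise SystemCallError.new("renameat2(#{a}, #{b})", Fiddle.last_error)
  end
  true
end
```

With that in place, the upgrade step from the demo would read exchange('v1', 'webroot'), and a failed swap surfaces as an exception instead of passing silently.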
Closing word
While writing this short article and the Ruby demo script I realized an
unpleasant side-effect of using renameat2.
The symlink approach clearly lends itself to the following workflow:
- check out version X[8]
- create temp symlink
- rename temp symlink to webroot
That allows easy error recovery and inspection of the currently deployed version.
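For comparison, that symlink workflow fits into a few lines. A minimal sketch, where switch_symlink is a hypothetical helper and the random suffix on the temp name is my own addition (to avoid clashing with a leftover from a crashed run):

```ruby
require 'securerandom'

# Atomically repoint the symlink `webroot` at `target`.
# Returns the target the symlink points to afterwards.
def switch_symlink(webroot, target)
  tmp = "#{webroot}.tmp-#{SecureRandom.hex(4)}"
  File.symlink(target, tmp)
  begin
    # rename(2) replaces the old symlink itself, in one atomic step
    File.rename(tmp, webroot)
  rescue SystemCallError
    # Recovery is trivial: drop the temp symlink, webroot is untouched
    File.unlink(tmp) if File.symlink?(tmp)
    raise
  end
  File.readlink(webroot)
end
```

At any point File.readlink('webroot') tells which version is live, and a crash between the two steps leaves nothing worse than a stray temp symlink.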
However, with renameat2 – which switches two directories – a similar
workflow isn’t straightforward. Because we lose the level of indirection…
and thus we also lose the name of the currently deployed version.
Plus, error recovery (in case the process dies after renameat2 but before
rename) is tricky. Basically we can’t really use the directory names
as version names anymore.
So better functionality[9] also comes with some unexpected downsides.
- Because from C it’d be rather boring. ;) And there’s already prior art.
- This is, btw, how rsync does it by default (unless you use --inplace) and it’s one of the reasons why it generally rocks.
- I wonder if anything but this is to be expected. ;)
- My employer might even use it as an interview question… because it’s used for switching between package versions on our cluster management system. Not that many people are aware of that in 2021. ;)
- If you don’t know man7.org and/or Michael Kerrisk’s – rather magnificent – book “The Linux Programming Interface”, you are missing out.
- Partial because you have to figure out the path to libc, and the constant isn’t guaranteed either, I think.
- Possibly even: check out version X to a temp directory, then (on success) atomically rename to a stable name.
- Btw, renameat2 can do much more than just an atomic swap of two directories; see the man page.