How to call renameat2 syscall in Ruby


Problem statement

If your knowledge of Linux internals isn’t 100% current, you might be surprised (like I was a while back) that there’s now an easy solution to the age-old problem of atomically swapping directories. It’s called renameat2.

Trouble is that even though the renameat2 syscall isn’t exactly new (it was introduced in Linux kernel 3.15 and its wrapper was added in glibc 2.28), there’s no way to call it from the command line (yet).

In this post I’ll explore what it takes to make it accessible in Ruby[1].

Background

But first, a bit of background: which of the many problems does renameat2 solve?

Imagine you have a directory tree. For example a website root. And your website is really popular. Constantly accessed by hundreds of visitors.

How do you perform an update of a bunch of articles without the poor visitors getting partial files back? Especially if the HTML and CSS/JS assets go together.

In more technical terms: how do you atomically update a directory tree?

For files themselves it’s quite easy. The rename syscall[2], rename(old, new), will atomically replace the new path with old if new already exists[3].

But for directories rename doesn’t work, unless the new path is an empty directory.
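To make the difference concrete, here is a minimal sketch that runs in a throwaway temp directory. The file and directory names (index.html, next, live) are made up for illustration.

```ruby
require 'tmpdir'

results = {}
Dir.mktmpdir do |base|
  # Files: rename replaces an existing target in a single step.
  live_file = File.join(base, 'index.html')
  File.write(live_file, 'old')
  File.write("#{live_file}.tmp", 'new')
  File.rename("#{live_file}.tmp", live_file)
  results[:file] = File.read(live_file)

  # Directories: rename refuses to replace a non-empty target.
  Dir.mkdir(File.join(base, 'next'))
  Dir.mkdir(File.join(base, 'live'))
  File.write(File.join(base, 'live', 'index.html'), 'old')
  begin
    File.rename(File.join(base, 'next'), File.join(base, 'live'))
    results[:dir] = :replaced
  rescue SystemCallError => e
    # The exact errno may vary by platform, so only the class is recorded.
    results[:dir] = e.class
  end
end

p results
```

The file case ends with the new content in place, while the directory case raises a SystemCallError instead of replacing the populated target.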

And thus the tried and true trick is to add a level of indirection[4]: instead of using a directory as your (website) directory root, you use a symlink to some directory. Then the atomic switch can be performed by renaming a temp symlink that points to the new directory over the existing one.

Allow me to demonstrate:

#!/usr/bin/env ruby

# Cleanup at exit
trap('EXIT') do
  %w[tmp webroot v1 v2].each do |e|
    begin
      File.lstat(e).directory? ? Dir.rmdir(e) : File.unlink(e)
    rescue SystemCallError
      # entry is already gone (or was never created); nothing to clean up
    end
  end
end

# Setup
Dir.mkdir('v1')
Dir.mkdir('v2')
File.symlink('v1', 'webroot')

# We're at v1:
File.readlink('webroot') # => "v1"
puts "Pre:"
system('ls -l')
puts

# Switch to v2:
File.symlink('v2', 'tmp')
File.rename('tmp', 'webroot')

# We're at v2:
File.readlink('webroot') # => "v2"
puts "Post:"
system('ls -l')

which, when run, prints:

$ ruby rename.rb 
Pre:
total 12
-rw-r--r-- 1 wejn wejn  499 Dec 10 19:14 rename.rb
drwxr-xr-x 2 wejn wejn 4096 Dec 10 19:14 v1
drwxr-xr-x 2 wejn wejn 4096 Dec 10 19:14 v2
lrwxrwxrwx 1 wejn wejn    2 Dec 10 19:14 webroot -> v1

Post:
total 12
-rw-r--r-- 1 wejn wejn  499 Dec 10 19:14 rename.rb
drwxr-xr-x 2 wejn wejn 4096 Dec 10 19:14 v1
drwxr-xr-x 2 wejn wejn 4096 Dec 10 19:14 v2
lrwxrwxrwx 1 wejn wejn    2 Dec 10 19:14 webroot -> v2

But I find this indirection unwelcome, even though it’s used far and wide[5].

And that’s where renameat2 comes into play. With the RENAME_EXCHANGE flag it allows swapping two directories atomically.

Solution

Looking at the man page[6], renameat2 takes the form:

int renameat2(int olddirfd, const char *oldpath,
    int newdirfd, const char *newpath, unsigned int flags);

And yes, we can quite easily do the syscall dance in Ruby:

d = Dir.open('/path/to/base')
NR_renameat2 = 316 # from /usr/include/x86_64-linux-gnu/bits/syscall.h
RENAME_EXCHANGE = 1<<1 # from /usr/include/linux/fs.h
syscall(NR_renameat2, d.fileno, 'a', d.fileno, 'b', RENAME_EXCHANGE)

But if this gets shipped to production, there’s a painful surprise waiting for us down the road (e.g. when we switch between platforms), as syscall numbers differ between architectures.
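To illustrate how fragile the hardcoded number is, here is a hedged sketch of a per-architecture lookup. The helper name and the table are hypothetical. The numbers are quoted from the kernel syscall tables as I remember them, so verify them against your own headers before relying on them.

```ruby
require 'rbconfig'

# renameat2 syscall numbers keyed by RbConfig's host_cpu.
# Assumed values; check your architecture's unistd headers.
RENAMEAT2_NUMBERS = {
  'x86_64'  => 316,
  'i686'    => 353,
  'aarch64' => 276,
}.freeze

# Hypothetical helper: look up the number or fail loudly.
def renameat2_number(cpu = RbConfig::CONFIG['host_cpu'])
  RENAMEAT2_NUMBERS.fetch(cpu) do
    raise NotImplementedError, "no renameat2 syscall number known for #{cpu}"
  end
end

puts renameat2_number('x86_64')
puts renameat2_number('aarch64')
```

Every new platform needs another table entry, which is exactly the maintenance burden that calling the libc wrapper avoids.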

Fortunately Ruby 2.5 comes with Fiddle, a convenient libffi-based wrapper for calling native functions.

Which brings me to a partial[7] solution for the renameat2 problem in Ruby:

#!/usr/bin/env ruby

require 'fiddle'

# Cleanup at exit
require 'fileutils'
trap('EXIT') do
  # rm_rf already ignores missing paths, so no rescue is needed
  %w[v0 v1 v2 webroot].each { |e| FileUtils.rm_rf(e) }
end

# This is where the magic is defined
libc = Fiddle.dlopen('/lib/x86_64-linux-gnu/libc.so.6')
# TODO(wejn): Figure out how not to hardcode libc path...
renameat2 = Fiddle::Function.new(
  libc['renameat2'],
  [
    Fiddle::TYPE_INT,   # olddirfd
    Fiddle::TYPE_VOIDP, # oldpath
    Fiddle::TYPE_INT,   # newdirfd
    Fiddle::TYPE_VOIDP, # newpath
    -Fiddle::TYPE_INT,  # flags (unsigned int; a negated type is unsigned in Fiddle)
  ],
  Fiddle::TYPE_INT)
RENAME_EXCHANGE = 1<<1 # from /usr/include/linux/fs.h, less likely to change
# TODO(wejn): Figure out how not to hardcode the constant...

# Setup
%w[v1 v2 webroot].each { |d| Dir.mkdir(d) }
File.write('webroot/version.txt', "initial")
File.write('v1/version.txt', "first")
File.write('v2/version.txt', "second")

d = Dir.open('.') # technically a path to base

show = lambda do |label|
  puts "#{label}:"
  Dir["**/version.txt"].sort.each do |e|
    puts "#{e}: #{File.read(e)}"
  end
  puts
end

# Initial state:
show["Initial"]

# Upgrade to v1:
renameat2.call(d.fileno, 'v1', d.fileno, 'webroot', RENAME_EXCHANGE)
File.rename('v1', 'v0') # careful: v1 and webroot were switched...
show["Switch to v1"]

# Upgrade to v2:
renameat2.call(d.fileno, 'v2', d.fileno, 'webroot', RENAME_EXCHANGE)
File.rename('v2', 'v1') # careful: v2 and webroot were switched...
show["Switch to v2"]

# Final state:
show["Final"]

which, when executed, prints:

$ ruby renameat2.rb 
Initial:
v1/version.txt: first
v2/version.txt: second
webroot/version.txt: initial

Switch to v1:
v0/version.txt: initial
v2/version.txt: second
webroot/version.txt: first

Switch to v2:
v0/version.txt: initial
v1/version.txt: first
webroot/version.txt: second

Final:
v0/version.txt: initial
v1/version.txt: first
webroot/version.txt: second

Of course the above isn’t very convincing when it comes to atomicity guarantees. But I have a hard time coming up with a good way to verify that. So the man page will have to do for now. :-)
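As for the hardcoded libc path, one option might be to ask Fiddle for a handle to the running process instead of a specific library file. The sketch below assumes Linux with glibc 2.28 or newer and a filesystem that supports RENAME_EXCHANGE. The exchange helper and the EXCHANGE_FLAG name are hypothetical. It also checks the return value, which the demo script skips.

```ruby
require 'fiddle'
require 'tmpdir'

# Value of RENAME_EXCHANGE in linux/fs.h (still hardcoded, as in the script above).
EXCHANGE_FLAG = 1 << 1

# Hypothetical helper: atomically swap two entries that live under base_dir.
def exchange(base_dir, a, b)
  # dlopen(nil) returns a handle for the running process; Ruby itself is
  # linked against libc, so renameat2 should resolve without a libc path.
  libc = Fiddle.dlopen(nil)
  renameat2 = Fiddle::Function.new(
    libc['renameat2'],
    [Fiddle::TYPE_INT, Fiddle::TYPE_VOIDP,
     Fiddle::TYPE_INT, Fiddle::TYPE_VOIDP, -Fiddle::TYPE_INT],
    Fiddle::TYPE_INT)
  dir = Dir.open(base_dir)
  begin
    rc = renameat2.call(dir.fileno, a, dir.fileno, b, EXCHANGE_FLAG)
    # On failure the call returns -1 and Fiddle.last_error holds errno.
    raise SystemCallError.new("renameat2(#{a}, #{b})", Fiddle.last_error) unless rc.zero?
  ensure
    dir.close
  end
  true
end

Dir.mktmpdir do |base|
  %w[v1 webroot].each { |d| Dir.mkdir(File.join(base, d)) }
  File.write(File.join(base, 'v1', 'version.txt'), 'first')
  File.write(File.join(base, 'webroot', 'version.txt'), 'initial')

  exchange(base, 'v1', 'webroot')
  puts File.read(File.join(base, 'webroot', 'version.txt'))

  begin
    exchange(base, 'missing', 'webroot')
  rescue SystemCallError => e
    puts "swap with a missing path failed: #{e.class}"
  end
end
```

Turning the -1 return into a Ruby exception means a failed swap can’t go unnoticed before the follow-up rename runs.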

Closing word

When writing this short article and the Ruby demo script I realized an unpleasant side-effect of using renameat2.

The symlink approach clearly lends itself to the following workflow:

  1. check out version X[8]
  2. create temp symlink
  3. rename temp symlink to webroot

That allows easy error recovery and inspection of the currently deployed version.

However, with renameat2 – which switches two directories – a similar workflow isn’t straightforward. Because we lose the level of indirection… and thus we also lose the name of the currently deployed version.

Plus, error recovery (in case the process dies after renameat2 but before rename) is tricky. Basically we can’t really use the directory names as version names anymore.
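One possible mitigation, offered only as a sketch, is to keep the version inside the tree (like the version.txt files in the demo) and name the swapped-out directory after that marker. The helper name, the "incoming" staging directory and the "v-" prefix are all hypothetical. The example simulates the state right after a swap instead of calling renameat2.

```ruby
require 'tmpdir'

# Hypothetical helper: rename a swapped-out tree after the version recorded
# inside it. Safe to re-run after a crash: it does nothing if the spare
# directory is already gone.
def archive_swapped_out(base, spare = 'incoming')
  spare_path = File.join(base, spare)
  return nil unless File.directory?(spare_path)

  version = File.read(File.join(spare_path, 'version.txt')).strip
  target = File.join(base, "v-#{version}")
  File.rename(spare_path, target)
  target
end

Dir.mktmpdir do |base|
  # Simulated post-swap state: webroot holds "second", while the spare
  # directory holds the previously deployed "first".
  %w[webroot incoming].each { |d| Dir.mkdir(File.join(base, d)) }
  File.write(File.join(base, 'webroot', 'version.txt'), 'second')
  File.write(File.join(base, 'incoming', 'version.txt'), 'first')

  puts "deployed: #{File.read(File.join(base, 'webroot', 'version.txt'))}"
  puts "archived: #{File.basename(archive_swapped_out(base))}"
end
```

With this layout the deployed version is always readable from webroot/version.txt, and a process that died between the swap and the rename can simply run the archive step again.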

So better functionality[9] also comes with an unexpected downside.

  1. Because from C it’d be rather boring. ;) And there’s already prior art.

  2. https://www.man7.org/linux/man-pages/man2/rename.2.html

  3. This is, btw, how rsync does it by default (unless you use --inplace) and it’s one of the reasons why it generally rocks.

  4. I wonder if anything but this is to be expected. ;)

  5. My employer might even use it as an interview question… because it’s used for switching between package versions on our cluster management system. Not that many people are aware of that in 2021. ;)

  6. If you don’t know man7.org and/or Michael Kerrisk’s – rather magnificent – book “The Linux Programming Interface”, you are missing out.

  7. Partial because you have to figure out the path to libc, and the constant isn’t guaranteed either, I think.

  8. Possibly even: checkout version X to temp directory, then (on success) atomically rename to a stable name.

  9. Btw, renameat2 can do much more than just atomic swap of two directories; see the man page.