Parallelize batch runs with style


Problem statement

In Parallelize YouTube downloads I introduced my favorite trick for parallel execution with xargs.

And while seq 1 50 | xargs -P25 -I{} echo {} is definitely a worthy pattern [1], the intermixed output and the lack of a clear status indication are a bummer.

So given input file ids:

$ cat ids 
cC5pPsiXO7s
gVwFSu2WDv4
krrqydtneO0
blablablabl

I’d like to get to something as nice as this:

[Image: pretty batch run output]

In other words, a command runner that:

  1. shows clear success/failure status for a batch job
  2. retries on failures (a few times)
  3. on success doesn’t output stdout/stderr, just state
  4. on final failure outputs error state and stdout/stderr (for debugging)

And as an optional extra:

  1. accepts a “verbose” flag for more detailed output

Solution

How was I gonna do it? Ruby, naturally [2].

The batch-run.sh itself is rather straightforward; it doesn’t even have to be a script:

$ xargs -I{} -P25 ./upto-n-times.rb 4 echo {} < ids

The “verbose” flag can be passed as an environment variable, for simplicity. So either export VERBOSE=1 or set it inline:

$ seq 1 1 | VERBOSE=1 xargs -I{} -P25 ./upto-n-times.rb 4 echo {}
["echo", "1"]: running (1 try)...
["echo", "1"]: success (first try).
Try 1:
1

So obviously the main course is upto-n-times.rb; in other words, the runner with the retry logic, nice reporting, etc.

My take on that (and a hardly surprising one, I’d say):

#!/usr/bin/env ruby

require 'open3'

# Runs given command, and on failures retries up to N times

if ARGV.size < 2
  STDERR.puts "Usage: #{File.basename($0)} <retries> <command>+"
  exit 111
end

num_retries = ARGV.first.to_i

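# to_i yields 0 for non-numeric input; a negative count would skip the loop
# below and leave status nil, so reject both
if num_retries <= 0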
  STDERR.puts "num_retries (first argv) must be a number > 0"
  exit 112
end

command = ARGV[1..-1]

output = []
status = nil

num_retries.times do |i|
  puts "#{command.inspect}: running (#{i+1} try)..." if ENV['VERBOSE']
  out, status = Open3.capture2e(*command)
  output << out
  break if status.success?
end

if status.success?
  puts "\e[1;32m#{command.inspect}: success (" +
    (output.size < 2 ? "first try" : "#{output.size} tries") + ").\e[0m"
else
  puts "\e[1;31m#{command.inspect}: failed (with #{num_retries} retries):\e[0m"
end

if !status.success? || ENV['VERBOSE']
  output.each_with_index do |o, i|
    puts "Try #{i+1}:"
    puts o
  end
end

# exitstatus is nil when the child was killed by a signal; report 1 in that case
exit(status.exitstatus || 1)
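The retry loop at the heart of the script can also be pulled out into a method, which makes it easier to try in isolation. A minimal sketch, assuming a hypothetical helper named run_with_retries (not part of the original script) and using the current Ruby interpreter (RbConfig.ruby) as a stand-in for a failing command:

```ruby
require 'open3'
require 'rbconfig'

# Hypothetical helper (not in the original script): runs `command`
# (an array of program + arguments) up to `max_tries` times and returns
# [status, outputs], where `outputs` holds the combined stdout/stderr
# of every attempt that was made.
def run_with_retries(command, max_tries)
  raise ArgumentError, 'max_tries must be > 0' unless max_tries.positive?

  outputs = []
  status = nil
  max_tries.times do
    out, status = Open3.capture2e(*command)
    outputs << out
    break if status.success?
  end
  [status, outputs]
end

# Usage: a command that always fails with exit code 3, tried twice.
failing = [RbConfig.ruby, '-e', 'puts "nope"; exit 3']
status, outputs = run_with_retries(failing, 2)
puts "exit=#{status.exitstatus}, attempts=#{outputs.size}"
```

Returning the status together with every captured output keeps the reporting (colors, the “Try N:” dump) separate from the running.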

And here is a slightly polished batch-run.sh that checks the input parameter and passes on the $VERBOSE flag (if set):

#!/bin/bash

# Runs a given shell script for all of `ids`.
# (25 at a time, with 4 retries on failure)

if [ $# -ne 1 ] || [ ! -x "$1" ]; then
  echo "Usage: $0 <script>" >&2
  exit 1
fi

# if VERBOSE is set, propagate
if [ -n "${VERBOSE+x}" ]; then
  export VERBOSE=1
fi

exec xargs -I{} -P25 ./upto-n-times.rb 4 "$1" {} < ids
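One subtlety worth spelling out: both the ${VERBOSE+x} test in the shell script and the ENV['VERBOSE'] check in the Ruby script only ask whether the variable is set, so VERBOSE=0 or even an empty VERBOSE= still turns verbose mode on. A small sketch of that behavior, with a stricter variant for comparison. The names verbose? and strictly_verbose? are hypothetical, and the stricter rule is an assumption for illustration, not something the scripts do:

```ruby
# Hypothetical helper mirroring the script's `if ENV['VERBOSE']` check.
# The environment is a parameter so the check can be exercised with a
# plain Hash instead of the real ENV.
def verbose?(env = ENV)
  !env['VERBOSE'].nil?
end

# Stricter variant (an assumption, not the original behavior):
# only non-empty values other than "0" count as "on".
def strictly_verbose?(env = ENV)
  value = env['VERBOSE'].to_s
  !value.empty? && value != '0'
end

[{}, { 'VERBOSE' => '' }, { 'VERBOSE' => '0' }, { 'VERBOSE' => '1' }].each do |env|
  puts "#{env.inspect}: verbose?=#{verbose?(env)} strictly_verbose?=#{strictly_verbose?(env)}"
end
```

For a quick one-off the "set means on" rule is perfectly fine; the stricter variant only matters if someone expects VERBOSE=0 to mean "quiet".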

Demo time

In the end, this works rather well. For a script that intermittently fails:

#!/usr/bin/env ruby

i = ARGV.first.to_i

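if i.odd?
  # odd ids succeed only when rand(10) is 0, 3, 6 or 9 (about 40% of tries);
  # the messages say "even" in this branch too, as seen in the demo output
  if (rand(10)%3).zero?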
    puts "even, ok"
    exit 0
  else
    puts "even, fail"
    exit 1
  end
else
  puts "even, ok"
  exit 0
end

the retry logic works as it should:

$ seq 1 10 > ids
$ ./batch-run.sh ./fail-on-odd.rb 
["./fail-on-odd.rb", "2"]: success (first try).
["./fail-on-odd.rb", "4"]: success (first try).
["./fail-on-odd.rb", "8"]: success (first try).
["./fail-on-odd.rb", "6"]: success (first try).
["./fail-on-odd.rb", "9"]: success (first try).
["./fail-on-odd.rb", "10"]: success (first try).
["./fail-on-odd.rb", "7"]: success (3 tries).
["./fail-on-odd.rb", "1"]: success (4 tries).
["./fail-on-odd.rb", "5"]: failed (with 4 retries):
Try 1:
even, fail
Try 2:
even, fail
Try 3:
even, fail
Try 4:
even, fail
["./fail-on-odd.rb", "3"]: success (4 tries).

and so does the verbose output [3]:

$ seq 1 5 > ids
$ VERBOSE=1 ./batch-run.sh ./fail-on-odd.rb 
["./fail-on-odd.rb", "5"]: running (1 try)...
["./fail-on-odd.rb", "2"]: running (1 try)...
["./fail-on-odd.rb", "4"]: running (1 try)...
["./fail-on-odd.rb", "3"]: running (1 try)...
["./fail-on-odd.rb", "1"]: running (1 try)...
["./fail-on-odd.rb", "2"]: success (first try).
Try 1:
even, ok
["./fail-on-odd.rb", "4"]: success (first try).
Try 1:
even, ok
["./fail-on-odd.rb", "5"]: running (2 try)...
["./fail-on-odd.rb", "1"]: running (2 try)...
["./fail-on-odd.rb", "3"]: running (2 try)...
["./fail-on-odd.rb", "5"]: running (3 try)...
["./fail-on-odd.rb", "1"]: success (2 tries).
Try 1:
even, fail
Try 2:
even, ok
["./fail-on-odd.rb", "3"]: running (3 try)...
["./fail-on-odd.rb", "5"]: success (3 tries).
Try 1:
even, fail
Try 2:
even, fail
Try 3:
even, ok
["./fail-on-odd.rb", "3"]: running (4 try)...
["./fail-on-odd.rb", "3"]: failed (with 4 retries):
Try 1:
even, fail
Try 2:
even, fail
Try 3:
even, fail
Try 4:
even, fail
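As a back-of-the-envelope check on the demo: an odd id succeeds on a single try only when rand(10) lands on 0, 3, 6 or 9, so the chance that all 4 tries fail can be worked out directly. A small sketch of that arithmetic (the helper names are hypothetical, introduced only for this illustration):

```ruby
# rand(10) yields 0..9; the demo script succeeds when that value % 3 == 0.
def single_try_success_probability
  (0...10).count { |n| (n % 3).zero? } / 10.0
end

# Probability that every one of `tries` independent attempts fails.
def all_tries_fail_probability(tries)
  (1 - single_try_success_probability)**tries
end

puts format('P(single try succeeds) = %.2f', single_try_success_probability)
puts format('P(all 4 tries fail)    = %.4f', all_tries_fail_probability(4))
```

That works out to roughly a 13% chance of a final failure per odd id, so an occasional red line among the five odd ids in 1..10 is to be expected.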

Closing words

This is a very simple thing [4], but one that I like as a demonstration of composability on Unix.

Often, various “problems” can be solved with a divide-and-conquer strategy: splitting the problem into manageable chunks and composing the pieces back together.

What’s your favorite quickie, dear reader?

  1. And yes, I’m aware of GNU parallel, in all its might and scary complexity.

  2. Yes, nowadays, with everyone surfing the upslope of the AI hype, you’d almost expect me to join the chorus of “prompt ChatGPT to write that for me”. This is not that kind of blog.

  3. You just need to imagine the nice red/green there. Because text>screenshot.

  4. Yup, the write-up took longer than the script.