Nerdsniped: clang++-13 infinite loop bug


Problem statement

A friend of mine at work mentioned the following C++ code:

// x.cc
#include <iostream>

int main() {
  while(1)
    ;
}

void unreachable() {
  std::cout << "Hello" << std::endl;
}

and how, when compiled with clang++ -O1 -Wall -o x x.cc, it leads to an unexpected result (output on stdout, then termination):

$ clang++ -O1 -Wall -o x x.cc
$ ./x
Hello
$ 

While with g++ it works as one would expect (infinite loop):

$ g++ -O1 -Wall -o x x.cc
$ ulimit -t 1
# ^^ die miserably after 1 second of cpu time
$ ./x
Killed
$ echo $?
137

In this quickie I’ll go over what (I think) is going on.

And yes, I’m now aware that an infinite loop without side effects is undefined behavior in C++. I’m not interested in that; I’m more interested in the low-level detail of why clang fails so miserably[1].
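For contrast, here is a minimal sketch of a spin loop that (as far as I understand the forward-progress rules) is not undefined behavior, because every iteration reads a volatile object. The names spin_while and spin_forever are made up for this illustration; they are not part of the original x.cc.

```cpp
#include <cstdint>

// Spins for as long as `keep_going` reads as true and returns the number of
// completed iterations. Every iteration reads through a volatile reference,
// so the compiler may not assume the loop terminates and drop it.
std::uint64_t spin_while(const volatile bool& keep_going) {
  std::uint64_t iterations = 0;
  while (keep_going) {
    ++iterations;
  }
  return iterations;
}

// Hypothetical stand-in for the body of the original main(): it never
// returns, but the loop condition is re-read from a volatile object each time.
[[noreturn]] void spin_forever() {
  volatile bool keep_going = true;
  for (;;) {
    spin_while(keep_going);
  }
}
```

Calling spin_while with a flag that is already false returns 0 right away; calling spin_forever() from main() should give the "die after 1 second of CPU time" behavior on every compiler, though I have only reasoned about that, not tried it against all the versions listed here.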

Recon

First question – is it consistent across versions?

Whipping up test.sh:

#!/bin/bash

for i in g++ clang++-7 clang++-11 clang++-13; do
  echo "+ Compile using: $i"
  $i -o $i x.cc -O1 -Wall
  echo "+ Run:"
  bash -c 'ulimit -t 1; ./'$i
  echo
done

says no:

$ bash test.sh 
+ Compile using: g++
+ Run:
bash: line 1: 24659 Killed                  ./g++

+ Compile using: clang++-7
+ Run:
bash: line 1: 24664 Killed                  ./clang++-7

+ Compile using: clang++-11
+ Run:
bash: line 1: 24676 Killed                  ./clang++-11

+ Compile using: clang++-13
+ Run:
Hello

So it’s only clang++-13 (13.0.1-6~deb10u4) that’s affected[2].

Second question – why is it broken?

The first inkling comes from objdump -t (listing symbols from an object file, much like nm)[3]:

$ cat analyze.sh
for i in g++ clang++-7 clang++-11 clang++-13; do
  $i -c -o $i.o x.cc -O1 -Wall
  objdump -j .text -t $i.o
done
$ bash analyze.sh 

g++.o:     file format elf64-x86-64

SYMBOL TABLE:
0000000000000000 l    d  .text  0000000000000000 .text
000000000000007a l     F .text  000000000000002f _GLOBAL__sub_I_main
0000000000000000 g     F .text  0000000000000002 main
0000000000000002 g     F .text  0000000000000078 _Z11unreachablev


clang++-7.o:     file format elf64-x86-64

SYMBOL TABLE:
0000000000000000 l    d  .text  0000000000000000 .text
0000000000000010 g     F .text  000000000000001e _Z11unreachablev
0000000000000000 g     F .text  0000000000000002 main


clang++-11.o:     file format elf64-x86-64

SYMBOL TABLE:
0000000000000000 l    d  .text  0000000000000000 .text
0000000000000010 g     F .text  000000000000001f _Z11unreachablev
0000000000000000 g     F .text  0000000000000002 main


clang++-13.o:     file format elf64-x86-64

SYMBOL TABLE:
0000000000000000 l    d  .text  0000000000000000 .text
0000000000000000 g     F .text  0000000000000000 main
0000000000000000 g     F .text  0000000000000069 _Z11unreachablev

Hmm, interesting. While g++ and clang++-11 have the main and unreachable() symbols at different offsets, clang++-13 puts them at the same offset, and even gives main a size of zero. Hmm.

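I would speculate that the linker finishes the job: both symbols end up at the same address, so when the startup code calls main, it actually runs the body of unreachable().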
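The "zero size, same offset" observation can also be checked mechanically. Below is a rough sketch that scans objdump -t style lines and reports function symbols that have no code of their own and share an address with another function. The Symbol struct and both function names are my own invention, and the parser assumes the exact six-column layout shown above (two flag tokens, whitespace separated), which real objdump output does not always follow.

```cpp
#include <cstdint>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// One function symbol as printed by `objdump -t`: address, size, name.
struct Symbol {
  std::uint64_t address;
  std::uint64_t size;
  std::string name;
};

// Parses lines shaped like
//   "0000000000000000 g     F .text  0000000000000000 main"
// and keeps only the ones flagged as functions ("F"). Lines with fewer than
// six whitespace-separated fields (headers, blanks) are skipped.
std::vector<Symbol> parse_function_symbols(const std::string& table) {
  std::vector<Symbol> symbols;
  std::istringstream lines(table);
  std::string line;
  while (std::getline(lines, line)) {
    std::istringstream fields(line);
    std::string address, binding, type, section, size, name;
    if (!(fields >> address >> binding >> type >> section >> size >> name)) {
      continue;
    }
    if (type != "F") continue;
    symbols.push_back({std::stoull(address, nullptr, 16),
                       std::stoull(size, nullptr, 16), name});
  }
  return symbols;
}

// Returns (empty function, function it shares an address with) pairs.
std::vector<std::pair<std::string, std::string>> empty_aliased_functions(
    const std::vector<Symbol>& symbols) {
  std::vector<std::pair<std::string, std::string>> result;
  for (const Symbol& s : symbols) {
    if (s.size != 0) continue;
    for (const Symbol& other : symbols) {
      if (&other != &s && other.address == s.address) {
        result.emplace_back(s.name, other.name);
        break;
      }
    }
  }
  return result;
}
```

Fed the three clang++-13 symbol lines from above, this reports the single pair (main, _Z11unreachablev); for the g++ and clang++-11 tables it reports nothing, since main has a non-zero size there.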

Third question – how about non-empty function?

I discussed this issue with a colleague at work (who used to write compilers), and he was of the opinion that it could happen because the optimizer doesn’t have any rigid notion of a function and simply eliminates dead code. And that it could even be a sign of decent compiler architecture.

That’s fair. So, let’s make sure main() isn’t empty:

// x.cc
#include <iostream>

int main() {
  __asm__("nop");
  while(1)
    ;
  __asm__("nop");
}

void unreachable() {
  std::cout << "Hello" << std::endl;
}

And the plot thickens: disassembling main from that shows the nop followed by what seems like trash[4]:

$ objdump -j .text -d clang++-13 | ruby -pe 'next unless /<main>/../^$/'
00000000004011d0 <main>:
  4011d0:       90                      nop
  4011d1:       66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
  4011d8:       00 00 00 
  4011db:       0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)

That trash (alignment) isn’t actually part of the function:

$ clang++-13 -O1 -Wall -c -o clang++-13.o x.cc 
$ objdump -j .text -t clang++-13.o

clang++-13.o:     file format elf64-x86-64

SYMBOL TABLE:
0000000000000000 l    d  .text  0000000000000000 .text
0000000000000000 g     F .text  0000000000000001 main
0000000000000010 g     F .text  0000000000000069 _Z11unreachablev

because objdump says that main is one byte long.

But can you see what the next symbol in the *.o is? Yes, our friend unreachable.

So even with a non-empty function, there isn’t any attempt at return… basically anything after the infinite loop gets removed.

Which is sort of fair, as all the compilers do it:

$ cat test.sh 
for i in g++ clang++-11 clang++-13; do
  echo "+ Compile using: $i"
  $i -c -o $i.o x.cc -O1 -Wall
  echo "++ Symbols:"
  objdump -j .text -t $i.o | grep -e ^0
  echo "++ Disasm:"
  objdump -j .text -d $i.o | \
    ruby -pe 'next unless /^[0-9a-f]+ <main>/../^$/'
done
$ bash test.sh 
+ Compile using: g++
++ Symbols:
0000000000000000 l    d  .text  0000000000000000 .text
000000000000007b l     F .text  000000000000002f _GLOBAL__sub_I_main
0000000000000000 g     F .text  0000000000000003 main
0000000000000003 g     F .text  0000000000000078 _Z11unreachablev
++ Disasm:
0000000000000000 <main>:
   0:   90                      nop
   1:   eb fe                   jmp    1 <main+0x1>

+ Compile using: clang++-11
++ Symbols:
0000000000000000 l    d  .text  0000000000000000 .text
0000000000000020 g     F .text  000000000000001f _Z11unreachablev
0000000000000000 g     F .text  0000000000000012 main
++ Disasm:
0000000000000000 <main>:
   0:   90                      nop
   1:   66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
   8:   00 00 00 
   b:   0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
  10:   eb fe                   jmp    10 <main+0x10>
  12:   66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
  19:   00 00 00 
  1c:   0f 1f 40 00             nopl   0x0(%rax)

+ Compile using: clang++-13
++ Symbols:
0000000000000000 l    d  .text  0000000000000000 .text
0000000000000000 g     F .text  0000000000000001 main
0000000000000010 g     F .text  0000000000000069 _Z11unreachablev
++ Disasm:
0000000000000000 <main>:
   0:   90                      nop
   1:   66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
   8:   00 00 00 
   b:   0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)

Only with clang++-13 (and newer) does the entire infinite loop get removed, so execution simply sleds through the padding into whatever symbol is defined after the zero-length/incomplete main.
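To convince myself that the "trash" really is just padding, the numbers from the three symbol tables can be plugged into a bit of arithmetic: end of main is address plus size, and whatever lies between that and the next symbol is filler. The Layout struct and helper names below are made up for this sketch; the constants are copied from the objdump output above.

```cpp
#include <cstdint>

// Address and size of main, plus the address of the symbol that follows it,
// as printed by `objdump -t` for one compiler.
struct Layout {
  std::uint64_t main_address;
  std::uint64_t main_size;
  std::uint64_t next_symbol_address;
};

// First byte past the end of main, according to the symbol table.
constexpr std::uint64_t end_of_main(const Layout& l) {
  return l.main_address + l.main_size;
}

// Number of filler bytes between the end of main and the next symbol.
constexpr std::uint64_t padding_after_main(const Layout& l) {
  return l.next_symbol_address - end_of_main(l);
}

// Values copied from the symbol tables of the nop-augmented x.cc.
constexpr Layout gcc{0x0, 0x3, 0x3};
constexpr Layout clang11{0x0, 0x12, 0x20};
constexpr Layout clang13{0x0, 0x1, 0x10};

static_assert(padding_after_main(gcc) == 0,
              "g++: unreachable() follows main directly");
static_assert(padding_after_main(clang11) == 14,
              "clang++-11: 10-byte nopw + 4-byte nopl");
static_assert(padding_after_main(clang13) == 15,
              "clang++-13: 10-byte nopw + 5-byte nopl");
```

The padding sizes match the nopw/nopl byte counts in the disassembly. The difference is what sits right before the padding: g++ and clang++-11 end main with a jmp to itself, so the filler is never reached, while clang++-13 ends main after the single nop, so nothing stops execution from running through those 15 bytes into unreachable().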

Closing words

This was just a quick run through how the clang weirdness comes about.

I’m sure someone with more patience (and a knack for compilers) could figure out exactly what in the clang source causes this. But I’m giving it a rest for now.

Btw, if you want a nice deep dive into these low-level topics, the Zero to main() blog post series by the Interrupt folks is fantastic. Yes, it is embedded-systems oriented[5]… but it translates well to regular machines.

  1. Just because something’s undefined behavior doesn’t mean the compiler has to be an idiot about it. Does it?

  2. I later checked Alpine clang version 15.0.7, and it also exhibits the same behavior.

  3. For subsequent adventures I’ve dropped clang++-7, as it’s similar to clang++-11.

  4. It’s just address alignment.

  5. Think ARM Cortex-M0+ and the like.