js' blog

Interesting GCC optimization on SPARC64
Created: 19.08.2017 10:17 UTC

Today, I was looking into the following linker warning I got on OpenBSD/SPARC64:

/usr/bin/ld: warning: creating a DT_TEXTREL in a shared object.

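The warning showed up while linking the runtime .so, which means some relocation against the text section survived into the shared object. So it was pretty obvious that this was coming from the hand-written assembly lookup. readelf -r lookup-asm/lookup-asm.lib.o quickly revealed: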

Relocation section '.rela.text' at offset 0x568 contains 6 entries:
  Offset          Info           Type           Sym. Value    Sym. Name + Addend
00000000002c  000b00000029 R_SPARC_WDISP19   0000000000000000 objc_method_not_found + 0
000000000068  000c00000029 R_SPARC_WDISP19   0000000000000000 objc_method_not_found_ + 0
0000000000b4  000d00000011 R_SPARC_PC22      0000000000000000 _GLOBAL_OFFSET_TABLE_ + fffffffffffffffc
0000000000bc  000d00000010 R_SPARC_PC10      0000000000000000 _GLOBAL_OFFSET_TABLE_ + 4
0000000000c4  00060000000f R_SPARC_GOT22     00000000000000d4 nil_method + 0
0000000000c8  00060000000d R_SPARC_GOT10     00000000000000d4 nil_method + 0

Hm, yeah, those R_SPARC_WDISP19 relocations against objc_method_not_found don't look right. Let's see what we have in the code (this is inside an assembler macro, and \not_found is replaced with objc_method_not_found):

	be,pn	%xcc, \not_found

Ah, of course. While the linker handles a call fine in PIC mode, it apparently chokes on the branch-if-equal. That makes sense, considering the linker actually replaces the call with different instructions. It also explains why this warning did not appear on NetBSD/SPARC64: OpenBSD seems to have an older version of the linker because it avoids GPL3 code.

Ok, so now we know what the solution is: instead of branching to the function directly, branch to a small stub that just does a call, which the linker then replaces. Let's see how GCC emits this if we want to have a tail call:

stub_func:
	or	%o7, %g0, %g1
	call	objc_not_found, 0
	 or	%g1, %g0, %o7

This code confused me at first: Why is it copying %o7 into %g1? %g1 is volatile and the ABI doesn't specify anything about it at all, not even in the PIC section. Why does it need to be set for the call? Is the ABI slightly different on OpenBSD? And why immediately copy %g1 back into %o7? (Note: SPARC64 uses branch delay slots, so the instruction after the branch is executed before the branch is taken.)

This didn't seem to make any sense, until I found this in the ISA documentation:

The CALL instruction writes the contents of the PC, which points to the CALL instruction itself, into r[15] (out register 7) and then causes a delayed transfer of control to a PC-relative effective address. The value written into r[15] is visible to the instruction in the delay slot.

Aha! That's interesting. It only says that the value is visible in the branch delay slot, but that also means the call has already overwritten %o7 by the time the delay slot is executed. GCC exploits this in a nice way: it saves the old value of %o7 in %g1 before the call and restores it in the delay slot of the call, so the called function gets the same return address we had. This is why the volatile register %g1 can be used, as we don't care about it anymore once the called function is executing, and why %o7 is copied away and then immediately copied back.
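To convince myself of the ordering, the three instructions can be replayed in a toy C model. This is only a sketch, not real SPARC emulation: the Regs struct, both function names and the addresses are made up for illustration, and the model tracks nothing but %o7 and %g1.

```c
#include <stdint.h>

/* Toy model, not real SPARC: only the two registers the stub touches. */
typedef struct {
    uint64_t o7; /* return-address register, r[15] */
    uint64_t g1; /* volatile scratch register */
} Regs;

/* Hypothetical helper: replays the stub in execution order and returns
 * the %o7 value the called function would see on entry. */
static uint64_t o7_seen_by_callee(uint64_t caller_o7, uint64_t call_pc)
{
    Regs r = { .o7 = caller_o7, .g1 = 0 };
    r.g1 = r.o7 | 0; /* or %o7, %g0, %g1: save return address (%g0 reads as 0) */
    r.o7 = call_pc;  /* call: writes the address of the call itself into %o7 */
    r.o7 = r.g1 | 0; /* delay slot: or %g1, %g0, %o7, runs before the callee */
    return r.o7;
}

/* Same sequence with nothing useful in the delay slot, for comparison. */
static uint64_t o7_seen_without_restore(uint64_t caller_o7, uint64_t call_pc)
{
    Regs r = { .o7 = caller_o7, .g1 = 0 };
    r.g1 = r.o7 | 0; /* the saved copy is never used in this variant */
    r.o7 = call_pc;  /* call overwrites %o7 and nobody restores it */
    return r.o7;
}
```

With a made-up caller return address of 0x1000 and the call sitting at 0x2004, the first function returns 0x1000 and the second returns 0x2004. Only in the first case does the called function return straight to our caller, which is exactly what a tail call needs.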

Anyway, since it just cost me 10 minutes to figure out why this code works and does what it should, I thought it might be worth blogging about this to save someone else those 10 minutes (as a quick search does not turn this up - hopefully it will now).