Rustc produces bad binaries on Fedora 14, Centos 6.6, 32 bit #27543

mcpherrinm · 2015-08-05T19:05:05Z

I think it's because the TLS setup is not working properly.

Here's a vimdiff of the readelf output for a working and a non-working binary (sorry, it's a screenshot): https://i.imgur.com/zfdDYss.png (both were compiled with the same rustc binary, but on different host OSes).

I believe this is related to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=46770

mcpherrinm · 2015-08-05T19:05:31Z

I believe this is the same issue encountered in https://users.rust-lang.org/t/illegal-instruction-when-i-run-main-rs/2199

brson · 2015-08-05T19:06:25Z

cc @alexcrichton

mcpherrinm · 2015-08-05T19:07:03Z

Non-working was linked with gcc version 4.5.1 20100924 (Red Hat 4.5.1-4) (GCC) / GNU ld version 2.20.51.0.7-8.fc14 20100318
Working binary was linked with gcc version 4.9.2 20141101 (Red Hat 4.9.2-1) (GCC)

nagisa · 2015-08-05T19:08:56Z

Notably, the left side (.init_array/.fini_array) is the working one and the right side (.ctors/.dtors) is the non-working one.

mcpherrinm · 2015-08-05T22:34:22Z

http://mcpherrin.ca/tmp/badrust.bin

Here is a sample bad binary, which is "fn main() { println!("Hello, World!") }" compiled on CentOS 6.6 32-bit in VirtualBox, with a Rust 1.2 stable candidate.

nagisa · 2015-08-07T18:29:09Z

A debug, non-optimised binary: http://mcpherrin.ca/tmp/badrust-dbg.bin

EDIT: my investigation so far shows that jemalloc is able to set up thread-local storage for its own use just fine, which suggests the issue is on our side.

mcpherrinm · 2015-08-07T18:29:41Z

Here's a similar sample, but with debug symbols
"rustc -g -C opt-level=0 test.rs -o badrust-dbg.bin"

http://mcpherrin.ca/tmp/badrust-dbg.bin

nagisa · 2015-08-08T22:38:14Z

These initialiser sections, apparently, are completely irrelevant; we initialise all our thread locals lazily on first use (and initialisers are stored in .tdata even on old systems). Interestingly we don’t seem to use pthread for it either (which could be the approach to take on older systems).
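To illustrate the lazy initialisation described above, here is a minimal sketch using the public `thread_local!` macro. The names `INIT_CALLS`, `COUNTER` and `bump` are hypothetical and only serve the illustration; this shows the observable behaviour, not the internals of std's implementation.

```rust
use std::cell::Cell;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

// Counts how many times the initialiser expression has run (hypothetical helper).
static INIT_CALLS: AtomicUsize = AtomicUsize::new(0);

thread_local! {
    // The initialiser block runs lazily, once per thread, on first access.
    static COUNTER: Cell<u32> = {
        INIT_CALLS.fetch_add(1, Ordering::SeqCst);
        Cell::new(0)
    };
}

/// Increments the calling thread's counter and returns the new value.
fn bump() -> u32 {
    COUNTER.with(|c| {
        c.set(c.get() + 1);
        c.get()
    })
}

fn main() {
    // Nothing has touched COUNTER yet, so the initialiser has not run.
    assert_eq!(INIT_CALLS.load(Ordering::SeqCst), 0);

    // First access on the main thread triggers initialisation.
    assert_eq!(bump(), 1);
    assert_eq!(bump(), 2);
    assert_eq!(INIT_CALLS.load(Ordering::SeqCst), 1);

    // A new thread gets its own, freshly initialised copy.
    let from_thread = thread::spawn(bump).join().unwrap();
    assert_eq!(from_thread, 1);
    assert_eq!(INIT_CALLS.load(Ordering::SeqCst), 2);

    println!("main thread counter: 2, spawned thread counter: {}", from_thread);
}
```

Because nothing runs before the first access, a bug in this path would only show up when a thread-local is first touched, which fits the "first use" symptoms discussed in this thread.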

Over 1.5 days of debugging I managed to observe symptoms mostly indicative of invalid memory reads (e.g. SIGSEGVs after a null pointer dereference inside Cell::get), along with the issue reported originally. Basically, it seems that we read uninitialised memory on first use (at least on the main thread), and only on older 32-bit systems do we hit a corner case where dtor_running happens to have the same representation as true (LSB set to 1); this is what makes the panic happen. ~~Even on my recent system dtor_running (boolean!) seems to be a pretty random byte value (e.g. 64 or 116) which “just happens” to have its LSB set to 0.~~
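As a small illustration of the "LSB set to 1" remark, the sketch below checks the lowest bit of a few raw byte values. It assumes, as the comment above does, that only the lowest bit decides how the flag is read. The helper `lsb_set` is hypothetical; 64 and 116 are the values quoted in the struck-through sentence and 117 is made up for contrast. The code deliberately works on `u8` and never builds a `bool` from an arbitrary byte, since that would be undefined behaviour in Rust.

```rust
/// Returns true when the least significant bit of `byte` is 1.
fn lsb_set(byte: u8) -> bool {
    byte & 1 == 1
}

fn main() {
    // A valid Rust `bool` is stored as exactly 0 (false) or 1 (true).
    assert_eq!(false as u8, 0);
    assert_eq!(true as u8, 1);

    // Garbage bytes read from uninitialised memory can hold any value;
    // under the LSB-only assumption, only odd values would look like `true`.
    for byte in [0u8, 1, 64, 116, 117] {
        println!("{:>3} = {:#010b} -> LSB set: {}", byte, byte, lsb_set(byte));
    }
}
```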

alexcrichton · 2015-08-10T20:57:42Z

@nagisa could you detail a bit more how you reproduced this?

Were you also using Centos 6.6 like @mcpherrinm?
Did you have the same results where a newer linker worked whereas an older one did not?
Do you know if the OS has any open bugs about not initializing TLS data?

This may just be a case where we need to not use ELF-based TLS on older systems, but detecting that will not be easy unfortunately.

nagisa · 2015-08-10T22:10:54Z

@alexcrichton I was using Fedora 14; since the package repositories are not working anymore, I didn’t bother updating anything, including ld and ld-linux.so.2.

In cases where #27598 wasn’t hit, I could reliably reproduce the same issue and backtrace as reported in the internals thread. I did try using ld.gold (-C link-args=-fuse-ld=gold), with the same results.

Interestingly, the stage1 rustc compiler built on the same Fedora machine works just fine. Hand-crafted snippets of code that do not use the std implementation of thread locals also seem to work correctly.

So it is most likely either something we do in the start_lang implementation before setting the thread info, or the std implementation of TLS is wrong.

alexcrichton · 2015-08-11T06:09:35Z

Oh, it looks like I've actually already investigated this exact same bug before. As discovered in #20440, we depend on binutils 2.22 for an apparent bugfix with TLS, so I'm going to close this as basically a linker bug we can't do much about.

brson added the I-nominated and T-libs-api labels Aug 6, 2015

alexcrichton closed this as completed Aug 11, 2015
