
Rustc produces bad binaries on Fedora 14, Centos 6.6, 32 bit #27543

Closed
mcpherrinm opened this issue Aug 5, 2015 · 11 comments
Labels
T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@mcpherrinm
Contributor

I think it's because the TLS setup is not working properly.

Here's a screenshot (sorry) of a vimdiff of the readelf output for a working and a non-working binary: https://i.imgur.com/zfdDYss.png. Both were compiled with the same rustc binary, but on different host OSes.

I believe this is related to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=46770

@mcpherrinm
Contributor Author

I believe this is the same issue encountered in https://users.rust-lang.org/t/illegal-instruction-when-i-run-main-rs/2199

@brson
Contributor

brson commented Aug 5, 2015

cc @alexcrichton

@mcpherrinm
Contributor Author

Non-working was linked with gcc version 4.5.1 20100924 (Red Hat 4.5.1-4) (GCC) / GNU ld version 2.20.51.0.7-8.fc14 20100318
Working binary was linked with gcc version 4.9.2 20141101 (Red Hat 4.9.2-1) (GCC)

@nagisa
Member

nagisa commented Aug 5, 2015

Notably, the left side (.init_array/.fini_array) is the working binary and the right side (.ctors/.dtors) is the non-working one.

@mcpherrinm
Contributor Author

http://mcpherrin.ca/tmp/badrust.bin

Here is a sample bad binary, which is "fn main() { println!("Hello, World!") }" compiled on CentOS 6.6 32 bit in VirtualBox, with a Rust 1.2 stable candidate.

@brson added the I-nominated and T-libs-api labels Aug 6, 2015

@nagisa
Member

nagisa commented Aug 7, 2015

A debug, non-optimised binary: http://mcpherrin.ca/tmp/badrust-dbg.bin

EDIT: my investigation so far shows that jemalloc is able to set up thread-local storage for its own use just fine, which suggests the issue is on our side.

@mcpherrinm
Contributor Author

Here's a similar sample, but with debug symbols, built with:
"rustc -g -C opt-level=0 test.rs -o badrust-dbg.bin"

http://mcpherrin.ca/tmp/badrust-dbg.bin

@nagisa
Member

nagisa commented Aug 8, 2015

These initialiser sections, apparently, are completely irrelevant; we initialise all our thread locals lazily on first use (and initialisers are stored in .tdata even on old systems). Interestingly we don’t seem to use pthread for it either (which could be the approach to take on older systems).
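To illustrate what lazy initialisation on first use looks like from the outside, here is a minimal sketch using the public `thread_local!` macro. The names `INIT_RUNS`, `COUNTER` and `observe_lazy_init` are made up for this example; it only shows the observable behaviour and says nothing about how std implements it internally.

```rust
use std::cell::Cell;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

// Counts how many times the initialiser below has run, across all threads.
// Hypothetical name, for illustration only.
static INIT_RUNS: AtomicUsize = AtomicUsize::new(0);

thread_local! {
    // Hypothetical thread-local. The block on the right-hand side is the
    // initialiser; it runs on the first access within each thread.
    static COUNTER: Cell<u32> = {
        INIT_RUNS.fetch_add(1, Ordering::SeqCst);
        Cell::new(7)
    };
}

/// On a fresh thread, returns:
/// (initialiser runs caused by the first access,
///  initialiser runs caused by the second access,
///  value read on the first access).
fn observe_lazy_init() -> (usize, usize, u32) {
    thread::spawn(|| {
        let before = INIT_RUNS.load(Ordering::SeqCst);
        let value = COUNTER.with(|c| c.get());
        let after_first = INIT_RUNS.load(Ordering::SeqCst);
        COUNTER.with(|c| c.set(value + 1));
        let after_second = INIT_RUNS.load(Ordering::SeqCst);
        (after_first - before, after_second - after_first, value)
    })
    .join()
    .expect("worker thread panicked")
}

fn main() {
    let (first, second, value) = observe_lazy_init();
    println!("initialiser runs on first access:  {}", first);
    println!("initialiser runs on second access: {}", second);
    println!("value seen on first access:        {}", value);
}
```

The first access pays for the initialisation and later accesses on the same thread do not, which is why the `.init_array`/`.ctors` difference would not matter for these thread locals.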

Over 1.5 days of debugging I managed to observe symptoms mostly indicative of invalid memory reads (e.g. SIGSEGVs after a null pointer dereference inside Cell::get) along with the issue reported originally. Basically it seems that we read uninitialised memory on first use (at least on the main thread), and only on older 32 bit systems do we manage to hit a corner case where dtor_running happens to have the same representation as true (LSB set to 1); this is what makes the panic happen. Even on my recent system dtor_running (a boolean!) seems to be a pretty random byte value (e.g. 64 or 116) which “just happens” to have its LSB set to 0.
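The LSB arithmetic can be checked with a small sketch. It works on plain `u8` values and never reads uninitialised memory (doing that with a `bool` would be undefined behaviour). The helper `lsb_is_set` is hypothetical, and the idea that only the lowest bit decides whether the flag is treated as true is the assumption from the observation above, not something the sketch proves.

```rust
/// Hypothetical helper: reports whether the least significant bit of a raw
/// byte is set, i.e. whether a check that only looks at the lowest bit
/// would treat this byte as `true`.
fn lsb_is_set(raw: u8) -> bool {
    raw & 1 == 1
}

fn main() {
    // 64 and 116 are the stray values mentioned above; 1 and 117 are
    // made-up values with the lowest bit set, added for contrast.
    for raw in [64u8, 116, 1, 117].iter().copied() {
        println!("{:3} = {:#010b} -> LSB set: {}", raw, raw, lsb_is_set(raw));
    }
}
```

Both 64 and 116 are even, so their lowest bit is 0 and the flag looks like false; any odd garbage byte would look like true.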

@alexcrichton
Member

@nagisa could you detail a bit more how you reproduced this?

  • Were you also using Centos 6.6 like @mcpherrinm?
  • Did you have the same results where a newer linker worked whereas an older one did not?
  • Do you know if the OS has any open bugs about not initializing TLS data?

This may just be a case where we need to not use ELF-based TLS on older systems, but detecting that will not be easy unfortunately.

@nagisa
Member

nagisa commented Aug 10, 2015

@alexcrichton I was using Fedora 14; since the package repositories are not working anymore, I didn’t bother updating anything, including ld and ld-linux.so.2.

In cases where #27598 wasn’t hit, I could reliably reproduce the same issue and backtrace as reported in the internals thread. I did try using ld.gold (-C link-args=-fuse-ld=gold) with the same results.

Interestingly, the stage1 rustc compiler built on the same Fedora machine works just fine. Hand-crafted snippets of code that do not use the std implementation of thread locals also seem to work correctly.

So most likely either something we do in the start_lang implementation before setting thread info is wrong, or the std implementation of TLS is.

@alexcrichton
Member

Oh, it looks like I've actually already investigated this exact same bug before. As discovered in #20440, we depend on binutils 2.22 for an apparent bugfix with TLS, so I'm going to close this as basically a linker bug we can't do much about.
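For anyone wanting to check a machine against that threshold, a rough sketch of comparing a linker version line with 2.22 follows. It assumes the first line of `ld --version` has the same shape as the version string quoted earlier in this thread; the helper names and the `MIN_BINUTILS` constant are hypothetical. Distributions can backport fixes, so treat the result as a hint, not a reliable detection method.

```rust
/// Minimum binutils version named in this thread, as (major, minor).
const MIN_BINUTILS: (u32, u32) = (2, 22);

/// Parses the leading decimal digits of `s`, e.g. "25-17" -> 25.
fn leading_number(s: &str) -> Option<u32> {
    let digits: String = s.chars().take_while(|c| c.is_ascii_digit()).collect();
    digits.parse().ok()
}

/// Extracts (major, minor) from a line such as
/// "GNU ld version 2.20.51.0.7-8.fc14 20100318".
/// Picks the first whitespace-separated token that starts with a digit
/// and contains a dot; this format is an assumption, not a guarantee.
fn parse_major_minor(version_line: &str) -> Option<(u32, u32)> {
    let token = version_line
        .split_whitespace()
        .find(|t| t.starts_with(|c: char| c.is_ascii_digit()) && t.contains('.'))?;
    let mut parts = token.split('.');
    let major = leading_number(parts.next()?)?;
    let minor = leading_number(parts.next()?)?;
    Some((major, minor))
}

/// Returns Some(true) if the version is at least MIN_BINUTILS,
/// Some(false) if it is older, and None if the line could not be parsed.
fn meets_minimum(version_line: &str) -> Option<bool> {
    parse_major_minor(version_line).map(|v| v >= MIN_BINUTILS)
}

fn main() {
    // The non-working linker reported earlier in this thread.
    let old = "GNU ld version 2.20.51.0.7-8.fc14 20100318";
    println!("{:?} -> {:?}", old, meets_minimum(old));
}
```

The tuple comparison is lexicographic, so (2, 20) sorts below (2, 22) without any extra logic.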

4 participants