Orbit release build of glium takes a very long time in codegen passes #34166

jonas-schievink · 2016-06-08T21:17:51Z

Steps to reproduce:

clone glium and check out glium/glium@11a6b7b (most recent commit at the time of writing)
try a -Zorbit release build with the current nightly (2016-06-07):
- For me, this involved env RUSTFLAGS="-Zorbit" rustup run nightly cargo rustc --release -- -Ztime-passes
Observe compilation progress (very slowly but steadily, in old Rust tradition) until llvm module passes, which itself takes about 24 seconds on my machine, and then get stuck somewhere in LLVM codegen

cc @eddyb (I heard you like nothing more than MIR trans and LLVM bugs)

The text was updated successfully, but these errors were encountered:

jonas-schievink · 2016-06-08T21:31:33Z

Output of perf after a while:

 51,06%  rustc    librustc_llvm-8102e29f.so             [.] llvm::SpillPlacement::iterate                                                                            
   9,49%  rustc    librustc_llvm-8102e29f.so             [.] _ZN4llvm14BlockFrequencypLES0_@plt                                                                       
   9,28%  rustc    librustc_llvm-8102e29f.so             [.] llvm::BlockFrequency::operator+=                                                                         
   4,89%  rustc    librustc_llvm-8102e29f.so             [.] llvm::BlockFrequency::operator+                                                                          
   4,42%  rustc    librustc_llvm-8102e29f.so             [.] _ZNK4llvm14BlockFrequencyplES0_@plt                                                                      
   1,49%  rustc    librustc_llvm-8102e29f.so             [.] llvm::AssumptionCache::scanFunction

arielb1 · 2016-06-08T21:41:49Z

Could you -C llvm-args=-print-before-all?

jonas-schievink · 2016-06-08T21:52:49Z

@arielb1 That somehow managed to lock up rustc (using 0% cpu) before getting to codegen (or even optimizations). Fun!

jonas-schievink · 2016-06-08T22:04:56Z

Here's the last part of it:

https://gist.github.com/jonas-schievink/c2599dd9f9687c907773ddb05f381430

jonas-schievink · 2016-06-11T17:49:09Z

So it seems like there are multiple issues at play here. Let me try to sum up what @eddyb and me managed to dig up a few days ago:

gl-rs generates Rust code for loading OpenGL function pointers, and the generated code isn't quite optimal:
- in fact, it generates 1624 lines of code for initializing the Gl struct, each of which performs 2 function calls
- this can be seen in target/debug/build/glium-*/out/gl_bindings.rs when building manually
MIR trans generates a lot more LLVM values than old trans
- in fact, the bad function results in 13001 values for old trans, but in 64978 values (!) for MIR trans (all unoptimized)
- this is still fine, since the unoptimized MIR build works
- after optimizations, old trans uses 53594 values, while MIR trans uses 61714 - MIR trans uses less than before, so this point might be moot (⬅️this might also indicate that the LLVM inliner should take the caller's size into account when deciding whether to inline or not)
LLVM's greedy register allocator seems to have some bad performance characteristics
- in fact, the build spends over 5 minutes in there until I cancelled it (see the gist linked below)
- specifying a different allocator by using llc --regalloc=X makes the code compile successfully

~~I'm really not sure why this all works out without optimizations but fails with them, but it seems like the greedy register allocator is partially at fault here.~~ Of course it's the greedy allocator, it is used by default for an optimized build, while the "fast" allocator is used for debug builds.

For details, refer to this gist.

eddyb · 2016-06-11T18:02:13Z

cc @alexcrichton Any chance this was already found and fixed upstream?

alexcrichton · 2016-06-11T18:54:48Z

Hm doesn't look familiar to me, so it may or may not have been (sorry not much help on this one).

If codegen is generating so much code though it seems like this is something that should arguably be fixed in upstream at the same time the MIR regression is fixed...

eddyb · 2016-06-11T19:01:18Z

Found something https://llvm.org/bugs/show_bug.cgi?id=19217 - and an older one https://llvm.org/bugs/show_bug.cgi?id=11811.
None of these bugs seem to have had much progress on them.
The cleanup one would suggest a potential reason for why MIR trans displays the problem.

@eddyb

don't generate drop ladder steps for fields that don't need dropping cc @eddyb This should help with #34166

arielb1 · 2016-06-16T13:01:44Z

I blame the LLVM inliner. This can be fixed by moving metaloadfn to a #[inline(never)] function:

                #[inline(never)]
                fn do_metaloadfn(loadfn: &mut FnMut(&str) -> *const __gl_imports::raw::c_void,
                                 symbol: &str,
                                 symbols: &[&str])
                                 -> *const __gl_imports::raw::c_void
                {
                    let mut ptr = loadfn(symbol);
                    if ptr.is_null() {
                        for &sym in symbols {
                            ptr = loadfn(sym);
                            if !ptr.is_null() { break; }
                        }
                    }
                    ptr
                }
                let mut metaloadfn = |symbol: &str, symbols: &[&str]| {
                    do_metaloadfn(&mut loadfn, symbol, symbols)
                };

That still leaves MIR ~20% slower, so there may be still some issue, but if at least finishes.

arielb1 · 2016-06-16T14:58:13Z

The unoptimized "unit cell" we generate is

    bb8: {
        tmp34 = &mut var1;               // scope 2 at ../target/release/build/g
lium-ea08b897ed8f243c/out/gl_bindings.rs:4168:25: 4168:35
        tmp37 = const "glTexCoord3iv";   // scope 2 at ../target/release/build/g
lium-ea08b897ed8f243c/out/gl_bindings.rs:4168:36: 4168:51
        tmp36 = &(*tmp37);               // scope 2 at ../target/release/build/g
lium-ea08b897ed8f243c/out/gl_bindings.rs:4168:36: 4168:51
        tmp40 = promoted1620;            // scope 2 at ../target/release/build/g
lium-ea08b897ed8f243c/out/gl_bindings.rs:4168:53: 4168:56
        tmp39 = &(*tmp40);               // scope 2 at ../target/release/build/g
lium-ea08b897ed8f243c/out/gl_bindings.rs:4168:53: 4168:56
        tmp38 = tmp39 as &[&str] (Unsize); // scope 2 at ../target/release/build
/glium-ea08b897ed8f243c/out/gl_bindings.rs:4168:53: 4168:56
        tmp35 = (tmp36, tmp38);          // scope 2 at ../target/release/build/g
lium-ea08b897ed8f243c/out/gl_bindings.rs:4168:25: 4168:57
        tmp33 = <[closure@../target/release/build/glium-ea08b897ed8f243c/out/gl_bindings.rs:4161:38: 4163:18 loadfn:&mut F] as std::ops::FnMut<(&str, &[&str])>>::call_mut(tmp34, tmp35) -> [return: bb9, unwind: bb2]; // scope 2 at ../target/release/build/glium-ea08b897ed8f243c/out/gl_bindings.rs:4168:25: 4168:57
    }


    bb9: {
        tmp32 = FnPtr::new(tmp33) -> [return: bb10, unwind: bb2]; // scope 2 at ../target/release/build/glium-ea08b897ed8f243c/out/gl_bindings.rs:4168:14: 4168:58
    }

In LLVM, that becomes

  %95 = getelementptr inbounds %str_slice, %str_slice* %temp37, i32 0, i32 0
  store i8* getelementptr inbounds ([13 x i8], [13 x i8]* @str63338, i32 0, i32 0), i8** %95
  %96 = getelementptr inbounds %str_slice, %str_slice* %temp37, i32 0, i32 1
  store i64 13, i64* %96
  %97 = getelementptr inbounds %str_slice, %str_slice* %temp37, i32 0, i32 0
  %98 = load i8*, i8** %97
  %99 = getelementptr inbounds %str_slice, %str_slice* %temp37, i32 0, i32 1
  %100 = load i64, i64* %99
  store [0 x %str_slice]* @ref63335, [0 x %str_slice]** %temp40
  %101 = load [0 x %str_slice]*, [0 x %str_slice]** %temp40, !nonnull !0
  %102 = bitcast [0 x %str_slice]* %101 to %str_slice*
  %103 = getelementptr inbounds { %str_slice, { %str_slice*, i64 } }, { %str_slice, { %str_slice*, i64 } }* %temp35, i32 0, i32 0
  %104 = getelementptr inbounds %str_slice, %str_slice* %103, i32 0, i32 0
  store i8* %98, i8** %104
  %105 = getelementptr inbounds %str_slice, %str_slice* %103, i32 0, i32 1
  store i64 %100, i64* %105
  %106 = getelementptr inbounds { %str_slice, { %str_slice*, i64 } }, { %str_slice, { %str_slice*, i64 } }* %temp35, i32 0, i32 1
  %107 = getelementptr inbounds { %str_slice*, i64 }, { %str_slice*, i64 }* %106, i32 0, i32 0
  store %str_slice* %102, %str_slice** %107
  %108 = getelementptr inbounds { %str_slice*, i64 }, { %str_slice*, i64 }* %106, i32 0, i32 1
  store i64 0, i64* %108
  %109 = getelementptr inbounds { %str_slice, { %str_slice*, i64 } }, { %str_slice, { %str_slice*, i64 } }* %temp35, i32 0, i32 0
  %110 = getelementptr inbounds %str_slice, %str_slice* %109, i32 0, i32 0
  %111 = load i8*, i8** %110
  %112 = getelementptr inbounds %str_slice, %str_slice* %109, i32 0, i32 1
  %113 = load i64, i64* %112
  %114 = getelementptr inbounds { %str_slice, { %str_slice*, i64 } }, { %str_slice, { %str_slice*, i64 } }* %temp35, i32 0, i32 1
  %115 = getelementptr inbounds { %str_slice*, i64 }, { %str_slice*, i64 }* %114, i32 0, i32 0
  %116 = load %str_slice*, %str_slice** %115
  %117 = getelementptr inbounds { %str_slice*, i64 }, { %str_slice*, i64 }* %114, i32 0, i32 1
  %118 = load i64, i64* %117
  %119 = invoke i8* @"_ZN5glium2gl2Gl9load_with28_$u7b$$u7b$closure$u7d$$u7d$17h26487c07ccd16bb4E"(%closure.4* dereferenceable(8) %metaloadfn, i8* noalias nonnull readonly %111, i64 %113, %str_slice* noalias nonnull readonly %116, i64 %118)
          to label %bb9 unwind label %cleanup

bb9:                                              ; preds = %bb8
  invoke void @_ZN5glium2gl5FnPtr3new17h33e17555c3c378b0E(%"gl::FnPtr"* noalias nocapture sret dereferenceable(16) %tmp_ret3, i8* %119)
          to label %bb10 unwind label %cleanup

bb10:                                             ; preds = %bb9
  %120 = getelementptr inbounds %"gl::FnPtr", %"gl::FnPtr"* %tmp_ret3, i32 0, i32 0
  %121 = getelementptr inbounds %"gl::FnPtr", %"gl::FnPtr"* %tmp_ret3, i32 0, i32 1
  %122 = load i8*, i8** %120
  %123 = load i8, i8* %121, !range !14
  %124 = trunc i8 %123 to i1

Very ugly. Pair-passing style + lack of DPS means we have to spill out lots of things.

NRVO should fix this.

MagaTailor · 2016-07-04T15:40:27Z

I commented in #25478, as the compilation is able to pull through. However it's probably related as I use -Zorbit by default.

eddyb · 2016-07-12T19:08:54Z

Added to the milestone so it's tracked until it gets fixed.

nikomatsakis · 2016-07-26T16:52:18Z

@arielb1 can you elaborate a bit on your comment. For example:

The unoptimized "unit cell" we generate is ...[MIR]

What do you mean by "unit cell"? What is the Rust code from which this is generated?

Pair-passing style + lack of DPS means we have to spill out lots of things.

Same question :) what is the rust code? What is "pair-passing style"?

Let me take my best guess :). I think by "pair passing style" you are referring to the rust-call convention, where we are tupling up the arguments? Certainly I see that in your snippet, e.g. in the creation of tmp35 = (tmp36, tmp38);. But I'm not clear on how DPS would help in this case!

nikomatsakis · 2016-07-27T21:52:20Z

Is this in any way platform specific?

jonas-schievink · 2016-07-27T22:19:29Z

This happened on x64 Arch Linux, but I can't reproduce it anymore:

  time: 1.154; rss: 290MB   llvm function passes [0]
  time: 23.388; rss: 306MB  llvm module passes [0]
  time: 10.041; rss: 305MB  codegen passes [0]
  time: 0.004; rss: 306MB   codegen passes [0]

eddyb · 2016-07-28T16:22:49Z

@jonas-schievink ~~Hangs~~ Takes ages in release + -Z orbit, both on glium master and the original revision, for me.

For master, i.e. glium/glium@66d0b5b:

time: 7.424; rss: 554MB translation
time: 0.003; rss: 554MB assert dep graph
time: 0.000; rss: 554MB serialize dep graph
  time: 1.513; rss: 491MB       llvm function passes [0]
  time: 34.153; rss: 491MB      llvm module passes [0]
  time: 3127.262; rss: 495MB    codegen passes [0]
  time: 0.007; rss: 495MB       codegen passes [0]
time: 3163.323; rss: 495MB      LLVM passes
time: 0.276; rss: 495MB linking

EDIT: This was with an older Cargo.lock, and Cargo didn't react to me updating the repo itself.
It finishes in about a minute otherwise.

nikomatsakis · 2016-07-28T20:50:23Z

Discussed in compiler team meeting. @arielb1 will try to produce a stand-alone test case. Seems like doing some "NRVO-style" optimization may be the best answer here.

nikomatsakis · 2016-07-28T20:57:12Z

One question is whether the workaround that @arielb1 suggested has been applied for glium? @arielb1 at least is of the opinion that (if so) this need not block moving to MIR more broadly, as there are many possible O(n^2) cases we could hit, even in old trans (even if this code was not among them). Though I'd still like to see it fixed =)

jonas-schievink · 2016-07-28T21:00:24Z

Glium is still on gl_generator 0.5.1, the fix is in 0.5.2

EDIT: Ah nevermind, Cargo will update to 0.5.2 anyways :)

arielb1 · 2016-07-28T21:07:52Z

self-contained, but not minified, example at www.cs.tau.ac.il/~arielb1/gl-bindings-self-contained-1.rs.xz

Because of how the O(n^2)-ess works, this only takes a few seconds to codegen, but still an anomalously high time,

eddyb · 2016-07-28T21:08:42Z

@jonas-schievink Oops! Totally my bad, I had a stale Cargo.lock.

arielb1 · 2016-07-28T21:30:18Z

minified example: http://sprunge.us/fGMM (2kLoC - macro magic should be reduce this)

arielb1 · 2016-07-28T21:36:24Z

Template:

pub struct Gl {
    pub data: [FnPtr; LEN]
}


pub struct FnPtr {
    pub f: *const u8,
    pub is_loaded: bool,
}

fn missing_fn_panic() -> ! {
    panic!("gl function was not loaded")
}

impl FnPtr {
    fn new(ptr: *const u8) -> FnPtr {
        if ptr.is_null() {
            FnPtr {
                f: missing_fn_panic as *const u8,
                is_loaded: false
            }
        } else {
            FnPtr { f: ptr, is_loaded: true }
        }
    }
}


pub fn load_with(loadfn: &mut FnMut(&str) -> *const u8) -> Gl {
    let mut metaloadfn = |symbol: &str, symbols: &[&str]| {
        let mut ptr = loadfn(symbol);
        if ptr.is_null() {
            for &sym in symbols {
                ptr = loadfn(sym);
                if !ptr.is_null() { break; }
            }
        }
        ptr
    };

    Gl { data: [

i=0;
while [[ $i -lt $j ]]; do
    echo '        FnPtr::new(metaloadfn("'"function"'", &[])),';
    i=$[i+1]
done >> test-slow.rs
sed -i s/LEN/$j/ test-slow.rs
echo ']}}' >> test-slow.rs

nikomatsakis · 2016-08-01T17:43:22Z

Removing from the "launch MIR into orbit" milestone for the following reasons:

The problem with glium specifically is not present in the latest release.
As @arielb1 argues, there are many sources of "O(n^2)" behavior in LLVM (not to argue we shouldn't try to fix this one), some of which are present in old trans, some in MIR trans.
We do plan to address this source by working on optimizations like move-forwarding etc
Gaining experience with MIR in nightly seems very valuable

tafia · 2016-11-28T07:12:22Z

Is there any news on this?

I have a project (closed source) that can't compile on nightly/release but compiles fine on stable and nightly/debug.
I've stop waiting after more than 20 mins (stable compiles in about 1 min).

When running cargo rustc --release -- -Ztime-passes, the last step I see is:

time: 1.014; rss: 121MB       llvm function passes [0]

So I assume the next step (which is llvm module passes [0] on stable) is having a problem on my system.
I'm using

$ rustc -V
rustc 1.15.0-nightly (2217bd771 2016-11-25)
$ uname -r
2.6.32-431.29.2.el6.x86_64

Is there any command I can run to explore in more details what is happening?
Also I don't know how to try another allocator as specified in this comment

ishitatsuyuki · 2018-05-03T08:05:06Z

Closing this due to inactivity. This was probably a LLVM bug, which may have been already fixed today.

If the issue happens on the same code, please request to reopen. If a similar issue happens with different code, please open a new issue with that.

eddyb added A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. I-compiletime Issue: Problems and improvements with respect to compile times. A-MIR Area: Mid-level IR (MIR) - https://blog.rust-lang.org/2016/04/19/MIR.html labels Jun 8, 2016

arielb1 mentioned this issue Jun 15, 2016

don't generate drop ladder steps for fields that don't need dropping #34290

Merged

bors added a commit that referenced this issue Jun 16, 2016

Auto merge of #34290 - arielb1:short-ladder, r=eddyb

a479a6a

don't generate drop ladder steps for fields that don't need dropping cc @eddyb This should help with #34166

jonas-schievink mentioned this issue Jun 16, 2016

Prevent inlining loadfn too many times rust-windowing/gl-rs#406

Merged

eddyb added this to the Launch MIR into Orbit milestone Jul 12, 2016

eddyb changed the title ~~Orbit release build of glium gets stuck in codegen passes~~ Orbit release build of glium takes a very long time in codegen passes Jul 28, 2016

nikomatsakis removed this from the Launch MIR into Orbit milestone Aug 1, 2016

Mark-Simulacrum added the C-enhancement Category: An issue proposing an enhancement or a PR with one. label Jul 25, 2017

ishitatsuyuki closed this as completed May 3, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Orbit release build of glium takes a very long time in codegen passes #34166

Orbit release build of glium takes a very long time in codegen passes #34166

jonas-schievink commented Jun 8, 2016

jonas-schievink commented Jun 8, 2016

arielb1 commented Jun 8, 2016

jonas-schievink commented Jun 8, 2016

jonas-schievink commented Jun 8, 2016

jonas-schievink commented Jun 11, 2016 •

edited

Loading

eddyb commented Jun 11, 2016

alexcrichton commented Jun 11, 2016

eddyb commented Jun 11, 2016

arielb1 commented Jun 16, 2016 •

edited

Loading

arielb1 commented Jun 16, 2016

MagaTailor commented Jul 4, 2016

eddyb commented Jul 12, 2016

nikomatsakis commented Jul 26, 2016

nikomatsakis commented Jul 27, 2016

jonas-schievink commented Jul 27, 2016

eddyb commented Jul 28, 2016 •

edited

Loading

nikomatsakis commented Jul 28, 2016

nikomatsakis commented Jul 28, 2016

jonas-schievink commented Jul 28, 2016 •

edited

Loading

arielb1 commented Jul 28, 2016 •

edited

Loading

eddyb commented Jul 28, 2016

arielb1 commented Jul 28, 2016 •

edited

Loading

arielb1 commented Jul 28, 2016

nikomatsakis commented Aug 1, 2016 •

edited

Loading

tafia commented Nov 28, 2016 •

edited

Loading

ishitatsuyuki commented May 3, 2018

Orbit release build of glium takes a very long time in codegen passes #34166

Orbit release build of glium takes a very long time in codegen passes #34166

Comments

jonas-schievink commented Jun 8, 2016

jonas-schievink commented Jun 8, 2016

arielb1 commented Jun 8, 2016

jonas-schievink commented Jun 8, 2016

jonas-schievink commented Jun 8, 2016

jonas-schievink commented Jun 11, 2016 • edited Loading

eddyb commented Jun 11, 2016

alexcrichton commented Jun 11, 2016

eddyb commented Jun 11, 2016

arielb1 commented Jun 16, 2016 • edited Loading

arielb1 commented Jun 16, 2016

MagaTailor commented Jul 4, 2016

eddyb commented Jul 12, 2016

nikomatsakis commented Jul 26, 2016

nikomatsakis commented Jul 27, 2016

jonas-schievink commented Jul 27, 2016

eddyb commented Jul 28, 2016 • edited Loading

nikomatsakis commented Jul 28, 2016

nikomatsakis commented Jul 28, 2016

jonas-schievink commented Jul 28, 2016 • edited Loading

arielb1 commented Jul 28, 2016 • edited Loading

eddyb commented Jul 28, 2016

arielb1 commented Jul 28, 2016 • edited Loading

arielb1 commented Jul 28, 2016

nikomatsakis commented Aug 1, 2016 • edited Loading

tafia commented Nov 28, 2016 • edited Loading

ishitatsuyuki commented May 3, 2018

jonas-schievink commented Jun 11, 2016 •

edited

Loading

arielb1 commented Jun 16, 2016 •

edited

Loading

eddyb commented Jul 28, 2016 •

edited

Loading

jonas-schievink commented Jul 28, 2016 •

edited

Loading

arielb1 commented Jul 28, 2016 •

edited

Loading

arielb1 commented Jul 28, 2016 •

edited

Loading

nikomatsakis commented Aug 1, 2016 •

edited

Loading

tafia commented Nov 28, 2016 •

edited

Loading