docker ruby:alpine SEGFAULT error #91

Closed
ababich opened this issue Feb 15, 2017 · 33 comments

Comments

@ababich

ababich commented Feb 15, 2017

I cannot write a more "proper" subject because I do not know the exact issue:

Precondition: we have a Ruby 2.4 project using Rails 5.0.1.
This project runs inside Docker 1.13+.

When running in dev mode without Docker, skylight works in development mode.
"Works" here refers to the rspec test run.
When running in Docker we get a SEGFAULT; verified with Ruby 2.3 and 2.4, same issue.

Example logs:

/usr/local/lib/ruby/2.4.0/timeout.rb:83: [BUG] Segmentation fault at 0x007fac00000001
ruby 2.4.0p0 (2016-12-24 revision 57164) [x86_64-linux]

-- Control frame information -----------------------------------------------
c:0049 p:---- s:0269 e:000268 CFUNC  :start
c:0048 p:0031 s:0265 E:000e68 BLOCK  /usr/local/lib/ruby/2.4.0/timeout.rb:83 [FINISH]
c:0047 p:0108 s:0259 E:001a60 METHOD /usr/local/lib/ruby/2.4.0/timeout.rb:103
c:0046 p:0090 s:0247 E:0019a8 METHOD /usr/local/lib/ruby/2.4.0/net/http.rb:902
...
-- Machine register context ------------------------------------------------
 RIP: 0x00007fac2720bd1e RBP: 0x000055792c826298 RSP: 0x00007ffe9050b828
 RAX: 0x00007fac2198beef RBX: 0x00007fac2198cab0 RCX: 0x0000000000000000
 RDX: 0x000055792b536778 RDI: 0x00007fac2198beef RSI: 0x00007fac00000001
  R8: 0x00007fac2188c000  R9: 0x0000000000000000 R10: 0x0000000000000022
 R11: 0x0000000000000206 R12: 0x00007fac2198ca88 R13: 0x00007fac2198caa0
 R14: 0x00007fac2198cc00 R15: 0x00007fac27446b28 EFL: 0x0000000000010202

-- Other runtime information -----------------------------------------------

* Loaded script: /usr/local/bundle/bin/rspec

* Loaded features:

    0 enumerator.so
    1 thread.rb
    2 rational.so
    3 complex.so
    4 /usr/local/lib/ruby/2.4.0/x86_64-linux/enc/encdb.so
...
  749 /usr/local/bundle/gems/knock-2.0/lib/knock.rb
  750 /usr/local/bundle/gems/skylight-1.0.1/lib/skylight/version.rb
  751 /usr/local/bundle/gems/skylight-1.0.1/lib/skylight/util/platform.rb
  752 /usr/local/bundle/gems/skylight-1.0.1/lib/skylight_native.so
  753 /usr/local/bundle/gems/skylight-1.0.1/lib/skylight/native.rb
  754 /usr/local/bundle/gems/skylight-1.0.1/lib/skylight/railtie.rb
  755 /usr/local/bundle/gems/skylight-1.0.1/lib/skylight/compat.rb
  756 /usr/local/bundle/gems/skylight-1.0.1/lib/skylight/util/deploy.rb
...
 1480 /usr/local/bundle/gems/http-cookie-1.0.3/lib/http/cookie_jar/abstract_store.rb
 1481 /usr/local/bundle/gems/http-cookie-1.0.3/lib/http/cookie_jar/hash_store.rb

* Process memory map:

5579299e6000-5579299e7000 r-xp 00000000 fd:09 3700426                    /usr/local/bin/ruby
557929be6000-557929be7000 r--p 00000000 fd:09 3700426                    /usr/local/bin/ruby
557929be7000-557929be8000 rw-p 00001000 fd:09 3700426                    /usr/local/bin/ruby
55792aa89000-55792e735000 rw-p 00000000 00:00 0                          [heap]
7fac2188b000-7fac2188c000 ---p 00000000 00:00 0 
7fac2188c000-7fac21a8e000 rw-p 00000000 00:00 0 
...
7ffe90558000-7ffe9055a000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]


[NOTE]
You may have encountered a bug in the Ruby interpreter or extension libraries.
Bug reports are welcome.

The test above tries to reach an external endpoint via restclient 2.0.
It is an integration test.

Exactly the same test works perfectly without skylight.

The Docker image is based on ruby:2.4-alpine.

Please let me know if this is a known issue or if you need more details to reproduce it.

@wagenet
Contributor

wagenet commented Feb 15, 2017

The big problem with running in Docker is that a lot of normally expected dependencies aren't guaranteed to exist. In this case it looks like the actual segfault is from timeout.rb, not our own code. It seems possible that you're missing some internal dependency for Ruby that's causing this.

@wagenet
Contributor

wagenet commented Feb 15, 2017

To be clear, I don't think this is Skylight crashing. I think Skylight is attempting an HTTP request with Net::HTTP which has a timeout property on it. Then Ruby's internal Timeout is crashing.

If you want to test more, can you try making your own Net::HTTP request with a timeout and see if that crashes as well? That would ensure that Skylight isn't the issue.
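A minimal sketch of the standalone check suggested above, using only Ruby's standard library. The method name `fetch_with_timeouts` and the 5-second defaults are illustrative assumptions; this is not Skylight's code, only a plain Net::HTTP request with explicit timeouts.

```ruby
require "net/http"
require "uri"

# Hypothetical helper (not part of Skylight): performs one GET with explicit
# connect and read timeouts, so Net::HTTP's timeout handling is exercised
# without the Skylight gem loaded.
def fetch_with_timeouts(url, open_timeout: 5, read_timeout: 5)
  uri = URI.parse(url)
  Net::HTTP.start(uri.host, uri.port,
                  use_ssl: uri.scheme == "https",
                  open_timeout: open_timeout,
                  read_timeout: read_timeout) do |http|
    http.get(uri.request_uri).code # status code as a String, e.g. "200"
  end
rescue Net::OpenTimeout, Net::ReadTimeout => e
  # A timeout is a normal outcome here; a segfault would abort the process instead.
  "timeout: #{e.class}"
end
```

Calling `puts fetch_with_timeouts("https://example.com/")` inside the ruby:2.4-alpine container, with Skylight removed from the bundle, would show whether a plain request with timeouts crashes as well.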

@ababich
Author

ababich commented Feb 15, 2017

Please note that exactly the same test works perfectly without skylight.
This means the test suite is real and stable, and it crashes with a SEGFAULT only when skylight is added to the stack.

@ababich
Author

ababich commented Feb 15, 2017

OK, I can confirm that this is an Alpine-image-only issue.
If I switch to the ruby:2.4 image (Debian-based), everything immediately works fine!

@jirutka

jirutka commented Feb 16, 2017

The ruby:2.4-alpine image does not use Alpine’s ruby package; it compiles Ruby itself. So the first thing I’d suggest is to try skylight on a clean Alpine v3.5 with Ruby installed from the official package main/ruby (2.3.3). If you need Ruby 2.4.0, then you can install it from edge (unstable).

If it works there, then it’s not a bug in skylight or Alpine, but in that Docker image (I wouldn’t be surprised…). Otherwise I’d look at it more closely, because it might be a bug in our Ruby package, or skylight is doing something very nasty in its native extension. Please note that Alpine uses musl libc, not GNU libc. However, I wouldn’t expect a libc compatibility problem here if it compiles without any error (I assume that you compile this extension directly on Alpine…).

@ababich
Author

ababich commented Feb 16, 2017

@jirutka Ruby 2.4 was released as stable 2 months ago, but that is not the issue, because skylight works on the Debian-based image.

ruby:2.4-alpine IS based on alpine:3.4, please see https://github.com/docker-library/ruby/blob/e294c37cc1c2e52e5e765b7dff7f6549eedd7a47/2.4/alpine/Dockerfile
Otherwise, please clarify what you meant.

Thanks!

@wagenet
Contributor

wagenet commented Feb 16, 2017

@ababich so far, the problem you're seeing looks likely to be caused by a problematic Docker container. I believe this is what @jirutka is trying to say. This is also backed up by the fact that Skylight runs with this Dockerfile: https://gist.github.com/wagenet/5ae561a4105c7b32483b2a08aebe8f34

I understand that you're saying the issue only happens when you have the Skylight gem installed. However, I'm suggesting that the SEGFAULT you're seeing may still be from Ruby internals, not Skylight. If Skylight is using a Ruby feature that isn't used elsewhere in your app then removing the Skylight gem would solve this problem.

From what I can see of the logs you shared, the error is from Ruby's Timeout code as utilized by Net::HTTP. If this is indeed the case, then there still isn't much we can do in Skylight to directly solve the issue.

Docker setups attempt to strip out everything that is non-essential, but as a result, it's easy to accidentally strip out something that is actually important. If something essential to Ruby's internals is missing from the Docker container, that could definitely cause the problem.

There are a couple of ways to proceed from here:

  1. You could demonstrate that my hypothesis about it being from Ruby's internal Timeout is incorrect. The way to do this would be to make a test app with the same Docker image that uses Net::HTTP's timeouts. I suspect this is the spot in our code to look at: https://github.com/skylightio/skylight-ruby/blob/master/lib/skylight/util/http.rb#L130

  2. You could show that things also fail when you manually install Ruby on a vanilla Alpine image. This is what @jirutka is suggesting.
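For option 1, the Timeout hypothesis can also be narrowed without any HTTP at all. The following is a minimal sketch that exercises Ruby's `Timeout.timeout` directly; the method name `exercise_timeout` and the 0.1-second limit are illustrative assumptions, not anything taken from Skylight or from the reporter's test suite.

```ruby
require "timeout"

# Hypothetical check: runs Timeout.timeout once with a block that finishes in
# time and once with a block that overruns, with no network code involved.
def exercise_timeout(limit: 0.1)
  finished = Timeout.timeout(limit) { :completed } # block returns in time

  expired = begin
    Timeout.timeout(limit) do
      sleep(limit * 10) # deliberately overrun the limit
      :not_reached
    end
  rescue Timeout::Error
    :timed_out
  end

  [finished, expired]
end
```

If `exercise_timeout` returns normally in the Alpine container while the Skylight-enabled test run still crashes, that would suggest the fault is not in Timeout alone.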

@ababich
Author

ababich commented Feb 16, 2017

To be honest, I do not see much sense in further experiments, for the following reasons:

  1. I use the official Ruby images, ruby:2.3-alpine vs ruby:2.3 and the same for 2.4. In both cases the results are the same: Alpine does not work, non-Alpine does.

     This means that dockerizing cannot have stripped anything: both images are fully official with no changes, and I've never seen issues switching between them.

  2. You suggest that the approach in no. 1 (switching between official Ruby images) is not sufficient to demonstrate the issue. To me this sounds like you do not see where the issue actually is (see no. 4 for clarification, please).

  3. For me the Debian Ruby image is OK right now, and I really can switch to it (as I mentioned, both images have always been equivalent for me and for dozens of other gems).

  4. On the one hand, it's great that the Docker gist you posted works. On the other hand, skylight clients are more likely to use the official ruby:alpine image than to play with custom setups. That's why I suggest that there is probably an option to fix the official ruby:alpine image. Right now, though, this should matter more to the skylight devs than to me as a user of these images, because I can switch to Debian and get rid of this particular problem without changing anything else.

@ababich ababich changed the title SEGFAULT error ruby:alpine SEGFAULT error Feb 16, 2017
@ababich ababich changed the title ruby:alpine SEGFAULT error docker ruby:alpine SEGFAULT error Feb 16, 2017
@wagenet
Contributor

wagenet commented Feb 16, 2017

@ababich if you're happy with Debian then it sounds like a good solution. In my experience Docker containers commonly run into issues like this, though YMMV.

@wagenet
Contributor

wagenet commented Feb 16, 2017

Right now getting things to work on Alpine is a low priority. However, if someone is able to provide a simple reproduction (i.e. a repo we can clone that causes the issue) I'd be happy to reopen this and investigate further.

@wagenet
Contributor

wagenet commented Feb 16, 2017

It turns out it was an issue with when Ruby is compiled. This should fix it: 138bb9a

@GRoguelon

Hi @wagenet,

We use Ruby on Rails with Docker and would like to use Skylight, but I still get the segmentation fault when the container starts.

According to the issues about Alpine, a patch should have been available since at least March, but even with the latest version (Skylight v1.3.1) I still get the error.

The problem blocks us from testing Skylight, and I don't have a lot of time for that.

How can I help you identify the cause?

In the IRB of the container (ruby:2.4-alpine):

RbConfig::CONFIG['arch'] #=> "x86_64-linux-musl"

In the shell of the container:

$ ldd --version
musl libc (x86_64)
Version 1.1.14
Dynamic Program Loader
Usage: ldd [options] [--] pathname

In the shell of the container:

$ cat /etc/os-release
NAME="Alpine Linux"
ID=alpine
VERSION_ID=3.4.6
PRETTY_NAME="Alpine Linux v3.4"
HOME_URL="http://alpinelinux.org"
BUG_REPORT_URL="http://bugs.alpinelinux.org"

Let me know if you need something else.

Thank you.

@wagenet wagenet reopened this Jul 19, 2017
@wagenet
Contributor

wagenet commented Jul 19, 2017

@GRoguelon sorry for the delayed reply, and sorry for the trouble! I've found that getting stuff to work with musl is a pretty annoying process. However, I'll see if I can investigate this sometime in the near future.

@jirutka

jirutka commented Jul 19, 2017

I've found that getting stuff to work with musl is a pretty annoying process.

Then you’re probably doing something nasty. Musl is quite good litmus paper…

@wagenet
Contributor

wagenet commented Jul 19, 2017

@jirutka that's possible, but I've found that error messages are quite opaque and not even all "standard" libraries appear to play well with musl. So while you may be technically correct, it doesn't make things any less painful.

@jirutka

jirutka commented Jul 19, 2017

You can try to ask in Freenode’s channel #musl, there are clever people who can help.

@jirutka

jirutka commented Jul 19, 2017

not even all "standard" libraries appear to play well with musl.

What do you consider as “standard” libraries? We have a complete Linux distribution based on musl libc, Alpine Linux. Sure, some libraries and programs need to be patched, b/c they rely on non-standard GNU libc extensions and bugs, but it’s almost always solvable.

@wagenet
Contributor

wagenet commented Jul 19, 2017

@jirutka to be clear, I'm not saying it's unsolvable, but that it can be non-trivial, especially for someone who is not an expert in the intricacies of the system. I'll definitely give the Freenode channel a go when I'm able to allocate more time to this.

@jirutka

jirutka commented Jul 19, 2017

Aha, skylight-ruby downloads some precompiled binaries, libskylight.so and skylightd (skylight-native). You even provide a musl variant of these binaries, so it’s not the classic problem of running a binary linked with glibc on a musl-based system.

It seems that there’s no source code available, so I cannot check it or try to compile it. Users cannot review it, and I’d bet that most of them don’t even know that this Ruby gem downloads a proprietary binary from somewhere. I’m done here.

(As I said, musl is quite good litmus paper… for bad code and even suspicious software.)

@ryansch

ryansch commented Jul 19, 2017

@jirutka That's a bit premature. The binary is a rust library they can share among the different language specific agents. It's one of the things that makes skylight work so well. We're using skylight on musl in production with no issues.

@wagenet
Contributor

wagenet commented Jul 19, 2017

@jirutka I'm really not understanding your antagonism here. Skylight is a proprietary product and we're pretty clear about downloading pre-compiled code.

@jirutka

jirutka commented Jul 19, 2017

There’s no source-code in skylightio/skylight-rust repository, only binaries.

@wagenet How do you build it? On Alpine system with rust installed from Alpine package, or other Linux system with musl toolchain?

@wagenet
Contributor

wagenet commented Jul 19, 2017

@jirutka currently we're building it with the musl toolchain on a CentOS system. I'm completely ready to admit that this may not be the ideal way to do things. I'd really like to rewrite our internal build system (it was cobbled together before Rust was 1.0 and things were not nearly so good then) which would probably help with some of our issues.

@jirutka

jirutka commented Jul 19, 2017

…we're pretty clear about downloading pre-compiled code.

Where? There’s not a single note in the gem’s description on Rubygems and no warning when installing it:

$ gem install skylight
Fetching: skylight-1.3.1.gem (100%)
Building native extensions.  This could take a while...
Successfully installed skylight-1.3.1
1 gem installed

Also, the license field is completely missing in your gemspec.

@wagenet
Contributor

wagenet commented Jul 19, 2017

@jirutka If you're a current customer and you want to contact our support to discuss these sorts of issues, please do so. This isn't the appropriate thread for discussing these issues.

But to give you a quick answer, we talk about it extensively in our marketing. The gem isn't any use if you don't sign up for our product so I expect people to be generally aware. We've yet to have any customer complain about the existence of a precompiled native agent.

@jirutka

jirutka commented Jul 19, 2017

If you're a current customer and you want to contact our support to discuss these sorts of issues…

No, I’m not and not willing to be. I’m one of the Alpine Linux’s developers, that’s why I started watching this issue. But I cannot help much with something that does not have source code available.

@jirutka

jirutka commented Jul 19, 2017

libskylight.so is a shared library that depends on libgcc.

$ ldd libskylight.so
	ldd (0x3160fb1b000)
	libgcc_s.so.1 => /usr/lib/libgcc_s.so.1 (0x3160eafa000)
	libc.so => ldd (0x3160fb1b000)

That’s interesting, because rustc (still) doesn’t support building dynamically linked binaries with musl. How did you do it? We needed to patch Rust to allow this and to fix many bad assumptions and bugs in Rust’s build system. Unfortunately these patches are not merged upstream yet.

Anyway, you may try to build skylight-rust on Alpine Linux with Rust installed from our package (community/rust). The default triplet is <arch>-alpine-linux-musl, which uses dynamic linking (of system libraries, not Rust libraries) by default. I don’t know if it will help, but it’s worth a try.

You can use the alpine-chroot-install script to easily install Alpine Linux into a chroot on your CentOS builders. I use it on Travis CI, which runs Ubuntu (example .travis.yml).
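The `ldd` check earlier in this comment can be scripted when several binaries need inspecting. A minimal sketch follows, assuming `ldd` output in the format quoted above; the constant `LDD_LINE` and the method `parse_ldd` are hypothetical names, and other `ldd` implementations may print lines this pattern does not match.

```ruby
# Matches lines such as "libgcc_s.so.1 => /usr/lib/libgcc_s.so.1 (0x3160eafa000)"
# and also lines without "=>" such as "ldd (0x3160fb1b000)".
LDD_LINE = /\A\s*(\S+)(?:\s+=>\s+(\S+))?\s+\(0x\h+\)\s*\z/

# Hypothetical parser: maps each listed name to the path it resolved to,
# or nil when the line has no "=>" part. Unrecognised lines are skipped.
def parse_ldd(output)
  output.each_line.with_object({}) do |line, deps|
    match = LDD_LINE.match(line)
    deps[match[1]] = match[2] if match
  end
end
```

For example, ``parse_ldd(`ldd libskylight.so`).key?("libgcc_s.so.1")`` would report whether the library lists libgcc as a dependency.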

@wagenet
Contributor

wagenet commented Jul 19, 2017

@jirutka here's our Makefile. I'll be the first to tell you that it's not pretty: https://gist.github.com/wagenet/094517642304615fbb9295f5744dd78c.

@jirutka

jirutka commented Jul 19, 2017

Do I understand it right that you build libskylight.a with cargo/rustc and then link it into libskylight.so with gcc? Well, rustc produces native binaries, so why not, I’ve just never thought about such an approach.

How do you build libcrypto.a, libcurl.a, libssl.a, libz.a?

@wagenet
Contributor

wagenet commented Jul 19, 2017

Our Rust code has two portions: a completely standalone daemon and a library that is loaded by Ruby that then interfaces with the daemon. Where things get a bit complicated is in making that library that Ruby can load.

I've updated the gist to include the deps Makefile: https://gist.github.com/wagenet/094517642304615fbb9295f5744dd78c

@wagenet
Contributor

wagenet commented Jul 20, 2017

I was able to resolve some of the recent build issues. I can't guarantee that it will work for @ababich, but at least the build toolchain is working correctly again!

@wagenet
Contributor

wagenet commented Jul 20, 2017

@ababich if this is still an issue for you, would you be able to provide a sample Dockerfile?

@wagenet
Contributor

wagenet commented Sep 28, 2017

Closing due to inability to reproduce. Will happily reopen if anyone can help with a reproduction.
