Piracy in the StdLibs #30945

Open
13 of 27 tasks
oxinabox opened this issue Feb 3, 2019 · 29 comments
Comments

@oxinabox
Contributor

oxinabox commented Feb 3, 2019

I have been playing with the idea of automatically detecting type piracy.
https://discourse.julialang.org/t/pirate-hunter/20402

Here are the results of running it with the standard libraries loaded.

  • [1] BigFloat(::Irrational{:SQRT_HALF}) in Random at irrationals.jl:158
  • [2] (T::Union{Type{Int8}, Type{UInt8}})(c::Char) in Base at char.jl:170
  • [3] Float64(::Irrational{:SQRT_HALF}) in Random at irrationals.jl:166
  • [4] Float32(::Irrational{:SQRT_HALF}) in Random at irrationals.jl:167
  • [5] (T::Union{Type{Int8}, Type{UInt8}})(c::Char) in Base at char.jl:170
  • [6] apropos(io::IO, needle::Regex) in REPL at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v1.1/REPL/src/docview.jl:621
  • [7] apropos(io::IO, string) in REPL at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v1.1/REPL/src/docview.jl:619
  • [8] apropos(string) in REPL at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v1.1/REPL/src/docview.jl:618
  • [9] doc(binding::Base.Docs.Binding) in REPL at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v1.1/REPL/src/docview.jl:85
  • [10] doc(obj::UnionAll) in REPL at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v1.1/REPL/src/docview.jl:129
  • [11] doc(object) in REPL at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v1.1/REPL/src/docview.jl:130
  • [12] formatdoc(d::Base.Docs.DocStr) in REPL at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v1.1/REPL/src/docview.jl:58
  • [13] formatdoc(buffer, d, part) in REPL at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v1.1/REPL/src/docview.jl:64
  • [14] parsedoc(d::Base.Docs.DocStr) in REPL at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v1.1/REPL/src/docview.jl:67
  • [15] adjoint(B::Union{BitArray{1}, BitArray{2}}) in LinearAlgebra at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v1.1/LinearAlgebra/src/bitarray.jl:223
  • [16] fetch(x) in Distributed at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v1.1/Distributed/src/remotecall.jl:533
  • [17] in(v::VersionNumber, r::VersionNumber) in Pkg.Types at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v1.1/Pkg/src/versions.jl:141
  • [18] isless(a::Base.UUID, b::Base.UUID) in Pkg.Types at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v1.1/Pkg/src/Types.jl:38
  • [19] kron(a::BitArray{1}, b::BitArray{1}) in LinearAlgebra at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v1.1/LinearAlgebra/src/bitarray.jl:96
  • [20] kron(a::BitArray{2}, b::BitArray{2}) in LinearAlgebra at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v1.1/LinearAlgebra/src/bitarray.jl:108
  • [21] kron(a::Number, b::Union{Number, Union{AbstractArray{T,1}, AbstractArray{T,2}} where T}) in LinearAlgebra at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v1.1/LinearAlgebra/src/dense.jl:359
  • [22] rand(X) in Random at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v1.1/Random/src/Random.jl:224
  • [23] show(io::IO, ::MIME{Symbol("text/csv")}, a) in DelimitedFiles at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v1.1/DelimitedFiles/src/DelimitedFiles.jl:828
  • [24] show(io::IO, ::MIME{Symbol("text/tab-separated-values")}, a) in DelimitedFiles at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v1.1/DelimitedFiles/src/DelimitedFiles.jl:829
  • [25] string(x::Array{String,1}) in Pkg.Types at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v1.1/Pkg/src/Types.jl:1453 (Pkg does a nonsensical type piracy. Pkg.jl#1034)
  • [26] transpose(B::Union{BitArray{1}, BitArray{2}}) in LinearAlgebra at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v1.1/LinearAlgebra/src/bitarray.jl:224
  • [27] wait(fd::RawFD) in FileWatching at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v1.1/FileWatching/src/FileWatching.jl:466

I'm not sure how many of these already have issues (e.g. #28234)
and how many are harmless.
I am sure someone more knowledgeable than me can strike a bunch of them as not problematic.
But I figured I should share the list.
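
For readers new to the term: a method is usually called type piracy when the module defining it owns neither the function nor any of the argument types. The following is a minimal sketch of that check, not the implementation of the tool linked above. The names `owner`, `looks_like_piracy`, `Owner`, `Honest` and `Pirate` are hypothetical, and the check assumes a deliberately simplified notion of ownership: only plain `DataType` arguments are inspected, and submodules are not folded into their parent package.

```julia
# Hypothetical helper names; a simplified sketch of a piracy check.

# Module that defines a type, or `nothing` for anything that is not a plain
# DataType (Union, UnionAll, TypeVar and Vararg are skipped in this sketch).
owner(@nospecialize(T)) = T isa DataType ? parentmodule(T) : nothing

# A method looks like piracy when the defining module owns neither the
# function nor any of the argument types.
function looks_like_piracy(m::Method)
    # Signature parameters: typeof(f) first, then the argument types.
    params = Base.unwrap_unionall(m.sig).parameters
    return !any(p -> owner(p) === m.module, params)
end

module Owner
    describe(x) = "generic"
end

module Honest
    import ..Owner
    struct Mine end
    Owner.describe(::Mine) = "mine"        # fine: Honest owns `Mine`
end

module Pirate
    import ..Owner
    Owner.describe(x::Vector{String}) = x  # piracy: Pirate owns neither side
end

pirates = [m for m in methods(Owner.describe) if looks_like_piracy(m)]
println(pirates)
```

Because of the simplifications, a check like this produces false positives (for example for methods whose only owned argument is a `Union`), which fits the request above for someone to strike the harmless entries.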

@JeffBezanson
Member

Wow, a couple interesting items here.

(T::Union{Type{Int8}, Type{UInt8}})(c::Char) in Base at char.jl:170

That's entirely in base, seems like an incorrect result.

string(x::Vector{String}) = x

Pkg, are you kidding me? It doesn't even return a string...

  • fetch(x) in Distributed

Yep, should be moved to Base. Not sure fetch should be defined on Any, but that's a separate question.

It looks like every useful method of kron is in LinearAlgebra, so that's not so bad.

wait(::RawFD) is ok since it strictly adds functionality.

@oxinabox
Contributor Author

oxinabox commented Feb 3, 2019

You have the power to tick them (and other meaningless ones and false positives) off, right?

@timholy
Member

timholy commented Feb 3, 2019

That is really cool. I worry about people getting too worried about piracy (there are occasionally some very good reasons for it), but having a tool to detect it seems like a clear win.

@KristofferC
Member

Pkg, are you kidding me? It doesn't even return a string...

JuliaLang/Pkg.jl#1034

@JeffBezanson
Member

Right; I look forward to wearing my "I'm a type pirate" T-shirt :) I think this is best seen as a heuristic for finding misplaced code, unintentional method extensions, etc.

@JeffBezanson
Member

  • BigFloat(::Irrational{:SQRT_HALF}) in Random at irrationals.jl:158

This is kind of OK since it's the only way to make new irrational numbers. However, we should perhaps add a couple more like sqrt(2) to Base.

  • apropos(io::IO, needle::Regex) in REPL at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v1.1/REPL/src/docview.jl:621

This is intentional, since we wanted to move doc viewing code out of Base. But I think it's still up in the air where all the Base.Docs code should go exactly.

  • in(v::VersionNumber, r::VersionNumber)

Could maybe go in base, but VersionNumber isn't iterable. Maybe Pkg could arrange to wrap the argument in a VersionRange instead?

  • isless(a::Base.UUID, b::Base.UUID)

Can move to Base.

  • show(io::IO, ::MIME{Symbol("text/csv")}, a)

This might actually make sense in Base as well. The code for writing is much simpler than the code for parsing. We should maybe even use this as the output format of print on matrices. It should probably also be specialized at least as a::AbstractArray.

@ararslan
Member

ararslan commented Feb 3, 2019

in(v::VersionNumber, r::VersionNumber)

JuliaLang/Pkg.jl#1035

@00vareladavid
Contributor

00vareladavid commented Feb 3, 2019

[25] string(x::Array{String,1}) in Pkg.Types

Fix at JuliaLang/Pkg.jl#1036

@ararslan
Member

ararslan commented Feb 3, 2019

• isless(a::Base.UUID, b::Base.UUID)

Can move to Base.

#30947

JeffBezanson added a commit that referenced this issue Feb 4, 2019
JeffBezanson added a commit that referenced this issue Feb 4, 2019
JeffBezanson added a commit that referenced this issue Feb 5, 2019
JeffBezanson added a commit that referenced this issue Feb 5, 2019
@dlfivefifty
Contributor

There's also vcat(::Vector...) in SparseArrays.jl:

vcat(A::Vector...) = Base.typed_vcat(promote_eltype(A...), A...)

SparseArrays seems like a chronic offender: #32213

@KristofferC
Member

KristofferC commented Nov 28, 2019

This is missing, e.g.:

julia> @which rand(5,5) * rand(5,5)
*(A::AbstractArray{T,2} where T, B::AbstractArray{T,2} where T) in LinearAlgebra at ...julia/stdlib/v1.3/LinearAlgebra/src/matmul.jl:152

@KristofferC
Member

I worry about people getting too worried about piracy (there are occasionally some very good reasons for it), but having a tool to detect seems like a clear win.

FWIW, type piracy means that it is impossible to look at the Project and know what stdlibs a package actually uses. For example, you can do matrix multiplication or call rand() without requiring LinearAlgebra or Random in your Project file. In the scenario where one wants to ship an app with a custom sysimage it would be desirable to exclude the stdlibs not needed by the app (for smaller file size and faster load times). Type piracy now comes and puts a big stick in the wheel, because there is no way to know what stdlibs the app actually needs: the code might be filled with matrix multiplication, yet no trace of LinearAlgebra is found.
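
One way to see this from a running session is to ask which modules own the methods applicable to a matrix product. This is only a sketch: the variable names are illustrative, and the printed result depends on the Julia version and on which packages happen to be loaded, so no expected output is given.

```julia
# Modules that own the `*` methods applicable to Matrix{Float64} * Matrix{Float64}.
# The calling code never has to mention any of these modules by name.
sig = Tuple{Matrix{Float64}, Matrix{Float64}}
owners = unique(Module[m.module for m in methods(*, sig)])
println(owners)
```

If a stdlib shows up in that list while the Project file does not mention it, the dependency is real but undeclared.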

@ViralBShah
Member

#43127 fixes some of the sparse arrays related ones.

Keno pushed a commit that referenced this issue Jun 5, 2024
@StefanKarpinski
Member

Thoughts on a possible way forward here:

  • add a feature where a method definition can trigger the loading of a package where the real definition lives
  • e.g. if you do A*B for matrices without loading LinearAlgebra it triggers it to be loaded
  • this fails if LinearAlgebra is not in any environment in your load path
  • default load path contains @stdlib which includes LinearAlgebra
  • within a project, LinearAlgebra needs to become an explicit dependency to use matmul, etc.

This is technically breaking since at the moment projects can do matmul without declaring a dependency on LinearAlgebra, so maybe we can make it a warning for a while. This is kind of just a lie because the code does already depend on LinearAlgebra, it just gets to pretend that it doesn't.
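
The proposal above is for a language feature, and a later comment in this thread suspects it cannot be done correctly with eval. With that caveat, a rough userland approximation can still illustrate the mechanics: a stub method whose first call brings the real definition into existence and then re-dispatches. Everything in this sketch is hypothetical (`LazyDemo`, `mymul`, `load_impl`), and `@eval` merely stands in for loading a package.

```julia
module LazyDemo

const impl_loaded = Ref(false)

# Stands in for "load the package where the real definition lives".
function load_impl()
    impl_loaded[] && error("no implementation available for these argument types")
    impl_loaded[] = true
    # The real method is only defined once something actually needs it.
    @eval mymul(A::Matrix, B::Matrix) =
        [sum(A[i, k] * B[k, j] for k in axes(A, 2)) for i in axes(A, 1), j in axes(B, 2)]
    return nothing
end

# Stub: triggers the load, then dispatches again. `invokelatest` is needed
# because the new method was defined after this call started (world age).
mymul(A, B) = (load_impl(); Base.invokelatest(mymul, A, B))

end # module

C = LazyDemo.mymul([1 2; 3 4], [1 0; 0 1])
println(C)
```

The need for `invokelatest` is the kind of world-age wrinkle that makes run-time loading awkward, which is one reason a dedicated mechanism in the method table is suggested instead.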

@dlfivefifty
Contributor

It would be nice if this approach was made available for non-Stdlib packages as well.

@KristofferC
Member

I think this was discussed in #51432.

@vtjnash
Member

vtjnash commented Feb 27, 2025

I think the main problem with that approach is that it continues to be brittle. Since it loads the package as long as any dependency declares it is in the manifest, changing dependency versions could still end up revealing it is broken (much the same as the current situation, where "that dependency" just happens to be the julia sysimg)

@ChrisRackauckas
Member

Alternatively, Base Julia could ship with a simple libblastrampoline implementation that just does things like a 3 line matmul, so the definition is there and works. Using LinearAlgebra just triggers a BLAS load with better definitions. This would then make building without BLAS much easier.

Then over time, using BLAS could just be using OpenBLAS, and BLAS could split from LinearAlgebra. In that sense, OpenBLAS is just an option, on no different footing than MKL.

If people are so worried about performance we could just throw a warning the first time the generic fallback is hit with BLAS types, recommending that BLAS be loaded. But carrying BLAS around everywhere because of the possibility that someone might not know of the performance optimization seems like a heavy cost to pay. Usually you signal and opt into performance.

Note that removing BLAS from LinearAlgebra is probably considered breaking, so that can't be done without a 2.0. But adding methods so things work without BLAS wouldn't be breaking?

@StefanKarpinski
Member

@ChrisRackauckas: I really don't think that shipping a subpar BLAS and constantly telling people not to use it is a viable path forward.

@vtjnash: I don't understand your comment about brittleness. You seem to be talking about some entirely different issue that afaict you haven't actually explained.

The problem here is that we currently have to build LinearAlgebra into the sysimg and we'd like to not do that. Lazy JLL loading would help avoid always loading BLAS. Trimming can help prove that an application does not actually do any BLAS operations and produce a binary without it. Maybe that's enough. But it would be nice to be able to just look in the manifest for a project and know that it definitely doesn't need linear algebra and BLAS without doing any fancy static analysis. That's what my proposal does: if LinearAlgebra isn't in the project file, then a project cannot use LinearAlgebra and BLAS without failing, just like it would if someone tried to import LinearAlgebra; if LinearAlgebra isn't in the manifest for an application then it likewise definitely doesn't use LinearAlgebra and BLAS.

There are two potential issues I can see with what I've proposed. First, that it's somewhat breaking. But I'd argue that in any code that this breaks, there was already a real but undeclared dependency on LinearAlgebra via piracy and that forcing that existing dependency to be explicitly declared is reasonable and something of a bug fix.

The second issue is that it's weird for a method call to implicitly do an import. It's already possible with eval and that's basically what this would do, but it might be better to register it in some more visible way.

@StefanKarpinski
Member

StefanKarpinski commented Mar 1, 2025

Specifically this:

Since it loads the package as long as any dependency declares it is in the manifest, changing dependency versions could still end up revealing it is broken (much the same as the current situation

This is always true? Different versions of things can have different dependencies so I have no idea what makes this different.

@KristofferC
Member

That's what my proposal does: if LinearAlgebra isn't in the project file, then a project cannot use LinearAlgebra and BLAS without failing

Should go full out then and require it to be a direct dependency of the package that uses it.

But the whole automatically loading packages at run time has never worked that well. You easily run into world age errors etc. Also, the concern in #51432 (comment) needs to be addressed AFAIU.

@o314
Contributor

o314 commented Mar 1, 2025

Different versions of things can have different dependencies so I have no idea what you're talking about.

That may lead to a big issue in the long run. Particularly when one uses a SAT solver that tends to black out tracking with some logical processing.
IMHO some pkg nodes and pkg deps are not all equal and should have different capabilities.

That may be useful to design in terms of clusters (a +/- open subgraph)

Image

With special nodes meaning

  • big node Julia, BLAS, Flux etc. a cluster entrypoint
  • inner node
  • trans node

With special links meaning

  • a kappa link must be stated explicitly by the user/system config
  • a phi link must be stated explicitly by the user/system config
  • a kappa link must not be changed by an alpha, a beta link
  • a phi link must not be changed by a beta link # rvw phi vs alpha ?

That may need a not so easy upgrade of the pkg solver.
But one should be able to start manually with pkg compat rules
It may be very hard and error prone to ignore this too even manually IMO 2

@StefanKarpinski
Member

@dlfivefifty: It would be nice if this approach was made available for non-Stdlib packages as well.

I'm really not sure we want to encourage this. If I had a time machine, I would maybe just change this aspect of the language. Certainly requiring people to do using Random before using rand is perfectly reasonable. The only case that really gives pause is matmul since clearly * needs to be a basic function and I don't think it's reasonable to require LinearAlgebra to get arrays. Even if we were to have separate list (growable, mutable) and vector (fixed-size, maybe immutable even), I'm not sure it makes sense for LinearAlgebra to own vectors—it's such a useful basic type. Suppose we defined it in its own Arrays stdlib. That still doesn't solve the problem unless Arrays depends on LinearAlgebra which depends on BLAS, which all feels too heavy for just getting a vector type. So we'd have the same exact problem, just inside of the Arrays stdlib. The matmul situation is basically the one where I cannot see a reasonable way around it.

@KristofferC: Should go full out then and require it to be a direct dependency of the package that uses it.

Agree. That's what I'm suggesting.

But the whole automatically loading packages at run time has never worked that well. You easily run into world age errors etc. Also, the concern in #51432 (comment) needs to be addressed AFAIU.

I think @Keno's suggested approach there sounds great and matches very closely what I was thinking. [Aside: It's kind of annoying that this conversation is spread across so many issues and PRs.] To be clear, I suspect you cannot do this correctly with eval. That's why I'm suggesting a language feature. You essentially want dummy entries in the method table that act like normal for method sorting and so on, until something tries to actually use them, and then once that happens it causes the external package to get loaded.

@dlfivefifty
Contributor

I'm really not sure we want to encourage this.

The issue is sometimes it's natural to have packages do type piracy in filling in features. I.e. sometimes there's a "*Core" or "*Base" package to implement an interface for overloading but the default implementation is in the main package.

It's possible that every instance of this can be replaced by extensions, since this pattern began before the creation of extensions. But extensions are rather limited for implementing new features, and currently do not support dependencies between different extensions.

@KristofferC
Member

I.e. sometimes there's a "*Core" or "*Base" package to implement an interface for overloading but the default implementation is in the main package.

But that's not done via type piracy?

@vtjnash
Member

vtjnash commented Mar 7, 2025

require it to be a direct dependency of the package that uses it.

We don't have any runtime way to specify this though: as long as LinAlg is any indirect dependency, the methods get added. To force direct dependency, we would have to export a different * from LinAlg (which has fallbacks to call Base) and becomes the default for packages that explicitly use LinAlg. That is fairly doable technically, but might be awkward in some situations also. Though doing that (some sort of overriding export) might solve the issue with mul, and other similar methods (e.g. sparse vcat) where the overloads are currently just pirating the Base method.
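
A minimal sketch of what "a different *" could look like, using hypothetical modules `MiniLinAlg` and `Client` rather than the real LinearAlgebra: the library defines its own function named `*`, adds the matrix method to it, and forwards everything else to `Base.:*`. Only modules that import this binding see the matrix method, so nothing is added to Base's method table.

```julia
module MiniLinAlg

# A new function that happens to be named `*`; it is not `Base.:*`.
*(A::Matrix, B::Matrix) =
    [sum(Base.:*(A[i, k], B[k, j]) for k in axes(A, 2)) for i in axes(A, 1), j in axes(B, 2)]

# Everything else falls back to Base.
*(a, b) = Base.:*(a, b)

export *

end # module

module Client

# Explicit import, so that `*` inside Client means MiniLinAlg's function
# regardless of how the clash with Base's exported `*` would be resolved.
import ..MiniLinAlg: *

square(A) = A * A

end # module

println(Client.square([1 2; 3 4]))
println(Client.square(3))
```

The explicit import in `Client` hints at the awkwardness mentioned above: two modules now export the same operator name, and each consumer has to choose one.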

@dlfivefifty
Contributor

But that's not done via type piracy?

Yes it is. Eg AbstractFFTs defines fft which is overloaded for fft(::Array) downstream in FFTW.jl
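
The shape of that pattern can be sketched with two hypothetical modules, `MiniAbstractFFTs` and `MiniFFTW`, standing in for the real packages (whose internals are not reproduced here). The interface module owns the function, the backend module adds a method on a Base array type, and so the backend owns neither side of the signature. A naive DFT is used as a placeholder implementation.

```julia
module MiniAbstractFFTs
    # Interface package: declares the function, provides no array method.
    function fft end
    export fft
end

module MiniFFTW
    import ..MiniAbstractFFTs: fft

    # Backend package: owns neither `fft` nor `Vector`, yet defines this method.
    # Naive O(n^2) DFT as a stand-in for a real FFT.
    function fft(x::Vector{<:Number})
        n = length(x)
        return [sum(x[j + 1] * cis(-2π * k * j / n) for j in 0:n-1) for k in 0:n-1]
    end
end

y = MiniAbstractFFTs.fft([1, 0, 0, 0])
println(y)
```

As with matrix multiplication, code calling `MiniAbstractFFTs.fft` on a vector only works if some other module providing the method has been loaded.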

@KristofferC
Member

Yes it is.

Okay, let me rephrase: "it shouldn't be done via type piracy".

@dlfivefifty
Contributor

What's the alternative then?

It feels like the type piracy pattern is used for AbstractFFTs/FFTW.jl for exactly the same reason it's used in StdLib/LinearAlgebra.jl: because it's a natural pattern. And so if the problem is the same, so should the solution?
