Setup docs building.

yebai · yebai · commit 0bb86e21b6e9 · 2023-01-06T22:33:46.000Z
diff --git a/.github/workflows/Docs.yml b/.github/workflows/Docs.yml
@@ -0,0 +1,32 @@
+name: Documentation
+
+on:
+  push:
+    branches:
+      # Build the master branch.
+      - master
+    tags: '*'
+  pull_request:
+
+concurrency:
+  # Skip intermediate builds: always.
+  # Cancel intermediate builds: only if it is a pull request build.
+  group: ${{ github.workflow }}-${{ github.ref }}
+  cancel-in-progress: ${{ startsWith(github.ref, 'refs/pull/') }}
+
+jobs:
+  docs:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v2
+      - uses: julia-actions/setup-julia@latest
+        with:
+          version: '1'
+      - name: Install dependencies
+        run: julia --project=docs/ -e 'using Pkg; Pkg.develop(PackageSpec(path=pwd())); Pkg.instantiate()'
+      - name: Build and deploy
+        env:
+          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} # For authentication with GitHub Actions token
+          DOCUMENTER_KEY: ${{ secrets.DOCUMENTER_KEY }} # For authentication with SSH deploy key
+          JULIA_DEBUG: Documenter # Print `@debug` statements (https://github.com/JuliaDocs/Documenter.jl/issues/955)
+        run: julia --project=docs/ docs/make.jl
diff --git a/docs/Project.toml b/docs/Project.toml
@@ -0,0 +1,11 @@
+[deps]
+Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4"
+Functors = "d9f16b24-f501-4c13-a1f2-28368ffc5196"
+StableRNGs = "860ef19b-820b-49d6-a774-d7a799459cd3"
+Zygote = "e88e6eb3-aa80-5325-afca-941959d7151f"
+
+[compat]
+Documenter = "0.27"
+Functors = "0.3"
+StableRNGs = "1"
+Zygote = "0.6"
diff --git a/docs/make.jl b/docs/make.jl
@@ -0,0 +1,16 @@
+using Documenter
+using Bijectors
+
+# Doctest setup
+DocMeta.setdocmeta!(Bijectors, :DocTestSetup, :(using Bijectors); recursive=true)
+
+makedocs(
+    sitename = "Bijectors",
+    format = Documenter.HTML(),
+    modules = [Bijectors],
+    pages = ["Home" => "index.md", "Distributions.jl integration" => "distributions.md", "Examples" => "examples.md"],
+    strict=false,
+    checkdocs=:exports,
+)
+
+deploydocs(repo = "github.com/TuringLang/Bijectors.jl.git", push_preview=true)
diff --git a/docs/src/distributions.md b/docs/src/distributions.md
@@ -0,0 +1,52 @@
+## Basic usage
+Other than the `logpdf_with_trans` methods, the package also provides a more composable interface through the `Bijector` types. Consider for example the one from above with `Beta(2, 2)`.
+
+```julia
+julia> using Random; Random.seed!(42);
+
+julia> using Bijectors; using Bijectors: Logit
+
+julia> dist = Beta(2, 2)
+Beta{Float64}(α=2.0, β=2.0)
+
+julia> x = rand(dist)
+0.36888689965963756
+
+julia> b = bijector(dist) # bijection (0, 1) → ℝ
+Logit{Float64}(0.0, 1.0)
+
+julia> y = b(x)
+-0.5369949942509267
+```
+
+In this case we see that `bijector(d::Distribution)` returns the corresponding constrained-to-unconstrained bijection for `Beta`, which indeed is a `Logit` with `a = 0.0` and `b = 1.0`. The resulting `Logit <: Bijector` has a method `(b::Logit)(x)` defined, allowing us to call it just like any other function. Comparing with the above example, `b(x) ≈ link(dist, x)`. Just to convince ourselves:
+
+```julia
+julia> b(x) ≈ link(dist, x)
+true
+```
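+
+As a quick sanity check in the other direction, the sketch below inverts the bijector and compares the result against `invlink`. It rebuilds `dist`, `b`, `x` and `y` so that it is self-contained; outputs are omitted because they depend on the sampled `x`.
+
+```julia
+using Bijectors
+
+dist = Beta(2, 2)
+b = bijector(dist)        # (0, 1) → ℝ
+x = rand(dist)
+y = b(x)
+
+b⁻¹ = inverse(b)          # ℝ → (0, 1)
+b⁻¹(y) ≈ x                # round trip recovers the original sample
+invlink(dist, y) ≈ x      # same map through the functional interface
+```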
+
+## Transforming distributions
+
+```@setup transformed-dist-simple
+using Bijectors
+```
+
+We can create a _transformed_ `Distribution`, i.e. a `Distribution` defined by sampling from a given `Distribution` and then transforming using a given transformation:
+
+```@repl transformed-dist-simple
+dist = Beta(2, 2)      # support on (0, 1)
+tdist = transformed(dist) # support on ℝ
+
+tdist isa UnivariateDistribution
+```
+
+We can then compute the `logpdf` for the resulting distribution:
+
+```@repl transformed-dist-simple
+# Some example values
+x = rand(dist)
+y = tdist.transform(x)
+
+logpdf(tdist, y)
+```
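+
+For reference, the value computed here follows from the standard change-of-variables formula. Writing `p` for the density of `dist`, `b` for `tdist.transform` and `q` for the density of `tdist` (notation chosen for this note, not taken from the package internals), we have:
+
+```math
+\log q(y) = \log p\big(b^{-1}(y)\big) + \log \left| \det J(b^{-1}, y) \right| = \log p(x) - \log \left| \det J(b, x) \right|, \qquad x = b^{-1}(y).
+```
+
+In code, this should correspond to `logpdf(tdist, y) ≈ logpdf(dist, x) - logabsdetjac(tdist.transform, x)`.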
diff --git a/docs/src/examples.md b/docs/src/examples.md
@@ -0,0 +1,163 @@
+```@setup advi
+using Bijectors
+```
+
+## Univariate ADVI example
+The real utility of `TransformedDistribution` becomes more apparent when using `transformed(dist, b)` for an arbitrary bijector `b`. To get the transformed distribution corresponding to `Beta(2, 2)`, we called `transformed(dist)` before. This is simply an alias for `transformed(dist, bijector(dist))`. Remember that `bijector(dist)` returns the constrained-to-unconstrained bijector for that particular `Distribution`. But we can of course construct a `TransformedDistribution` using different bijectors with the same `dist`. This is particularly useful in _Automatic Differentiation Variational Inference (ADVI)_.[2] An important part of ADVI is to approximate a constrained distribution, e.g. `Beta`, as follows:
+1. Sample `x` from a `Normal` with parameters `μ` and `σ`, i.e. `x ~ Normal(μ, σ)`.
+2. Transform `x` to `y` s.t. `y ∈ support(Beta)`, with the transform being a differentiable bijection with a differentiable inverse (a "bijector").
+
+This then defines a probability density with the same _support_ as `Beta`! Of course, it's unlikely that it will be the same density, but it's an _approximation_. Creating such a distribution becomes trivial with `Bijector` and `TransformedDistribution`:
+
+```@repl advi
+using StableRNGs: StableRNG
+rng = StableRNG(42);
+dist = Beta(2, 2)
+b = bijector(dist)              # (0, 1) → ℝ
+b⁻¹ = inverse(b)                # ℝ → (0, 1)
+td = transformed(Normal(), b⁻¹) # x ∼ 𝓝(0, 1) then b⁻¹(x) ∈ (0, 1)
+x = rand(rng, td)               # ∈ (0, 1)
+```
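+
+The two ADVI steps listed above can also be sketched with explicit variational parameters. The values of `μ` and `σ` below are hypothetical and chosen only for illustration; in actual ADVI they would be optimised.
+
+```julia
+using Bijectors
+
+μ, σ = 0.5, 2.0                      # hypothetical variational parameters
+b⁻¹ = inverse(bijector(Beta(2, 2)))  # ℝ → (0, 1)
+q = transformed(Normal(μ, σ), b⁻¹)   # approximation with the same support as Beta
+
+y = rand(q)                          # step 1 and step 2 in one call
+0 < y < 1                            # the sample lies in the support of Beta
+logpdf(q, y)                         # density of the approximation at y
+```
+
+Changing `μ` and `σ` changes the shape of `q` on `(0, 1)` without ever leaving the support, which is what makes the parameters safe to optimise without constraints.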
+
+It's worth noting that `support(Beta)` is the _closed_ interval `[0, 1]`, while the constrained-to-unconstrained bijection, `Logit` in this case, is only well-defined as a map `(0, 1) → ℝ` for the _open_ interval `(0, 1)`. This is of course not an implementation detail. `ℝ` is itself open, thus no continuous bijection exists from a _closed_ interval to `ℝ`. But since the boundary of a closed interval has measure zero, this doesn't affect the resulting density with support on the entire real line. In practice, this means that
+
+```@repl advi
+td = transformed(Beta())
+inverse(td.transform)(rand(rng, td))
+```
+
+will never result in `0` or `1` though any sample arbitrarily close to either `0` or `1` is possible. _Disclaimer: numerical accuracy is limited, so you might still see `0` and `1` if you're lucky._
+
+## Multivariate ADVI example
+We can also do _multivariate_ ADVI using the `Stacked` bijector. `Stacked` gives us a way to combine univariate and/or multivariate bijectors into a single multivariate bijector. Say you have a vector `x` of length 2 and you want to transform the first entry using `Exp` and the second entry using `Log`. `Stacked` gives you an easy and efficient way of representing such a bijector.
+
+```@repl advi
+using Bijectors: SimplexBijector
+
+# Original distributions
+dists = (
+    Beta(),
+    InverseGamma(),
+    Dirichlet(2, 3)
+);
+
+# Construct the corresponding ranges
+ranges = [];
+idx = 1;
+
+for i = 1:length(dists)
+    d = dists[i]
+    push!(ranges, idx:idx + length(d) - 1)
+
+    global idx
+    idx += length(d)
+end;
+
+ranges
+
+# Base distribution; mean-field normal
+num_params = ranges[end][end]
+
+d = MvNormal(zeros(num_params), ones(num_params));
+
+# Construct the transform
+bs = bijector.(dists);     # constrained-to-unconstrained bijectors for dists
+ibs = inverse.(bs);            # invert, so we get unconstrained-to-constrained
+sb = Stacked(ibs, ranges) # => Stacked <: Bijector
+
+# Mean-field normal with unconstrained-to-constrained stacked bijector
+td = transformed(d, sb);
+y = rand(td)
+0.0 ≤ y[1] ≤ 1.0
+0.0 < y[2]
+sum(y[3:4]) ≈ 1.0
+```
+
+## Normalizing flows
+A very interesting application is that of _normalizing flows_.[1] Usually this is done by sampling from a multivariate normal distribution, and then transforming this to a target distribution using invertible neural networks. Currently there are two such transforms available in Bijectors.jl: `PlanarLayer` and `RadialLayer`. Let's create a flow with a single `PlanarLayer`:
+
+```@setup normalizing-flows
+using Bijectors
+using StableRNGs: StableRNG
+rng = StableRNG(42);
+```
+
+```@repl normalizing-flows
+d = MvNormal(zeros(2), ones(2));
+b = PlanarLayer(2)
+flow = transformed(d, b)
+flow isa MultivariateDistribution
+```
+
+That's it. Now we can sample from it using `rand` and compute the `logpdf`, like any other `Distribution`.
+
+```@repl normalizing-flows
+y = rand(rng, flow)
+logpdf(flow, y)         # uses inverse of `b`
+```
+
+Similarly to the multivariate ADVI example, we could use `Stacked` to get a _bounded_ flow:
+
+```@repl normalizing-flows
+d = MvNormal(zeros(2), ones(2));
+ibs = inverse.(bijector.((InverseGamma(2, 3), Beta())));
+sb = stack(ibs...) # == Stacked(ibs) == Stacked(ibs, [i:i for i = 1:length(ibs)])
+b = sb ∘ PlanarLayer(2)
+td = transformed(d, b);
+y = rand(rng, td)
+0 < y[1]
+0 ≤ y[2] ≤ 1
+```
+
+Want to fit the flow?
+
+```@repl normalizing-flows
+using Zygote
+
+# Construct the flow.
+b = PlanarLayer(2)
+
+# Convenient for extracting parameters and reconstructing the flow.
+using Functors
+θs, reconstruct = Functors.functor(b);
+
+# Make the objective a `struct` to avoid capturing global variables.
+struct NLLObjective{R,D,T}
+    reconstruct::R
+    basedist::D
+    data::T
+end
+
+function (obj::NLLObjective)(θs...)
+    transformed_dist = transformed(obj.basedist, obj.reconstruct(θs))
+    return -sum(Base.Fix1(logpdf, transformed_dist), eachcol(obj.data))
+end
+
+# Some random data to estimate the density of.
+xs = randn(2, 1000);
+
+# Construct the objective.
+f = NLLObjective(reconstruct, MvNormal(2, 1), xs);
+
+# Initial loss.
+@info "Initial loss: $(f(θs...))"
+
+# Train using gradient descent.
+ε = 1e-3;
+for i = 1:100
+    ∇s = Zygote.gradient(f, θs...)
+    θs = map(θs, ∇s) do θ, ∇
+        θ - ε .* ∇
+    end
+end
+
+# Final loss
+@info "Final loss: $(f(θs...))"
+
+# Very simple check to see if we learned something useful.
+samples = rand(transformed(f.basedist, f.reconstruct(θs)), 1000);
+mean(eachcol(samples)) # ≈ [0, 0]
+cov(samples; dims=2)   # ≈ I
+```
+
+We can easily create more complex flows by simply doing `PlanarLayer(10) ∘ PlanarLayer(10) ∘ RadialLayer(10)` and so on.
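+
+As a minimal sketch of such a composition, the two-dimensional flow below chains a `RadialLayer` and a `PlanarLayer`. The dimension and layer order are arbitrary choices for illustration, and the layers are left at their random initial parameters.
+
+```julia
+using Bijectors
+
+d = MvNormal(zeros(2), ones(2));
+b = PlanarLayer(2) ∘ RadialLayer(2)  # applies the RadialLayer first, then the PlanarLayer
+flow = transformed(d, b)
+
+y = rand(flow)
+logpdf(flow, y)                      # inverts both layers to evaluate the density
+```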
diff --git a/docs/src/index.md b/docs/src/index.md
@@ -0,0 +1,32 @@
+# Bijectors.jl
+
+This package implements a set of functions for transforming constrained random variables (e.g. simplexes, intervals) to Euclidean space. The 3 main functions implemented in this package are the `link`, `invlink` and `logpdf_with_trans` for a number of distributions. The distributions supported are:
+1. `RealDistribution`: `Union{Cauchy, Gumbel, Laplace, Logistic, NoncentralT, Normal, NormalCanon, TDist}`,
+2. `PositiveDistribution`: `Union{BetaPrime, Chi, Chisq, Erlang, Exponential, FDist, Frechet, Gamma, InverseGamma, InverseGaussian, Kolmogorov, LogNormal, NoncentralChisq, NoncentralF, Rayleigh, Weibull}`,
+3. `UnitDistribution`: `Union{Beta, KSOneSided, NoncentralBeta}`,
+4. `SimplexDistribution`: `Union{Dirichlet}`,
+5. `PDMatDistribution`: `Union{InverseWishart, Wishart}`, and
+6. `TransformDistribution`: `Union{T, Truncated{T}} where T<:ContinuousUnivariateDistribution`.
+
+All exported names from the [Distributions.jl](https://github.com/JuliaStats/Distributions.jl) package are reexported from `Bijectors`.
+
+Bijectors.jl also provides a nice interface for working with these maps: composition, inversion, etc.
+The following table lists mathematical operations for a bijector and the corresponding code in Bijectors.jl.
+
+| Operation                          | Method          | Automatic |
+|:------------------------------------:|:-----------------:|:-----------:|
+| `b ↦ b⁻¹`                                      | `inverse(b)`                | ✓         |
+| `(b₁, b₂) ↦ (b₁ ∘ b₂)`                         | `b₁ ∘ b₂`               | ✓         |
+| `(b₁, b₂) ↦ [b₁, b₂]`                          | `stack(b₁, b₂)`         | ✓         |
+| `x ↦ b(x)`                                     | `b(x)`                  | ×         |
+| `y ↦ b⁻¹(y)`                                   | `inverse(b)(y)`             | ×         |
+| `x ↦ log｜det J(b, x)｜`                       | `logabsdetjac(b, x)`    | AD        |
+| `x ↦ b(x), log｜det J(b, x)｜`                 | `with_logabsdet_jacobian(b, x)`         | ✓         |
+| `p ↦ q := b_* p`                                | `q = transformed(p, b)` | ✓         |
+| `y ∼ q`                                        | `y = rand(q)`           | ✓         |
+| `p ↦ b` such that `support(b_* p) = ℝᵈ`               | `bijector(p)`           | ✓         |
+| `(x ∼ p, b(x), log｜det J(b, x)｜, log q(y))` | `forward(q)`            | ✓         |
+
+In this table, `b` denotes a `Bijector`, `J(b, x)` denotes the Jacobian of `b` evaluated at `x`, `b_* p` denotes the [push-forward](https://www.wikiwand.com/en/Pushforward_measure) of `p` by `b`, and `x ∼ p` denotes `x` sampled from the distribution with density `p`.
+
+The "Automatic" column in the table refers to whether or not you are required to implement the feature for a custom `Bijector`. "AD" refers to the fact that it can be implemented "automatically" using automatic differentiation.
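+
+A few rows of the table can be exercised directly. The sketch below uses the bijector of `Beta(2, 2)` and a hand-picked input `x = 0.3`, both chosen only for illustration; the closed-form Jacobian in the comment is the derivative of the logit function on `(0, 1)`.
+
+```julia
+using Bijectors
+
+b = bijector(Beta(2, 2))                   # Logit bijector (0, 1) → ℝ
+x = 0.3
+y = b(x)                                   # x ↦ b(x)
+
+inverse(b)(y) ≈ x                          # y ↦ b⁻¹(y)
+(inverse(b) ∘ b)(x) ≈ x                    # composition with the inverse is the identity
+logabsdetjac(b, x) ≈ -log(x * (1 - x))     # d/dx logit(x) = 1 / (x (1 - x))
+
+z, logjac = with_logabsdet_jacobian(b, x)  # value and log-Jacobian in one call
+z ≈ y && logjac ≈ logabsdetjac(b, x)
+```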