No-op code & docs cleanups #245

Open: wants to merge 20 commits into base: devel

34 changes: 11 additions & 23 deletions docs/src/tutorials/backends/nlopt.md
@@ -1,40 +1,28 @@
# Using NLopt.jl

[`SemOptimizerNLopt`](@ref) implements the connection to `NLopt.jl`.
It is only available if the `NLopt` package is loaded alongside `StructuralEquationModels.jl` in the running Julia session.
It takes a bunch of arguments:
When [`NLopt.jl`](https://github.com/jump-dev/NLopt.jl) is loaded in the running Julia session,
it can be used by [`SemOptimizer`](@ref) by specifying `engine = :NLopt`
(see [NLopt-specific options](@ref `SemOptimizerNLopt`)).
Among other things, `NLopt` enables constrained optimization of SEM models, which is
explained in the [Constrained optimization](@ref) section.

```julia
• algorithm: optimization algorithm

• options::Dict{Symbol, Any}: options for the optimization algorithm

• local_algorithm: local optimization algorithm

• local_options::Dict{Symbol, Any}: options for the local optimization algorithm

• equality_constraints::Vector{NLoptConstraint}: vector of equality constraints

• inequality_constraints::Vector{NLoptConstraint}: vector of inequality constraints
```
Constraints are explained in the section on [Constrained optimization](@ref).

The defaults are LBFGS as the optimization algorithm and the standard options from `NLopt.jl`.
We can choose something different:
We can override the default *NLopt* algorithm (LBFGS) and instead use
the *augmented Lagrangian* method with LBFGS as the *local* optimization algorithm,
stop at a maximum of 200 evaluations, and use a relative tolerance of
the objective value of `1e-6` as the stopping criterion for the local algorithm:

```julia
using NLopt

my_optimizer = SemOptimizerNLopt(;
my_optimizer = SemOptimizer(;
engine = :NLopt,
algorithm = :AUGLAG,
options = Dict(:maxeval => 200),
local_algorithm = :LD_LBFGS,
local_options = Dict(:ftol_rel => 1e-6)
)
```

This uses an augmented Lagrangian method with LBFGS as the local optimization algorithm, stops at a maximum of 200 evaluations, and uses a relative tolerance of the objective value of `1e-6` as the stopping criterion for the local algorithm.

To see how to use the optimizer to actually fit a model now, check out the [Model fitting](@ref) section.
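As a quick preview, a minimal sketch of how such an optimizer is used (assuming `model` is an already specified `Sem` model):

```julia
# `model` is assumed to be a Sem model constructed elsewhere (see the Model fitting section).
model_fit = fit(my_optimizer, model)
```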

In the NLopt docs, you can find explanations about the different [algorithms](https://nlopt.readthedocs.io/en/latest/NLopt_Algorithms/) and a [tutorial](https://nlopt.readthedocs.io/en/latest/NLopt_Introduction/) that also explains the different options.
20 changes: 10 additions & 10 deletions docs/src/tutorials/backends/optim.md
@@ -1,23 +1,23 @@
# Using Optim.jl

[`SemOptimizerOptim`](@ref) implements the connection to `Optim.jl`.
It takes two arguments, `algorithm` and `options`.
The defaults are LBFGS as the optimization algorithm and the standard options from `Optim.jl`.
We can load the `Optim` and `LineSearches` packages to choose something different:
[Optim.jl](https://github.com/JuliaNLSolvers/Optim.jl) is the default optimization engine of *SEM.jl*;
see [`SemOptimizerOptim`](@ref) for a full list of its parameters.
It defaults to the LBFGS algorithm, but we can load the `Optim` and `LineSearches` packages
and specify BFGS (not L-BFGS!) with a backtracking linesearch and a Hager-Zhang initial step length guess:

```julia
using Optim, LineSearches

my_optimizer = SemOptimizerOptim(
my_optimizer = SemOptimizer(
    algorithm = BFGS(
        linesearch = BackTracking(order=3),
        alphaguess = InitialHagerZhang()
    ),
    options = Optim.Options(show_trace = true)
)
```

This optimizer will use BFGS (not L-BFGS!) with a backtracking linesearch and a certain initial step length guess. Also, the trace of the optimization will be printed to the console.
Note that we used `options` to print the optimization progress to the console.

To see how to use the optimizer to actually fit a model now, check out the [Model fitting](@ref) section.
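For illustration, a minimal sketch of passing this optimizer to `fit` (assuming `model` is an already specified `Sem` model); omitting the optimizer uses the defaults (the Optim.jl engine with LBFGS):

```julia
# `model` is assumed to be a Sem model constructed elsewhere (see the Model fitting section).
model_fit = fit(my_optimizer, model)  # BFGS with the settings chosen above
model_fit_default = fit(model)        # default engine (Optim.jl) and algorithm (LBFGS)
```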

4 changes: 2 additions & 2 deletions docs/src/tutorials/concept.md
@@ -50,8 +50,8 @@ Available loss functions are
- [`SemRidge`](@ref): ridge regularization

## The optimizer part aka `SemOptimizer`
The optimizer part of a model connects to the numerical optimization backend used to fit the model.
It can be used to control options like the optimization algorithm, linesearch, stopping criteria, etc.
There are currently three available backends: [`SemOptimizerOptim`](@ref), connecting to the [Optim.jl](https://github.com/JuliaNLSolvers/Optim.jl) backend; [`SemOptimizerNLopt`](@ref), connecting to the [NLopt.jl](https://github.com/JuliaOpt/NLopt.jl) backend; and [`SemOptimizerProximal`](@ref), connecting to [ProximalAlgorithms.jl](https://github.com/JuliaFirstOrder/ProximalAlgorithms.jl).
For more information about the available options, see also the tutorials about [Using Optim.jl](@ref) and [Using NLopt.jl](@ref), as well as [Constrained optimization](@ref) and [Regularization](@ref).
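For illustration, a minimal sketch of selecting a backend via the `engine` keyword (assuming the corresponding backend packages are loaded; the keyword arguments accepted depend on the engine):

```julia
using StructuralEquationModels
using NLopt  # the NLopt engine is only available when NLopt.jl is loaded

optimizer_default = SemOptimizer()                 # Optim.jl engine (the default)
optimizer_nlopt   = SemOptimizer(engine = :NLopt)  # NLopt.jl engine
# ProximalAlgorithms.jl engine (see the Regularization tutorial):
# optimizer_prox = SemOptimizer(engine = :Proximal, operator_g = ...)
```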

45 changes: 22 additions & 23 deletions docs/src/tutorials/constraints/constraints.md
@@ -1,6 +1,6 @@
# Constrained optimization

## Using the NLopt backend
## Using the NLopt engine

### Define an example model

@@ -38,7 +38,7 @@ end

partable = ParameterTable(
graph,
latent_vars = latent_vars,
observed_vars = observed_vars)

data = example_data("political_democracy")
@@ -64,7 +64,8 @@ Let's introduce some constraints:

(Of course, those constraints only serve an illustrative purpose.)

To fit the SEM model with the functional constraints, we will use the *NLopt* optimization engine.
Since *NLopt* does not have access to the SEM parameter names, we have to look up the indices of the respective parameters that are involved in the constraints.
We can look up their labels in the output above, and retrieve their indices as

```@example constraints
@@ -112,48 +113,46 @@
```

If the algorithm needs gradients at an iteration, it will pass the vector `gradient` that is of the same size as the parameters.
With `if length(gradient) > 0` we check if the algorithm needs gradients, and if it does, we fill the `gradient` vector with the gradients
of the constraint w.r.t. the parameters.
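For illustration, a minimal sketch of such a constraint function; the parameter index `31` and the bound `0.6` are hypothetical and not taken from the example above (NLopt treats an inequality constraint as satisfied when the returned value is ≤ 0):

```julia
# Hypothetical inequality constraint θ[31] ≤ 0.6, written as θ[31] - 0.6 ≤ 0.
function my_ineq_constraint(θ, gradient)
    if length(gradient) > 0      # the algorithm requested gradients
        gradient .= 0.0
        gradient[31] = 1.0       # ∂(θ[31] - 0.6) / ∂θ[31]
    end
    return θ[31] - 0.6
end
```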

In *NLopt*, vector-valued constraints are also possible, but we refer to the documentation for that.

### Fit the model

We now have everything together to specify and fit our model. First, we specify our optimizer backend as
Now we can construct the *SemOptimizer* that will use the *NLopt* engine for constrained optimization.

```@example constraints
using NLopt

constrained_optimizer = SemOptimizerNLopt(
constrained_optimizer = SemOptimizer(
engine = :NLopt,
algorithm = :AUGLAG,
options = Dict(:upper_bounds => upper_bounds, :xtol_abs => 1e-4),
local_algorithm = :LD_LBFGS,
equality_constraints = NLoptConstraint(;f = eq_constraint, tol = 1e-8),
inequality_constraints = NLoptConstraint(;f = ineq_constraint, tol = 1e-8),
equality_constraints = (eq_constraint => 1e-8),
inequality_constraints = (ineq_constraint => 1e-8),
)
```

As you see, the equality and inequality constraints are passed as keyword arguments, and the bounds are passed as options for the (outer) optimization algorithm.
Additionally, for equality and inequality constraints, a feasibility tolerance can be specified that controls whether a solution can be accepted, even if it violates the constraints by a small amount.
Especially for equality constraints, it is recommended to allow for a small positive tolerance.
In this example, we set both tolerances to `1e-8`.

!!! warning "Convergence criteria"
    We have often observed that the default convergence criteria in NLopt lead to non-convergence flags.
    Indeed, this example does not converge with the default criteria.
    As you see above, we used a relatively liberal absolute tolerance of 1e-4 in the optimization parameters.
    This should not be a problem in most cases, as the sampling variance in (almost all) structural equation models
    should lead to uncertainty in the parameter estimates that is orders of magnitude larger.
    We nonetheless recommend choosing a convergence criterion with care (i.e. w.r.t. the scale of your parameters),
    inspecting the solutions for plausibility, and comparing them to unconstrained solutions.

We now have everything to fit our model under constraints:

```@example constraints
model_constrained = Sem(
    specification = partable,
    data = data
)

model_fit_constrained = fit(constrained_optimizer, model_constrained)
model_fit_constrained = fit(constrained_optimizer, model)
```

As you can see, the optimizer converged (`:XTOL_REACHED`) and investigating the solution yields
@@ -162,14 +161,14 @@
update_partable!(
partable,
:estimate_constr,
param_labels(model_fit_constrained),
solution(model_fit_constrained),
)

details(partable)
```

As we can see, the constrained solution is very close to the original solution (compare the columns estimate and estimate_constr), with the difference that the constrained parameters fulfill their constraints.
As all parameters are estimated simultaneously, it is expected that some unconstrained parameters are also affected (e.g., the constraint on `dem60 → y2` leads to a higher estimate of the residual variance `y2 ↔ y2`).

## Using the Optim.jl backend
2 changes: 1 addition & 1 deletion docs/src/tutorials/construction/build_by_parts.md
@@ -40,7 +40,7 @@ end

partable = ParameterTable(
graph,
latent_vars = lat_vars,
observed_vars = obs_vars)
```

20 changes: 10 additions & 10 deletions docs/src/tutorials/fitting/fitting.md
@@ -7,19 +7,19 @@ model_fit = fit(model)

# output

Fitted Structural Equation Model
===============================================
--------------------- Model -------------------

Structural Equation Model
- Loss Functions
   SemML
- Fields
   observed: SemObservedData
   implied: RAM
   optimizer: SemOptimizerOptim

------------- Optimization result -------------

* Status: success

4 changes: 2 additions & 2 deletions docs/src/tutorials/meanstructure.md
@@ -40,7 +40,7 @@ end

partable = ParameterTable(
graph,
latent_vars = latent_vars,
observed_vars = observed_vars)
```

@@ -78,7 +78,7 @@ end

partable = ParameterTable(
graph,
latent_vars = latent_vars,
observed_vars = observed_vars)
```

55 changes: 22 additions & 33 deletions docs/src/tutorials/regularization/regularization.md
@@ -2,10 +2,13 @@

## Setup

For ridge regularization, you can simply use `SemRidge` as an additional loss function
(for example, a model with the loss functions `SemML` and `SemRidge` corresponds to ridge-regularized maximum likelihood estimation).

For lasso, elastic net and (far) beyond, you can use the [`ProximalOperators.jl`](https://github.com/JuliaFirstOrder/ProximalOperators.jl) package
and optimize the model with [`ProximalAlgorithms.jl`](https://github.com/JuliaFirstOrder/ProximalAlgorithms.jl),
which provides so-called *proximal optimization* algorithms.
These can handle, among other things, various forms of regularization.

```@setup reg
using StructuralEquationModels, ProximalAlgorithms, ProximalOperators
@@ -19,25 +22,23 @@ Pkg.add("ProximalOperators")
using StructuralEquationModels, ProximalAlgorithms, ProximalOperators
```

## `SemOptimizerProximal`
## Proximal optimization

To estimate regularized models, we provide a "building block" for the optimizer part, called `SemOptimizerProximal`.
It connects our package to the [`ProximalAlgorithms.jl`](https://github.com/JuliaFirstOrder/ProximalAlgorithms.jl) optimization backend, providing so-called proximal optimization algorithms.
Those can handle, amongst other things, various forms of regularization.

It can be used as
With the *ProximalAlgorithms* package loaded, it is possible to use the `:Proximal` optimization engine
in `SemOptimizer` to estimate regularized models.

```julia
SemOptimizerProximal(
SemOptimizer(;
engine = :Proximal,
algorithm = ProximalAlgorithms.PANOC(),
options = Dict{Symbol, Any}(),
operator_g,
operator_h = nothing
)
```

The proximal operator (aka the regularization function) can be passed as `operator_g`; the available options are listed [here](https://juliafirstorder.github.io/ProximalOperators.jl/stable/functions/).
The available algorithms are listed [here](https://juliafirstorder.github.io/ProximalAlgorithms.jl/stable/guide/implemented_algorithms/).
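For instance (an illustrative value, not tied to the example below), a lasso penalty with weight `0.02` on all parameters could be specified as:

```julia
using ProximalOperators

operator_g = NormL1(0.02)  # ℓ1 (lasso) penalty with weight 0.02 on every parameter
```

Such an operator would then be passed to `SemOptimizer(engine = :Proximal, operator_g = operator_g)` as shown above.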

## First example - lasso

@@ -70,7 +71,7 @@ end

partable = ParameterTable(
graph,
latent_vars = latent_vars,
observed_vars = observed_vars
)

@@ -86,7 +87,7 @@ We labeled the covariances between the items because we want to regularize those

```@example reg
ind = getindex.(
[param_indices(model)],
Ref(param_indices(model)),
[:cov_15, :cov_24, :cov_26, :cov_37, :cov_48, :cov_68])
```

@@ -101,26 +102,18 @@ From the previously linked [documentation](https://juliafirstorder.github.io/Pro

```@example reg
λ = zeros(31); λ[ind] .= 0.02
```

and use `SemOptimizerProximal`.

```@example reg
optimizer_lasso = SemOptimizerProximal(
optimizer_lasso = SemOptimizer(
engine = :Proximal,
operator_g = NormL1(λ)
)

model_lasso = Sem(
specification = partable,
data = data
)
```

Let's fit the regularized model

```@example reg

fit_lasso = fit(optimizer_lasso, model_lasso)
fit_lasso = fit(optimizer_lasso, model)
```

and compare the solution to the unregularized estimates:
@@ -135,7 +128,8 @@ update_partable!(partable, :estimate_lasso, param_labels(fit_lasso), solution(fi
details(partable)
```

Instead of explicitly defining a `SemOptimizerProximal` object, you can also pass `engine = :Proximal` and additional keyword arguments to `fit`:
Instead of explicitly defining a `SemOptimizer` object, you can also pass `engine = :Proximal`
and additional keyword arguments directly to the `fit` function:

```@example reg
fit_lasso2 = fit(model; engine = :Proximal, operator_g = NormL1(λ))
@@ -144,25 +138,20 @@
## Second example - mixed l1 and l0 regularization

You can choose to penalize different parameters with different types of regularization functions.
Let's use the lasso again on the covariances, but additionally penalize the error variances of the observed items via l0 regularization.

The l0 penalty is defined as
```math
\lambda \mathrm{nnz}(\theta)
```

To define a sum of separable proximal operators (i.e. no parameter is penalized twice),
we can use [`SlicedSeparableSum`](https://juliafirstorder.github.io/ProximalOperators.jl/stable/calculus/#ProximalOperators.SlicedSeparableSum) from the `ProximalOperators` package:

```@example reg
prox_operator = SlicedSeparableSum((NormL0(20.0), NormL1(0.02), NormL0(0.0)), ([ind], [9:11], [vcat(1:8, 12:25)]))

model_mixed = Sem(
specification = partable,
data = data,
)

fit_mixed = fit(model_mixed; engine = :Proximal, operator_g = prox_operator)
fit_mixed = fit(model; engine = :Proximal, operator_g = prox_operator)
```

Let's again compare the different results: