Request for documentation of the UnitVector transformation #86
It is explained in the Stan manual. |
Dear @tpapp, the closest I get in the Stan manual is the chapter on Unit Vectors, but I don't understand how that explains the implementation of
The specific issue I struggle to understand is that the domain of the
I'm trying to handle angles in an inference problem as described in the Stan manual https://mc-stan.org/docs/2_18/stan-users-guide/unit-vectors-and-rotations.html, i.e. I was hoping to do something like

    t = UnitVector(2)
    cos_θ, sin_θ = transform(t, [1.234])
    θ = atan(sin_θ, cos_θ)

But because of the half-plane issue, the range of θ is [0,π] and not [-π,π]. Is this intended behavior? If so, do you have any suggestions on how to handle angles?

best |
Sorry for closing it too hastily, and thanks for persisting. I can replicate the bug (I think the range in 2d is I think that a constant is off in the calculations, and we should map from (Incidentally, I think Stan just uses Marsaglia's method, with an extra df, so it is not much help if we want a bijection). |
This is actually a dup of #66, but not closing either in favor of another; I will think about a solution and close them at the same time. |
I did some reading about this, and doing it "uniformly" seems to be a hard problem. However, that is not needed for our purposes; we merely need a bijection. That said, nice numerical properties would still be useful. #67 is what Stan uses, but it is not a bijection. I will test out the quick fix I mention above, and if that does not work, try spherical coordinates. |
Strictly speaking, one does not need a bijection. All one needs is to draw samples in an unconstrained latent space with a transformation to constrained space and a log-density correction so that the resulting transformed samples target the correct distribution. For bijective functions, that log-density correction is a logabsdetjac (more generally, logdetsqrtmetric), but there are corrections for non-bijective transformations, which is what Stan uses here. The caveat is that if you have a non-bijective transformation, then you can only define a right-inverse, so the latent unconstrained space must be the ground truth. i.e. instead of mapping from

There are ample other cases where it makes sense to have non-bijective transformations. For example, a user wants to sample a point in a disk. One way to do this is to sample a point on a sphere, with a non-bijective projection that discards one of the axes. The resulting distribution is non-uniform on the disk, so there's a log-density correction that makes it uniform.
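To make the disk example concrete, here is a short worked sketch. The notation is an assumption introduced for illustration: $x = (x_1, x_2, x_3)$ is a point on the unit sphere $S^2$ with surface measure $\sigma$, $u = (x_1, x_2)$ is its projection onto the unit disk, and $\pi_S$ is the density on the sphere with respect to $\sigma$. On each hemisphere the surface element satisfies

$$
d\sigma(x) = \frac{du}{|x_3|} = \frac{du}{\sqrt{1 - \lVert u \rVert^2}},
$$

so the density induced on the disk is the sum over the two preimages of $u$:

$$
p_D(u) = \sum_{\pm} \frac{\pi_S\!\left(u, \pm\sqrt{1 - \lVert u \rVert^2}\right)}{\sqrt{1 - \lVert u \rVert^2}}.
$$

A uniform density $\pi_S = 1/(4\pi)$ therefore gives $p_D(u) = 1 / \bigl(2\pi\sqrt{1 - \lVert u \rVert^2}\bigr)$, which is not uniform. Targeting $\pi_S(x) = |x_3| / (2\pi)$ instead, i.e. adding $\log |x_3|$ to the log-density on the sphere, gives $p_D(u) = 1/\pi$, the uniform density on the unit disk.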
There is no chart on the sphere that completely covers it. Every chart has singularities, and if the typical set is localized near a singularity, this will cause divergences. This is, I believe, why Stan chooses a non-bijective transformation here, because the geometry then has no singularities and is well-behaved. I think it's actually least well-behaved for low-dimensional vectors, where in the latent space one can move a short distance away from the origin and suddenly a different step size is needed to step the same distance on the surface of the sphere. But due to concentration of measure, for a high-dimensional multivariate normal, the samples concentrate near the surface of a hypersphere anyway, so this parameterization actually produces a really nice geometry for sampling. |
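The normalization approach described in the comment above can be sketched in a few lines of Julia. This is only an illustration under stated assumptions: the function names `unit_vector_and_logcorrection` and `angle_from_latent` are hypothetical and not part of TransformVariables.jl, and the correction term is the standard-normal kernel on the latent vector, which is my understanding of what Stan adds for its unit-vector type.

```julia
using LinearAlgebra: norm, dot

# Hypothetical helper (not the TransformVariables.jl API): map an unconstrained
# y ∈ ℝⁿ to the unit sphere by normalizing it. The map is not injective, since
# every positive multiple of y gives the same x.
function unit_vector_and_logcorrection(y::AbstractVector{<:Real})
    r = norm(y)
    r > 0 || throw(DomainError(y, "the origin has no direction"))
    x = y ./ r
    # Assumed correction: a standard-normal kernel on y, which gives the
    # discarded radius r a proper distribution that is independent of x.
    logcorrection = -dot(y, y) / 2
    return x, logcorrection
end

# Recover an angle from a 2-dimensional latent vector.
function angle_from_latent(y::AbstractVector{<:Real})
    x, logcorrection = unit_vector_and_logcorrection(y)
    θ = atan(x[2], x[1])   # two-argument atan covers the full circle
    return θ, logcorrection
end

θ, ℓ = angle_from_latent([-1.0, -1.0])   # θ ≈ -3π/4, ℓ == -1.0
```

With this parameterization the sampler works on a 2-dimensional latent vector rather than on a single number, and the log-density term has to be added to the log posterior by whoever calls the transformation. The angle then covers the full circle instead of a half-plane.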
@sethaxen, thanks for the clarification. So do I understand this correctly: when estimating a model with a (log) posterior I am OK to give up transformations being bijections in this package, but I want to understand it first, so suggestions for reading materials are welcome. In particular,
|
Sorry for the very late reply!
Yes! This is correct.
It's only an identification issue if the chosen
I don't think this is any more nonsensical for this

This particular approach is, I believe, a direct consequence of the co-area formula in geometric measure theory, but I unfortunately haven't seen any very accessible explanations of it for this use. So here's a more intuitive explanation in terms of familiar operations.

Suppose we have a density We know that discarding a coordinate in MCMC is equivalent to marginalizing out that coordinate in the target distribution. Similarly, we can augment our distribution. So let In practice, we end up using a map

I have a few ideas for under what circumstances this approach is likely to be useful, but I've never seen a paper that discussed this approach in general terms. Now for a few examples.

The unit sphere

Let In this particular case, if I've noticed with low-dimensional unit vectors, it's much more likely (due to the curse of dimensionality) to get low

The unit hemi-sphere

Let the unit hemisphere be the unit sphere but with the constraint that

EDIT: a much better approach is to first transform the first coordinate of |
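Since the formulas in the comment above did not survive, here is a sketch of the augmentation argument for the unit sphere. The notation is an assumption introduced for illustration and may differ from the original: $\pi(x)$ is the target density on $S^{n-1}$ with respect to the surface measure $\sigma$, $r > 0$ is an auxiliary radius, and $y = r x \in \mathbb{R}^n$ is the latent vector. Give $r$ a chi distribution with $n$ degrees of freedom, independent of $x$:

$$
q(r) \propto r^{n-1} e^{-r^2/2}.
$$

The change of variables from $(r, x)$ to $y$ has volume element $dy = r^{n-1}\, dr\, d\sigma(x)$, so

$$
p_Y(y)\, dy = \pi(x)\, q(r)\, dr\, d\sigma(x)
\quad\Longrightarrow\quad
\log p_Y(y) = \log \pi\!\left(\frac{y}{\lVert y \rVert}\right) - \frac{1}{2}\lVert y \rVert^2 + \text{const}.
$$

Sampling $y$ from $p_Y$ and keeping only $x = y / \lVert y \rVert$ marginalizes out $r$ and leaves $x \sim \pi$. Under these assumptions the term $-\tfrac{1}{2}\lVert y \rVert^2$ plays the role of the log-density correction for the non-bijective map $y \mapsto y / \lVert y \rVert$.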
@sethaxen, thanks for the detailed answer (sorry to see that MathJax is kind of broken now, hopefully it gets fixed). And sorry for the late reply, I am still digesting this. What I still do not understand is
ie where the |
Dear Tamas Papp,
Thanks for your efforts on TransformVariables.jl. Do you happen to know a source (paper/textbook/blog) which explains the transformation which underlies the UnitVector type?
best
Jon Eriksen