Skip to content

Unexpected sample values from beta distribution for small parameters #999

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
saona-raimundo opened this issue Jul 16, 2020 · 7 comments · Fixed by #1000
Closed

Unexpected sample values from beta distribution for small parameters #999

saona-raimundo opened this issue Jul 16, 2020 · 7 comments · Fixed by #1000

Comments

@saona-raimundo
Copy link
Contributor

Background

Beta distribution is implemented through the Beta struct and samples should give a number between zero and one. It is known that this distribution is numerically delicate when dealing with both parameters (alpha and beta) small.

The implementation of the sample method is though the following characterization.
If X, Y are independent and X follows Gamma(alpha, theta) and Y follows Gamma(beta, theta), then X / (X + Y) follows Beta(alpha, beta).
For more such characterization, see here.

Sampling from a beta distribution with both alpha and beta parameters small returns NAN samples. This is clear from the implementation, but is not expected for the user at all!
By the way, values of 1.0e-3 are already small enough to easily get a NAN result. Just run the following code.

use rand::distributions::Distribution;
fn main() {
	let param = 1.0e-3;
	let beta = rand_distr::Beta::new(param, param).unwrap();
	for x in beta.sample_iter(rand::thread_rng()) {
		if (x as f64).is_nan() {
			println!("I got a NAN!!");
		}
	}
}

What is your motivation?
I as doing numerical simulations and need to simulation beta samples as part of a rejection sampling algorithm. Running into nan values was unexpected, but could solve the issue by a particular symmetry present in my problem.

What type of application is this? (E.g. cryptography, game, numerical simulation)
Numerical simulation.

Feature request

I would like to contribute to a more robust simulation method of the beta variable that takes into account such cases.

I don't have a particular idea in mind. I tried the [scipy module to simulate beta](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.beta.html) and it seemed more robust (it gave some numbers that made sense in the cases I tried).
@vks
Copy link
Collaborator

vks commented Jul 16, 2020

I think I would start by plotting the beta distribution for alpha = beta = 1e-3.

@vks
Copy link
Collaborator

vks commented Jul 16, 2020

This fixes the issue for me, can you confirm that it makes sense?

diff --git a/rand_distr/src/gamma.rs b/rand_distr/src/gamma.rs
index ba8e4e0eb3..907be37d8f 100644
--- a/rand_distr/src/gamma.rs
+++ b/rand_distr/src/gamma.rs
@@ -495,7 +495,11 @@ where
     fn sample<R: Rng + ?Sized>(&self, rng: &mut R) -> N {
         let x = self.gamma_a.sample(rng);
         let y = self.gamma_b.sample(rng);
-        x / (x + y)
+        if x == N::from(0.) {
+            N::from(0.)
+        } else {
+            x / (x + y)
+        }
     }
 }
 
@@ -566,6 +570,15 @@ mod test {
         Beta::new(0., 0.).unwrap();
     }
 
+    #[test]
+    fn test_beta_small_param() {
+        let beta = Beta::<f64>::new(1e-3, 1e-3).unwrap();
+        let mut rng = crate::test::rng(206);
+        for _ in 0..1000 {
+            assert!(!beta.sample(&mut rng).is_nan());
+        }
+    }
+
     #[test]
     fn value_stability() {
         fn test_samples<N: Float + core::fmt::Debug, D: Distribution<N>>(

@saona-raimundo
Copy link
Contributor Author

saona-raimundo commented Jul 16, 2020

Thank you very much for the quick answer!!

Sadly, changing the result for simply zero is not an option. Simulations will be concentrated in zero
image
This is 1000 samples with your change.
Since alpha = beta the density should be symmetric.

Note that the density is symmetric only in the case alpha = beta. In this case, when x and y are zero, one "should choose between zero and one at random"... This is a particular ad-hoc solution, but it is not a solution in general, when alpha, beta are small, but not necessarily equal.

@vks
Copy link
Collaborator

vks commented Jul 17, 2020

I'm not sure what you are plotting, but with such extreme parameters you should be getting either 0 or 1. Are you using some kind kernel density estimation?

@saona-raimundo
Copy link
Contributor Author

Yes, sorry, I am using a gaussian kernel, but the samples are the crosses in bottom: they are all 0 or 1, except for two sample between 0 and 0.2.

The point is that replacing nan samples by zero (effectively this is what you proposed) is biased.

Thinking about the simulation using gamma variables, the issue arises when both x and y are zero. Then, x / (x + y) should be... zero, or one?

@vks
Copy link
Collaborator

vks commented Jul 17, 2020

Switching to another algorithm (that also performs better for small parameters) gets rid of the x / (x + y) for x, y -> 0 limit, with a more manageable w / (w + b) and b / (w + b) for w -> oo limit. See #1000.

@saona-raimundo
Copy link
Contributor Author

Wow!! Great!!

I confirm that this fixes my issues!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants