Unexpected sample values from beta distribution for small parameters #999

saona-raimundo · 2020-07-16T18:03:21Z

Background

Beta distribution is implemented through the Beta struct and samples should give a number between zero and one. It is known that this distribution is numerically delicate when dealing with both parameters (alpha and beta) small.

The implementation of the sample method is though the following characterization.
If X, Y are independent and X follows Gamma(alpha, theta) and Y follows Gamma(beta, theta), then X / (X + Y) follows Beta(alpha, beta).
For more such characterization, see here.

Sampling from a beta distribution with both alpha and beta parameters small returns NAN samples. This is clear from the implementation, but is not expected for the user at all!
By the way, values of 1.0e-3 are already small enough to easily get a NAN result. Just run the following code.

use rand::distributions::Distribution;
fn main() {
	let param = 1.0e-3;
	let beta = rand_distr::Beta::new(param, param).unwrap();
	for x in beta.sample_iter(rand::thread_rng()) {
		if (x as f64).is_nan() {
			println!("I got a NAN!!");
		}
	}
}

What is your motivation?
I as doing numerical simulations and need to simulation beta samples as part of a rejection sampling algorithm. Running into nan values was unexpected, but could solve the issue by a particular symmetry present in my problem.

What type of application is this? (E.g. cryptography, game, numerical simulation)
Numerical simulation.

Feature request

I would like to contribute to a more robust simulation method of the beta variable that takes into account such cases.

I don't have a particular idea in mind. I tried the [scipy module to simulate beta](https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.beta.html) and it seemed more robust (it gave some numbers that made sense in the cases I tried).

The text was updated successfully, but these errors were encountered:

vks · 2020-07-16T18:22:29Z

I think I would start by plotting the beta distribution for alpha = beta = 1e-3.

vks · 2020-07-16T18:40:36Z

This fixes the issue for me, can you confirm that it makes sense?

diff --git a/rand_distr/src/gamma.rs b/rand_distr/src/gamma.rs
index ba8e4e0eb3..907be37d8f 100644
--- a/rand_distr/src/gamma.rs
+++ b/rand_distr/src/gamma.rs
@@ -495,7 +495,11 @@ where
     fn sample<R: Rng + ?Sized>(&self, rng: &mut R) -> N {
         let x = self.gamma_a.sample(rng);
         let y = self.gamma_b.sample(rng);
-        x / (x + y)
+        if x == N::from(0.) {
+            N::from(0.)
+        } else {
+            x / (x + y)
+        }
     }
 }
 
@@ -566,6 +570,15 @@ mod test {
         Beta::new(0., 0.).unwrap();
     }
 
+    #[test]
+    fn test_beta_small_param() {
+        let beta = Beta::<f64>::new(1e-3, 1e-3).unwrap();
+        let mut rng = crate::test::rng(206);
+        for _ in 0..1000 {
+            assert!(!beta.sample(&mut rng).is_nan());
+        }
+    }
+
     #[test]
     fn value_stability() {
         fn test_samples<N: Float + core::fmt::Debug, D: Distribution<N>>(

saona-raimundo · 2020-07-16T21:10:26Z

Thank you very much for the quick answer!!

Sadly, changing the result for simply zero is not an option. Simulations will be concentrated in zero

This is 1000 samples with your change.
Since alpha = beta the density should be symmetric.

Note that the density is symmetric only in the case alpha = beta. In this case, when x and y are zero, one "should choose between zero and one at random"... This is a particular ad-hoc solution, but it is not a solution in general, when alpha, beta are small, but not necessarily equal.

vks · 2020-07-17T14:59:40Z

I'm not sure what you are plotting, but with such extreme parameters you should be getting either 0 or 1. Are you using some kind kernel density estimation?

saona-raimundo · 2020-07-17T15:05:13Z

Yes, sorry, I am using a gaussian kernel, but the samples are the crosses in bottom: they are all 0 or 1, except for two sample between 0 and 0.2.

The point is that replacing nan samples by zero (effectively this is what you proposed) is biased.

Thinking about the simulation using gamma variables, the issue arises when both x and y are zero. Then, x / (x + y) should be... zero, or one?

vks · 2020-07-17T15:49:05Z

Switching to another algorithm (that also performs better for small parameters) gets rid of the x / (x + y) for x, y -> 0 limit, with a more manageable w / (w + b) and b / (w + b) for w -> oo limit. See #1000.

saona-raimundo · 2020-07-17T17:27:04Z

Wow!! Great!!

I confirm that this fixes my issues!

vks mentioned this issue Jul 17, 2020

Improve algorithm for sampling Beta #1000

Merged

saona-raimundo closed this as completed Jul 17, 2020

saona-raimundo mentioned this issue Aug 4, 2020

Beta Distribution values wrong for a=b---> 0 pytorch/pytorch#15738

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Unexpected sample values from beta distribution for small parameters #999

Unexpected sample values from beta distribution for small parameters #999

saona-raimundo commented Jul 16, 2020

vks commented Jul 16, 2020

vks commented Jul 16, 2020

saona-raimundo commented Jul 16, 2020 •

edited

Loading

vks commented Jul 17, 2020

saona-raimundo commented Jul 17, 2020

vks commented Jul 17, 2020

saona-raimundo commented Jul 17, 2020

Unexpected sample values from beta distribution for small parameters #999

Unexpected sample values from beta distribution for small parameters #999

Comments

saona-raimundo commented Jul 16, 2020

Background

Feature request

vks commented Jul 16, 2020

vks commented Jul 16, 2020

saona-raimundo commented Jul 16, 2020 • edited Loading

vks commented Jul 17, 2020

saona-raimundo commented Jul 17, 2020

vks commented Jul 17, 2020

saona-raimundo commented Jul 17, 2020

saona-raimundo commented Jul 16, 2020 •

edited

Loading