From 58e86c4064b4e143654aee50cb659d8ac061effe Mon Sep 17 00:00:00 2001 From: Diggory Hardy Date: Tue, 1 Aug 2017 14:58:45 +0100 Subject: [PATCH 01/18] Add preliminary rand-create-redesign RFC --- text/0000-rand-crate-redesign.md | 363 +++++++++++++++++++++++++++++++ 1 file changed, 363 insertions(+) create mode 100644 text/0000-rand-crate-redesign.md diff --git a/text/0000-rand-crate-redesign.md b/text/0000-rand-crate-redesign.md new file mode 100644 index 00000000000..b19246da443 --- /dev/null +++ b/text/0000-rand-crate-redesign.md @@ -0,0 +1,363 @@ +- Feature Name: rand crate redesign +- Start Date: 2017-08-01 +- RFC PR: (leave this empty) +- Rust Issue: (leave this empty) + +# Summary +[summary]: #summary + +Evaluate options for the future of `rand` regarding both cryptographic and +non-cryptographic uses. + +See also: + +* [Crate evaluation thread] +* [Strawman design PR] +* [Strawman design doc] + +# Motivation +[motivation]: #motivation + +The [crate evaluation thread] brought up the issue of stabilisation of the `rand` +crate, however there appears to be widespread dissatisfaction with the current +design. This RFC looks at a number of ways that this crate can be improved. + +The most fundamental question still to answer is whether a one-size-fits-all +approach to random number generation is sufficient (*good enough*) or whether +splitting the crate into two is the better option: one focussed on cryptography, +the other on numerical applications. + +# Guide-level explanation +[guide-level-explanation]: #guide-level-explanation + +Since this concerns one (or more) crates outside the standard library, it is +assumed that these crates should be self-documenting. Much of this documentation +still needs writing, but must wait on design decisions. + +# Reference-level explanation +[reference-level-explanation]: #reference-level-explanation + +## Generation API + +This section concerns the `Rng` trait, but not specifically implementations or +generation of values of other types. + +Aside: one proposal is to rename `Rng` to `Generator`. + +The `Rng` trait governs what types can be generated directly, and there appears +to be consensus on removing all convenience functions not concerned with +value generation. Doing so would leave: + +``` +trait Rng { + fn next_u32(&mut self) -> u32 + fn next_u64(&mut self) -> u64 + + fn next_f32(&mut self) -> f32 + fn next_f64(&mut self) -> f64 + + fn fill_bytes(&mut self, dest: &mut [u8]) +} +``` + +Further, although [direct generation of floating-point random values is +possible](http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/SFMT/#dSFMT), it is +proposed to remove `next_f32` and `next_f64`. This simplifies the API and removes +non-trivial integer to float conversions from the trait, but is not truely +necessary. + +Next, it has been suggested that `next_u8`, `next_u16` and (where supported) +`next_u128` should be added. That gives: + +``` +trait Rng { + fn next_u8(&mut self) -> u8 + fn next_u16(&mut self) -> u16 + fn next_u32(&mut self) -> u32 + fn next_u64(&mut self) -> u64 + fn next_u128(&mut self) -> u128 + + fn fill_bytes(&mut self, dest: &mut [u8]) +} +``` + +For crypto purposes, it has been suggested that `fill_bytes` alone would be +sufficient. For non-crypto purposes, the other methods (at least 32-bit and +64-bit variants) are desirable for performance, since many RNGs natively +produce `u32` or `u64` values. + +Further to this, there is discussion on whether these methods should all return +a `Result`. Apparently, some crypto RNGs can estimate available entropy and +detect cycles. A properly seeded cryptographic generator should be able to +produce a very long sequence of strong cryptographic numbers, but without +sufficient entropy for initialisation the generated numbers could be guessable. +Further, on many platforms, `OsRng` could fail (presumably due to the same +initialisation problem); the current implementation can panic. +Some people therefore advocate changing the above +functions to each return a `Result`. + +From the point of view of non-cryptographic numeric random number generation, +RNGs are usually fast, deterministic functions which have no means to detect +errors. Some may be able to detect lack of initialisation, but some implementations +always initialise with a fixed seed if no custom seed is provided. These may cycle, +likely will be unable to detect cycles (note that returning the same value twice +does not imply a cycle). Thus for non-crypto usage, returning a `Result` is +unnecessary, would require extra error handling code or `unwrap`s, and ahs some +performance penalty. (Note that all distributions would probably need to return +a `Result` too.) + +There is therefore a conflict of design here; [Brian Smith suggests separate +crypto and non-crypto APIs](https://internals.rust-lang.org/t/crate-evaluation-for-2017-07-25-rand/5505/59?u=dhardy) +(and presumably crates). This would allow a minimal crypto trait with a single +`fill_bytes(..) -> Result<..>` method, without impacting performance or +correctness (`unwrap`) of non-crypto code. + +My personal feeling is that +[relying on the OS](https://internals.rust-lang.org/t/crate-evaluation-for-2017-07-25-rand/5505/37) +where there are strong crypto requirements is the best choice, and that where a +user-space crypto RNG is required, the best design would be something like as +follows (but I have little experience with cryptography, and this does not allow +error on cycle detection): + +``` +let mut seed_buf = [u8; LEN]; +rand::try_fill_from_os(&mut buf)?; // this may fail +let mut rng = SomeRng::from_seed(seed_buf); +// after creation, rng can be assumed not to fail +``` + +Besides the issue of `next_u32` etc. and `fill_bytes` potentially failing, +another advantage of separate crypto and numeric `rand` crates would be absolute +simplicity of the crypto API and crate. Presumably in this case the numeric +crate would still depend on the crypto crate for correct initialisation. + +Further, should the `Rng` trait allow entropy injection and estimation of +available entropy? Obviously many RNGs won't be able to do the latter. +Entropy injection might be a viable alternative to periodic reseeding. + +## Generators + +This section concerns implementations of `Rng`. + +`OsRng` currently implements `Rng` by directly sampling from whatever OS +functionality is available. It might be preferable to implement a +platform-specific `try_fill_from_os(buf: &mut [u8]) -> Result<()>` function, +and (possibly) implement `OsRng` as a wrapper around this. +This approach might be slightly less performant when pulling random numbers +directly from the OS, but the overhead is probably insignificant relative to +the system call, and may often be zero. + +Three user-space RNGs are currently provided. Should this change? And should the +aim be to build a selection of high quality generators or keep the list short? +Are there any other RNGs worth adding now? + +* `IsaacRng` (32-bit) and `Isaac64Rng` +* `ChaChaRng` +* `XorShiftRng` + +The appropriate 32 or 64 variant of Isaac is exposed as `IsaacWordRng`. While +the concept is good, the name is weird. + +`StdRng` is currently a wrapper around `IsaacWordRng`, with a `new()` method +that seeds from `OsRng`. Possibly this should be replaced with two wrapper structs +or simply re-bound names: `FastRng` and `CryptoRng`. + +`thread_rng()` current constructs a reference-counted periodically-reseeding +`StdRng` per thread on first use. TODO: allow user override via dynamic dispatch? +Rename to `crypto_rng()`? + +`weak_rng()` currently constructs a new `XorShiftRng` seeded via `OsRng` each +time it is called. Rename to `fast_rng()` and make it use a `FastRng` type? +What about `random()`, should for example the documentation point out that +creating a `weak_rng()` may be useful for performance where crypto-strength +generation is not needed? + +### Generator adaptors + +`ReseedingRng` is a wrapper which periodically reseeds the enclosed RNG. + +Should a similar wrapper to periodically inject entropy from the OS be added? +Of course this shouldn't be necessary normally, but it might help when (a) the +initial OS-provided seed had little entropy and (b) cycles might otherwise occur. + +The `SeedableRng` trait is an optional extra allowing reseeding: + +``` +pub trait SeedableRng: Rng { + fn reseed(&mut self, _: Seed); + fn from_seed(seed: Seed) -> Self; +} +``` + +## Random values + +This section concerns creating random values of various types and with various +distributions given a generator (`Rng`). + +This part of the design already has a fairly good story in the strawman design, +namely the [`Rand` trait] and associated +[`Distribution` trait and impl of Rand](https://github.com/dhardy/rand/blob/master/src/distributions/mod.rs#L58), the available distributions, and the +[`random`](https://dhardy.github.io/rand/rand/fn.random.html) and +[`random_with`](https://dhardy.github.io/rand/rand/fn.random_with.html) +convenience functions. + +The `Rand::rand(rng, distribution)` function takes the `distribution` parameter +by value; this might cause extra copying in some cases. But most distributions +are small or zero-size; the `Gamma`-derived ones are the only ones larger than +three `f64`s, and the copy can likely be optimised out(?). + +### Distributions + +The [`Distribution`](https://dhardy.github.io/rand/rand/distributions/trait.Distribution.html) +trait replaces `rand`'s current +[`IndependentSample`](https://docs.rs/rand/0.3.15/rand/distributions/trait.IndependentSample.html) +trait. The `Sample` trait is removed; I believe it was originally intended for use +in random processes like random walks; these are discrete-time (stochastic) +models, thus `advance_state()` and `get_state()` functions are more applicable +than `sample()`; in any case this is beyond the scope of `rand`. + +Several zero-size structs implementing `Distribution` specify simple distributions: + +* `Uniform` specifies uniform distribution over the entire range available, and + is implemented for all integer types and `bool` +* `Uniform01` specifies uniform distribution over the half-open range `[0, 1)`, + and is implemented for `f32` and `f64` +* `Closed01` is like `Uniform01` but for `[0, 1]` (thus including 1.0) +* `Open01` is like `Uniform01` but for `(0, 1)` (thus excluding 0.0) +* `Default` chooses `Uniform` or `Uniform01` depending on type, and could + be extended to other types. + +`Rand` is roughly the same as the +[old `Rand`](https://docs.rs/rand/0.3.15/rand/trait.Rand.html); currently it doesn't +support arrays, tuples, `Option`, etc., but it could conceivably, and probably +also `derive(Rand)`. The others are new. + +There is one further uniform distribution: + +* `Range` specifies uniform distribution over a range `[a, b)` and supports + integer and floating-point types + +This `Range` is unchanged from the current `rand`, and cannot be extended to +user-defined types despite the presence of a backing trait, `SampleRange`. +Possibly this should be adapted, although it should be noted that the internal +details are designed to support a specific set of types, and in any case a +user may create a new `MyRange` type implementing `Distribution`. + +Finally, there are several non-uniform distributions, unchanged from the +current `rand`: + +* `Exp` +* `Normal`, `LogNormal` +* `Gamma`, `ChiSquared`, `FisherF`, `StudentT` + +Currently these are only implemented for `f64`. Probably they could be extended +to `f32` quite easily. + +Internally, `Exp(1)` and `N(0, 1)` (standard normal) fixed distributions are +used; these could be exposed via new zero-size distribution structs. +This might be slightly faster for some uses (avoid a multiplication and extra +data access). + +Most distributions are implemented in public sub-modules, then *also* imported +into `distributions` via `pub use`. Possibly the sub-modules should be hidden. + +### Convenience functions and more distributions + +At the top-level of the crate, two convenience functions are available; the +first is roughly equivalent to that in the current `rand` while the second is +new: + +* `random() -> T` using the thread-local `Rng` and generating any value + supporting `Rand` +* `random_with(distribution) -> T` instead generating values using the given + `distribution` (which can be of any type supporting `Distribution`) + +Additionally, within the `distributions` module, some more convenience functions +are available: + +* `uniform(rng) -> T`, equivalent to `Rand::rand(rng, Uniform)` +* `range(low, high, rng) -> T`, equivalent to `Rand::rand(rng, Range::new(low, high))` + +It is debatable whether these are worth keeping and possibly extending to include +`uniform01(rng) -> T` etc. They are convenient when used with iterators (see below). + +A couple more distributions are available using functions of the same form, +but (currently) without a `Distribution` implementor representing them: + +* `codepoint(rng) -> char` generating values uniformly distributed over all valid + Unicode codepoints, even though many are unassigned. (This may be useless?) +* `ascii_word_char(rng) -> char` uniformly selects from `A-Z`, `a-z` and `0-9` + +### Iteration + +Iterators are available as wrappers are an `Rng`. These don't support `next` +for compatibility with the borrow checker, but [do support `map` and `flat_map` +as well as `take`](https://dhardy.github.io/rand/rand/iter/struct.Iter.html). +The objects returned by `map` and `flat_map` are +[standard iterators](https://doc.rust-lang.org/std/iter/trait.Iterator.html). + +These can be used to generate collections and strings: + +``` +use rand::{thread_rng, Rng, iter}; +use rand::distributions::{uniform, ascii_word_char}; + +let mut rng = thread_rng(); +let x: Vec = iter(&mut rng).take(10).map(|rng| uniform(rng)).collect(); +println!("{:?}", x); + +let w: String = iter(&mut rng).take(6).map(|rng| ascii_word_char(rng)).collect(); +println!("{}", w); +``` + +This is considerably changed from the current `rand`, which instead has +[`Rng::gen_iter()`](https://docs.rs/rand/0.3.15/rand/trait.Rng.html#method.gen_iter) +using `Rng::gen()` and +[`Rng::gen_ascii_chars()`](https://docs.rs/rand/0.3.15/rand/trait.Rng.html#method.gen_ascii_chars) +for generating random letters (equivalent to `ascii_word_char()`). + +### Other stuff + +Function [`weighted_bool(n, rng) -> bool`](https://dhardy.github.io/rand/rand/distributions/fn.weighted_bool.html) +is a simple probability function. + +The [`sequences` module](https://dhardy.github.io/rand/rand/sequences/index.html) +supports sampling one element from a sequence (`Choose`), sampling several +(`sample`), and shuffling (`Shuffle`). + +[`WeightedChoice`](https://dhardy.github.io/rand/rand/sequences/struct.WeightedChoice.html) +is a support type allowing weighted sampling from a sequence. + +# Drawbacks +[drawbacks]: #drawbacks + +This attempt to redesign `rand` assumes there are no major issues with API +breakage. Several parts of the old API should uremain compatible, but little +attention has been paid to this. + +If the crate is split into two crates (crypto and non-crypto), some uses (randomised +algorithms) will not clearly fit into either one, likely many distributions and +support functions will not be available from the crypto API, but possibly a +decent compromise could be reached. + +# Rationale and Alternatives +[alternatives]: #alternatives + +We *could* leave `rand` as is for now. We could even stabilise the current design. +But I haven't seen anyone advocate this option. + +# Unresolved questions +[unresolved]: #unresolved-questions + +Lots and lots; see above. + +The `derive_rand` sub-crate of the current `rand` provides another method to +generate random values of the current type. This could probably be adjusted to +derive `Rand` or maybe even support custom distributions. In the +strawman design I simply deleted this sub-crate since I have no interest in +creating random values this way. + +[Crate evaluation thread]: https://internals.rust-lang.org/t/crate-evaluation-for-2017-07-25-rand/5505 +[Strawman design PR]: https://github.com/rust-lang-nursery/rand/pull/161 +[Strawman design doc]: https://dhardy.github.io/rand/rand/index.html +[`Rand` trait]: https://dhardy.github.io/rand/rand/distributions/trait.Rand.html From 78bb681ea285750819cd9ad5cc043992ff40fc6f Mon Sep 17 00:00:00 2001 From: Diggory Hardy Date: Sun, 6 Aug 2017 16:07:21 +0100 Subject: [PATCH 02/18] Add Rand vs Distribution section on redundancy --- text/0000-rand-crate-redesign.md | 62 ++++++++++++++++++++++++++++++-- 1 file changed, 60 insertions(+), 2 deletions(-) diff --git a/text/0000-rand-crate-redesign.md b/text/0000-rand-crate-redesign.md index b19246da443..22e88fa6150 100644 --- a/text/0000-rand-crate-redesign.md +++ b/text/0000-rand-crate-redesign.md @@ -224,8 +224,8 @@ Several zero-size structs implementing `Distribution` specify simple distributio and is implemented for `f32` and `f64` * `Closed01` is like `Uniform01` but for `[0, 1]` (thus including 1.0) * `Open01` is like `Uniform01` but for `(0, 1)` (thus excluding 0.0) -* `Default` chooses `Uniform` or `Uniform01` depending on type, and could - be extended to other types. +* `Default` uses `Uniform` or `Uniform01` depending on type (and can be + extended for other types) `Rand` is roughly the same as the [old `Rand`](https://docs.rs/rand/0.3.15/rand/trait.Rand.html); currently it doesn't @@ -261,6 +261,64 @@ data access). Most distributions are implemented in public sub-modules, then *also* imported into `distributions` via `pub use`. Possibly the sub-modules should be hidden. +### `Rand` vs `Distribution` + +You may have noticed that there is some redundancy between `Rand` and +`Distribution`, namely that one implements the other automatically: + +``` +/// Types (distributions) that can be used to create a random instance of `T`. +pub trait Distribution { + /// Generate a random value of `T`, using `rng` as the + /// source of randomness. + fn sample(&self, rng: &mut R) -> T; +} + +/// Generic trait for sampling random values from some distribution +pub trait Rand { + fn rand(rng: &mut R, dist: Dist) -> Self; +} + +impl> Rand for T { + fn rand(rng: &mut R, dist: D) -> Self { + dist.sample(rng) + } +} +``` + +This means that for any implementation of a distribution, there are two +equivalent ways of using it: + +``` +impl Distribution for Uniform { ... } + +// via Rand: +let x = i64::rand(rng, Uniform); +// via Distribution: +let x: i64 = Uniform.sample(rng); +``` + +We could remove this redundancy in one of two ways: + +1. Remove `Rand`. `fn random()` would become `Default.sample(&mut thread_rng())`. +2. Remove the `Distribution` trait and implement `Rand` directly. In this + case `Default`, `Uniform` etc. stay, but no longer implement a common trait. + +Should we keep both? As a third option, we could replace `Rand` with a +simpler `Rand`: + +``` +pub trait Rand { + fn rand(rng: &mut R) -> Self; +} + +impl Rand for T where Default: Distribution { + fn rand(rng: &mut R) -> Self { + Default.sample(rng) + } +} +``` + ### Convenience functions and more distributions At the top-level of the crate, two convenience functions are available; the From eccb4af79d7d21043f174b4fe22537aef917bbd2 Mon Sep 17 00:00:00 2001 From: Diggory Hardy Date: Mon, 7 Aug 2017 12:28:05 +0100 Subject: [PATCH 03/18] Rand RFC: update section on random value creation --- text/0000-rand-crate-redesign.md | 270 ++++++++++++++++++++++--------- 1 file changed, 194 insertions(+), 76 deletions(-) diff --git a/text/0000-rand-crate-redesign.md b/text/0000-rand-crate-redesign.md index 22e88fa6150..3bf4703caae 100644 --- a/text/0000-rand-crate-redesign.md +++ b/text/0000-rand-crate-redesign.md @@ -194,43 +194,126 @@ pub trait SeedableRng: Rng { This section concerns creating random values of various types and with various distributions given a generator (`Rng`). -This part of the design already has a fairly good story in the strawman design, -namely the [`Rand` trait] and associated -[`Distribution` trait and impl of Rand](https://github.com/dhardy/rand/blob/master/src/distributions/mod.rs#L58), the available distributions, and the -[`random`](https://dhardy.github.io/rand/rand/fn.random.html) and -[`random_with`](https://dhardy.github.io/rand/rand/fn.random_with.html) -convenience functions. - -The `Rand::rand(rng, distribution)` function takes the `distribution` parameter -by value; this might cause extra copying in some cases. But most distributions -are small or zero-size; the `Gamma`-derived ones are the only ones larger than -three `f64`s, and the copy can likely be optimised out(?). +The strawman design showcases two traits for generating random values of the +current type, the [`Rand`] trait and [`SimpleRand`]. It is the intention to only +keep one of these, and name whichever remains `Rand`. The first, (currently +named) [`Rand`], supports parameterisation by a distribution, thus giving +explicit control over how values are generated. The second, [`SimpleRand`] lacks this +parameterisation, making simple usage simpler but requiring usage of +distributions directly for other cases. + +Both "Rand" traits work in concert with the [`Distribution`] trait; more on that +below. For these examples we'll use two implementations: the "best-for-the-type" +[`Default`] distribution and the [`Range`] distribution. + +Now to some [`Rand`] examples: + +```rust +use rand::distributions::{Rand, Default, Range}; +let mut rng = rand::thread_rng(); + +// Type annotation needed; two options: +let byte: u8 = Rand::rand(&mut rng, Default); +let byte = u8::rand(&mut rng, Default); + +// For ranges, the generated type is the same as the parameter type: +let ranged = Rand::rand(&mut rng, Range::new(-99, 100)); +``` + +And some [`SimpleRand`] examples: + +```rust +use rand::distributions::{SimpleRand, Distribution, Range}; +let mut rng = rand::thread_rng(); + +// Again, type annotation is needed; two options: +let byte: u8 = SimpleRand::simple_rand(&mut rng); +let byte = u8::simple_rand(&mut rng); + +// SimpleRand does not support other distributions, so we have to use the +// distribution directly: +let ranged = Range::new(-99, 100).sample(&mut rng); +``` + +Note that the `Default` distribution also supports direct sampling, so we don't +need *either* version of `Rand`: + +``` +use rand::distributions::{Distribution, Default}; +let mut rng = rand::thread_rng(); + +let byte: u8 = Default.sample(&mut rng); +``` + +#### Pass by copy? + +Currently [`Rand::rand`] takes the distribution parameter by value. This is the +best option for zero-size distribution types like [`Default`] and [`Open01`], since +it allows call syntax like `Rand::rand(&mut rng, Default)` (second parameter +does not need to be referenced). + +Most distribution types are fairly small, e.g. `Range` is two or three values +of the parameterised type, so for the most part pass-by-value is reasonable, +although for example `Gamma` is 40 bytes. Can the compiler optimise this? + +On the other hand, `Distribution::sample` takes itself by reference. This is +required for the special `Weighted` distribution, which does not support `Copy`. +Does this add overhead? Note that currently `rand` is implemented using +`sample`, which in some ways is the worst of both worlds. Should distributions +be required to support `Copy` or, at least, should `sample` take `self` by +value? ### Distributions -The [`Distribution`](https://dhardy.github.io/rand/rand/distributions/trait.Distribution.html) -trait replaces `rand`'s current -[`IndependentSample`](https://docs.rs/rand/0.3.15/rand/distributions/trait.IndependentSample.html) +The [`Distribution`] trait replaces `rand`'s current [`IndependentSample`] trait. The `Sample` trait is removed; I believe it was originally intended for use in random processes like random walks; these are discrete-time (stochastic) models, thus `advance_state()` and `get_state()` functions are more applicable than `sample()`; in any case this is beyond the scope of `rand`. -Several zero-size structs implementing `Distribution` specify simple distributions: +The surviving trait is quite simple: -* `Uniform` specifies uniform distribution over the entire range available, and +```rust +/// Types (distributions) that can be used to create a random instance of `T`. +pub trait Distribution { + /// Generate a random value of `T`, using `rng` as the + /// source of randomness. + fn sample(&self, rng: &mut R) -> T; +} +``` + +This could be extended with other functions such as +`fn map T>(&self, f: F) -> MappedDistribution`, but I do not +see a good rationale. + +Any implementation, such as [`Default`], supports usage via `sample`: +`Default.sample(&mut rng)`. (Note that `struct Default;` is valueless; Rust +allows objects to be created without any extra syntax: `let x = Default;`.) + +Several zero-size structs implementing [`Distribution`] specify simple distributions: + +* [`Uniform`] specifies uniform distribution over the entire range available, and is implemented for all integer types and `bool` -* `Uniform01` specifies uniform distribution over the half-open range `[0, 1)`, +* [`Uniform01`] specifies uniform distribution over the half-open range `[0, 1)`, and is implemented for `f32` and `f64` -* `Closed01` is like `Uniform01` but for `[0, 1]` (thus including 1.0) -* `Open01` is like `Uniform01` but for `(0, 1)` (thus excluding 0.0) -* `Default` uses `Uniform` or `Uniform01` depending on type (and can be +* [`Closed01`] is like [`Uniform01`] but for `[0, 1]` (thus including 1.0) +* [`Open01`] is like [`Uniform01`] but for `(0, 1)` (thus excluding 0.0) +* [`Default`] uses [`Uniform`] or [`Uniform01`] depending on type (and can be extended for other types) -`Rand` is roughly the same as the +[`Default`] has roughly the same capabilities as the [old `Rand`](https://docs.rs/rand/0.3.15/rand/trait.Rand.html); currently it doesn't support arrays, tuples, `Option`, etc., but it could conceivably, and probably -also `derive(Rand)`. The others are new. +also `derive(Rand)` or something similar. + +It should be noted that there is no agreement on using the name `Default`. In +particular, there is a naming conflict with `std::default::Default`, which can +lead to surprising compiler messages if the user forgets to +`use rand::Default;`. Similarly, `Uniform` and `Uniform01` are open to +adjustment. All three could be replaced with a single `Uniform`; this just +leaves two semantic issues: the range differs by type, and some possible +type-dependent implementations (such as for `Option`) cannot practically have +uniform distribution. There is one further uniform distribution: @@ -250,8 +333,8 @@ current `rand`: * `Normal`, `LogNormal` * `Gamma`, `ChiSquared`, `FisherF`, `StudentT` -Currently these are only implemented for `f64`. Probably they could be extended -to `f32` quite easily. +Currently these are only implemented for `f64`. They could be extended to `f32` +but this might require adding some more lookup tables to the crate. Internally, `Exp(1)` and `N(0, 1)` (standard normal) fixed distributions are used; these could be exposed via new zero-size distribution structs. @@ -263,72 +346,66 @@ into `distributions` via `pub use`. Possibly the sub-modules should be hidden. ### `Rand` vs `Distribution` -You may have noticed that there is some redundancy between `Rand` and -`Distribution`, namely that one implements the other automatically: - -``` -/// Types (distributions) that can be used to create a random instance of `T`. -pub trait Distribution { - /// Generate a random value of `T`, using `rng` as the - /// source of randomness. - fn sample(&self, rng: &mut R) -> T; -} +As suggested above, both `Rand` traits are basically wrappers around +`Distribution`: -/// Generic trait for sampling random values from some distribution -pub trait Rand { - fn rand(rng: &mut R, dist: Dist) -> Self; +```rust +impl> Rand for T { + fn rand(rng: &mut R, distr: D) -> Self { + distr.sample(rng) + } } -impl> Rand for T { - fn rand(rng: &mut R, dist: D) -> Self { - dist.sample(rng) +impl SimpleRand for T where Default: Distribution { + fn simple_rand(rng: &mut R) -> Self { + Default.sample(rng) } } ``` -This means that for any implementation of a distribution, there are two -equivalent ways of using it: - -``` -impl Distribution for Uniform { ... } +This implies that the `Rand` traits could be removed altogether without any +loss of functionality. Alternatively, we could remove the `Distribution` trait, +keep the distributions (`Default`, `Range`, etc.), and implement `Rand` +directly: -// via Rand: -let x = i64::rand(rng, Uniform); -// via Distribution: -let x: i64 = Uniform.sample(rng); +```rust +impl Rand for u32 { + fn rand(rng: &mut R, _distr: Uniform) -> Self { + rng.next_u32() + } +} ``` -We could remove this redundancy in one of two ways: +For the user, this leaves a choice between: -1. Remove `Rand`. `fn random()` would become `Default.sample(&mut thread_rng())`. -2. Remove the `Distribution` trait and implement `Rand` directly. In this - case `Default`, `Uniform` etc. stay, but no longer implement a common trait. +```rust +// simple Rand (SimpleRand in this document): +use rand::Rand; +let x = i64::rand(rng); -Should we keep both? As a third option, we could replace `Rand` with a -simpler `Rand`: +// complex Rand: +use rand::Rand; +let y = i64::rand(rng, Default); -``` -pub trait Rand { - fn rand(rng: &mut R) -> Self; -} +// no Rand: +use rand::Distribution; +let z: i64 = Default.sample(rng); -impl Rand for T where Default: Distribution { - fn rand(rng: &mut R) -> Self { - Default.sample(rng) - } -} +// in all cases, we can still have: +let a: i64 = rand::random(); ``` ### Convenience functions and more distributions -At the top-level of the crate, two convenience functions are available; the -first is roughly equivalent to that in the current `rand` while the second is -new: +The above examples all get randomness from `thread_rng()`. For this case, two +convenience functions are available: -* `random() -> T` using the thread-local `Rng` and generating any value - supporting `Rand` -* `random_with(distribution) -> T` instead generating values using the given - `distribution` (which can be of any type supporting `Distribution`) +* [`random`], essentially `fn random() { Default.sample(&mut thread_rng()) }` +* [`random_with`], essentially + `fn random_with(distr: D) { distr.sample(&mut thread_rng()) }` + +These do not require a `Rand` trait. Since calling `thread_rng()` has a little +overhead, these functions are slightly inefficient when called multiple times. Additionally, within the `distributions` module, some more convenience functions are available: @@ -340,11 +417,15 @@ It is debatable whether these are worth keeping and possibly extending to includ `uniform01(rng) -> T` etc. They are convenient when used with iterators (see below). A couple more distributions are available using functions of the same form, -but (currently) without a `Distribution` implementor representing them: +but (currently) without a `Distribution` implemention representing them: -* `codepoint(rng) -> char` generating values uniformly distributed over all valid - Unicode codepoints, even though many are unassigned. (This may be useless?) -* `ascii_word_char(rng) -> char` uniformly selects from `A-Z`, `a-z` and `0-9` +* [`codepoint`] `(rng) -> char` generating values uniformly distributed over all valid + Unicode codepoints, even though many are unassigned. This may be useless + but is the most obvious implementation of `Distribution` for + [`Uniform`] and [`Default`]. +* [`ascii_word_char`] `(rng) -> char` uniformly selects from `A-Z`, `a-z` and + `0-9`, thus is convenient for producing basic random "words" (see usage + with iterators below). ### Iteration @@ -404,6 +485,23 @@ decent compromise could be reached. We *could* leave `rand` as is for now. We could even stabilise the current design. But I haven't seen anyone advocate this option. +@Lokathor proposed an alternative design for distributions: make each a trait: + +```rust +pub trait Rand: Sized { + fn rand(rng: &mut R) -> Self; +} + +pub trait RangedRand: Rand + PartialOrd { + fn rand_range(rng: &mut R, low: Self, high: Self) -> Self; +} +``` + +This however lacks a way of separating parameterisation of the distribution +from sampling; for example `Range` does some calculations in `Range::new`, +creates a simple object with two or three values, then does minimal work during +sampling (`my_range.sample(&mut rng)`). + # Unresolved questions [unresolved]: #unresolved-questions @@ -415,7 +513,27 @@ derive `Rand` or maybe even support custom distributions. In the strawman design I simply deleted this sub-crate since I have no interest in creating random values this way. +Should we introduce `RangeTo::new(high)` (same as `Range::new(0, high)`) to save +extra add/subtract? No...? + +Maybe we should replace `Range` with `RangeInt`, `RangeFloat` etc.? + [Crate evaluation thread]: https://internals.rust-lang.org/t/crate-evaluation-for-2017-07-25-rand/5505 [Strawman design PR]: https://github.com/rust-lang-nursery/rand/pull/161 [Strawman design doc]: https://dhardy.github.io/rand/rand/index.html -[`Rand` trait]: https://dhardy.github.io/rand/rand/distributions/trait.Rand.html +[`Rand`]: https://dhardy.github.io/rand/rand/distributions/trait.Rand.html +[`SimpleRand`]: https://dhardy.github.io/rand/rand/distributions/trait.SimpleRand.html +[`Distribution`]: https://dhardy.github.io/rand/rand/distributions/trait.Distribution.html +[`Default`]: https://dhardy.github.io/rand/rand/struct.Default.html +[`Uniform`]: https://dhardy.github.io/rand/rand/distributions/struct.Uniform.html +[`Uniform01`]: https://dhardy.github.io/rand/rand/distributions/struct.Uniform01.html +[`Open01`]: https://dhardy.github.io/rand/rand/distributions/struct.Open01.html +[`Closed01`]: https://dhardy.github.io/rand/rand/distributions/struct.Closed01.html +[`Range`]: https://dhardy.github.io/rand/rand/distributions/range/struct.Range.html +[`Rand::rand`]: https://dhardy.github.io/rand/rand/distributions/trait.Rand.html#tymethod.rand +[`random`]: https://dhardy.github.io/rand/rand/fn.random.html +[`random_with`]: https://dhardy.github.io/rand/rand/fn.random_with.html +[`IndependentSample`]: https://docs.rs/rand/0.3.16/rand/distributions/trait.IndependentSample.html +[`rand_derive`]: https://docs.rs/rand_derive/0.3.0/rand_derive/ +[`codepoint`]: https://dhardy.github.io/rand/rand/distributions/fn.codepoint.html +[`ascii_word_char`]: https://dhardy.github.io/rand/rand/distributions/fn.ascii_word_char.html From 183022a7af5fce2c0db20ce1ac40bae56f81d43d Mon Sep 17 00:00:00 2001 From: Diggory Hardy Date: Mon, 7 Aug 2017 12:28:38 +0100 Subject: [PATCH 04/18] Rand RFC: add mini sections on rand_derive and no_std --- text/0000-rand-crate-redesign.md | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+) diff --git a/text/0000-rand-crate-redesign.md b/text/0000-rand-crate-redesign.md index 3bf4703caae..d4b048132db 100644 --- a/text/0000-rand-crate-redesign.md +++ b/text/0000-rand-crate-redesign.md @@ -467,6 +467,24 @@ supports sampling one element from a sequence (`Choose`), sampling several [`WeightedChoice`](https://dhardy.github.io/rand/rand/sequences/struct.WeightedChoice.html) is a support type allowing weighted sampling from a sequence. +### `rand_derive` + +The current `rand` has a sub-crate, [`rand_derive`] +([source code](https://github.com/rust-lang-nursery/rand/tree/master/rand-derive)). +Probably something similar could be designed for [`Default`] or `Rand`, +but due to the current author's lack of interest this has not been investigated. + +## `no_std` support + +This was requested, and has been implemented in the refactor by hiding many +parts of rand behind `#[cfg(feature="std")]`. Without `std` quite a few features +are removed, including `thread_rng()`, `random()`, and the entire `os` module +which normally provides entropy for initialisation. + +API doc can be generated via `cargo doc --no-default-features`, but is not +currently hosted (TODO), and `no_std` support is not automatically tested +(TODO; use `cargo build --no-default-features` for now). + # Drawbacks [drawbacks]: #drawbacks From 3477acd41e2ed2012be3931eaf49335c9dd8f21d Mon Sep 17 00:00:00 2001 From: Diggory Hardy Date: Mon, 7 Aug 2017 12:45:17 +0100 Subject: [PATCH 05/18] Rand RFC: update text on Range and questions --- text/0000-rand-crate-redesign.md | 34 +++++++++++++++++++++++--------- 1 file changed, 25 insertions(+), 9 deletions(-) diff --git a/text/0000-rand-crate-redesign.md b/text/0000-rand-crate-redesign.md index d4b048132db..c58f82a34d8 100644 --- a/text/0000-rand-crate-redesign.md +++ b/text/0000-rand-crate-redesign.md @@ -194,6 +194,10 @@ pub trait SeedableRng: Rng { This section concerns creating random values of various types and with various distributions given a generator (`Rng`). +Most of this functionality is contained in the [`distributions`] module. +(For a time this module was renamed `dist` for brevity, but renamed back to +avoid confusion. `distr` might be another possibility.) + The strawman design showcases two traits for generating random values of the current type, the [`Rand`] trait and [`SimpleRand`]. It is the intention to only keep one of these, and name whichever remains `Rand`. The first, (currently @@ -317,17 +321,19 @@ uniform distribution. There is one further uniform distribution: -* `Range` specifies uniform distribution over a range `[a, b)` and supports +* [`Range`] specifies uniform distribution over a range `[a, b)` and supports integer and floating-point types -This `Range` is unchanged from the current `rand`, and cannot be extended to -user-defined types despite the presence of a backing trait, `SampleRange`. -Possibly this should be adapted, although it should be noted that the internal -details are designed to support a specific set of types, and in any case a -user may create a new `MyRange` type implementing `Distribution`. +This [`Range`] is minimally changed from the current `rand`, and supports +extension to user-defined types by exposing its internal fields. An alternative +implementation, [`range2`], has been written in an attempt to improve extension +to other types and avoid the need for an unused `zone` field with float types, +but has some drawbacks, perhaps most notably that `Range` is parameterised so +that `Range::new(low, high)` must be replaced with `new_range(low, high)` or +`Range::::new(low, high)`. -Finally, there are several non-uniform distributions, unchanged from the -current `rand`: +Finally, there are several more [`distributions`], +unchanged from the current `rand`: * `Exp` * `Normal`, `LogNormal` @@ -523,7 +529,14 @@ sampling (`my_range.sample(&mut rng)`). # Unresolved questions [unresolved]: #unresolved-questions -Lots and lots; see above. +Lots and lots; perhaps the biggest ones: + +* TODO: generators +* Rename the `distributions` module? +* Rename `Uniform`, `Default`, etc.? +* Keep the parameterised `Rand`, replace with the simple `Rand` or + remove both? +* Replace [`range`] with [`range2`] ? The `derive_rand` sub-crate of the current `rand` provides another method to generate random values of the current type. This could probably be adjusted to @@ -555,3 +568,6 @@ Maybe we should replace `Range` with `RangeInt`, `RangeFloat` etc.? [`rand_derive`]: https://docs.rs/rand_derive/0.3.0/rand_derive/ [`codepoint`]: https://dhardy.github.io/rand/rand/distributions/fn.codepoint.html [`ascii_word_char`]: https://dhardy.github.io/rand/rand/distributions/fn.ascii_word_char.html +[`range`]: https://dhardy.github.io/rand/rand/distributions/range/index.html +[`range2`]: https://dhardy.github.io/rand/rand/distributions/range2/index.html +[`distributions`]: (https://dhardy.github.io/rand/rand/distributions/index.html) From 49e866cce28bf927608b51a90d089fea31ef7833 Mon Sep 17 00:00:00 2001 From: Diggory Hardy Date: Thu, 10 Aug 2017 09:28:23 +0100 Subject: [PATCH 06/18] Update rand crate RFC: API and generators, introduction --- text/0000-rand-crate-redesign.md | 268 +++++++++++++++++++++++++------ 1 file changed, 223 insertions(+), 45 deletions(-) diff --git a/text/0000-rand-crate-redesign.md b/text/0000-rand-crate-redesign.md index c58f82a34d8..0adae59a075 100644 --- a/text/0000-rand-crate-redesign.md +++ b/text/0000-rand-crate-redesign.md @@ -9,13 +9,20 @@ Evaluate options for the future of `rand` regarding both cryptographic and non-cryptographic uses. +There is a strawman design which implements or demonstrates many of the changes +suggested in this document, but is not entirely in-sync with the suggestions +here. + See also: * [Crate evaluation thread] * [Strawman design PR] * [Strawman design doc] -# Motivation +# Introduction +[introduction]: #introduction + +## Motivation [motivation]: #motivation The [crate evaluation thread] brought up the issue of stabilisation of the `rand` @@ -27,7 +34,48 @@ approach to random number generation is sufficient (*good enough*) or whether splitting the crate into two is the better option: one focussed on cryptography, the other on numerical applications. -# Guide-level explanation +## Background +[background]: #background + +A *Pseudo-Random Number Generator*, abbreviated PRNG or simply RNG, is a +deterministic algorithm for generating *random-like* numbers. Such algorithms +typically have a fixed-size state space, a *seed* function to produce an initial +state from some value, an *advance* function to step from one state to the next, +and a *generate* function to yield a value in the output domain (type). This +implies the following properties: + +* Generated numbers are reproducible, when the algorithm and seed (or initial + state) is known +* All PRNGs have a finite period: eventually the *advance* function will + reproduce a prior state (not necessarily the initial one), and the sequence + of numbers will repeat + +Given a fixed state-space of `k` bits, at most `2^k` states are possible. A +good PRNG should have a large period, possibly close to `2^k`; this period +may depend on the seed value, but good PRNGs should ensure a large period for +all seeds. + +To be useful, a PRNGs should usually also have the following properties: + +* Require little memory and have fast computation +* Generated values are uniformly distributed across the output domain +* The distribution of each value is indepedent of previously generated + values(1) +* For cryptographic applications, it is not possible to predict knowledge of + prior values does not aid in predicting the next value(1) + +Note (1): obviously once a PRNG has completed its initial period and started +repeating itself, the above properties are no longer possible. It is therefore +required that the period is very long. + +Further, a PRNG may offer a *jump-ahead* function to quickly calculate the +state after a large arbitrary number of steps `v`. This allows a random number +stream to be deterministically partitioned into a number of sub-streams. + +L'Ecuyer provides some more background on PRNGs in +[Random Number Generation](https://scholar.google.com/citations?view_op=view_citation&hl=en&user=gDxovowAAAAJ&citation_for_view=gDxovowAAAAJ:d1gkVwhDpl0C) (Springer, 2012). + +## Guide-level explanation [guide-level-explanation]: #guide-level-explanation Since this concerns one (or more) crates outside the standard library, it is @@ -38,11 +86,19 @@ still needs writing, but must wait on design decisions. [reference-level-explanation]: #reference-level-explanation ## Generation API +[generation-API]: #generation-API -This section concerns the `Rng` trait, but not specifically implementations or -generation of values of other types. +This section concerns the `Rng` trait and extension traits, but not +specifically implementations or generation of values of other types. -Aside: one proposal is to rename `Rng` to `Generator`. +The `Rng` trait covers *generation* of values. It is not specific to +[deterministic] PRNGs; instead there is an extension trait `SeedableRng: Rng` +covering deterministic seeding. + +It has been proposed to rename `Rng` to `Generator`; this proposal has not +seen a lot of support. + +### `Rng` The `Rng` trait governs what types can be generated directly, and there appears to be consensus on removing all convenience functions not concerned with @@ -66,15 +122,23 @@ proposed to remove `next_f32` and `next_f64`. This simplifies the API and remove non-trivial integer to float conversions from the trait, but is not truely necessary. -Next, it has been suggested that `next_u8`, `next_u16` and (where supported) -`next_u128` should be added. That gives: +It has been suggested that `next_u8`, `next_u16` and (where supported) +`next_u128` functions should be added. However, provided that default +implementations are included, these can be added to the trait in the future if +desired. It seems unlikely that PRNGs would offer faster generation of `u8` and +`u16` values than simply casting (`next_u32() as u8`), and the only curent +alternative, OS-based-generation, has high overhead; therefore there appears +little use for `next_u8` and `next_u16` functions. `next_u128` on the other +hand may have some utility: [native 128-bit generators already +exist](http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/SFMT/). This may provide +some performance benefit to applications requiring many pairs of 64-bit values, +but I suggest not adding `next_u128` for now unless specifically requested. ``` trait Rng { - fn next_u8(&mut self) -> u8 - fn next_u16(&mut self) -> u16 fn next_u32(&mut self) -> u32 fn next_u64(&mut self) -> u64 + // and possibly: fn next_u128(&mut self) -> u128 fn fill_bytes(&mut self, dest: &mut [u8]) @@ -82,9 +146,9 @@ trait Rng { ``` For crypto purposes, it has been suggested that `fill_bytes` alone would be -sufficient. For non-crypto purposes, the other methods (at least 32-bit and -64-bit variants) are desirable for performance, since many RNGs natively -produce `u32` or `u64` values. +sufficient. For non-crypto purposes, at least `next_u32` and `next_u64` +are desirable for performance, since many RNGs natively +produce these values. Further to this, there is discussion on whether these methods should all return a `Result`. Apparently, some crypto RNGs can estimate available entropy and @@ -99,53 +163,115 @@ functions to each return a `Result`. From the point of view of non-cryptographic numeric random number generation, RNGs are usually fast, deterministic functions which have no means to detect errors. Some may be able to detect lack of initialisation, but some implementations -always initialise with a fixed seed if no custom seed is provided. These may cycle, -likely will be unable to detect cycles (note that returning the same value twice -does not imply a cycle). Thus for non-crypto usage, returning a `Result` is -unnecessary, would require extra error handling code or `unwrap`s, and ahs some -performance penalty. (Note that all distributions would probably need to return -a `Result` too.) +always initialise with a fixed seed if no custom seed is provided. PRNGs could +cycle, but usually have a very long period and no practical way of detecting +cycles (note that returning the same value twice does not imply a cycle, and +that a cycle may return to a state later than the initial state). There is +therefore little use in non-crypto PRNGs returning a `Result`; doing so +would also require extra error handling in user code or `unwrap` within the +library, as well as some performance overhead. All distributions would need to +be adapted. There is therefore a conflict of design here; [Brian Smith suggests separate crypto and non-crypto APIs](https://internals.rust-lang.org/t/crate-evaluation-for-2017-07-25-rand/5505/59?u=dhardy) (and presumably crates). This would allow a minimal crypto trait with a single `fill_bytes(..) -> Result<..>` method, without impacting performance or -correctness (`unwrap`) of non-crypto code. +correctness (`unwrap`) of non-crypto code, while PRNGs have simple methods like +`next_u32(&mut self) -> u32`. + +In support of this split, there is a strong argument that cryptographic +applications should [relying on OS +generators](https://internals.rust-lang.org/t/crate-evaluation-for-2017-07-25-rand/5505/37) +where possible. Further, keeping cryptography-related crates small may be +useful for security reviews. + +If `rand`/`Rng` is split into multiple parts, several approaches are possible; +what to do here is very much undecided. + +First, "strong RNGs" designed for cryptographic usage could: + +1. Use a dedicated `CryptoRng` trait, indepedant of `Rng`, with methods + returning a `Result` on failure +2. As (1), but add a wrapper struct implementing `Rng` for a `CryptoRng` which + panics on errors +3. Use the standard `Rng` trait and be designed such that the only time + failure is possible is during creation (`fn new() -> Result`) +4. Do not include any generators which may fail in `rand` (leave to dedicated + crypto libraries, if at all) [practically, this is equivalent to 3 aside + from choice of generators to include] + +**Current preference:** undecided, possibly 3 + +Second, the `rand` crate could: + +1. Remain a single crate +2. Have a sub-crate `os_rand` covering only the `OsRng` functionality (via + simple functions), and have `rand` depend on `os_rand` for initialisation +3. Have a `rand_core` sub-crate including `OsRng` functionality, the `Rng` + and/or `CryptoRng` trait(s), and a crypto-approved PRNG; `rand` would + depend on `rand_core` +4. Keep all generators in `rand` and move all distributions (including + `random()`) to a `distributions` crate (or same split but with `rng` and + `rand` crate names) +5. Split into even more crates... + +**Current preference:** undecided + -My personal feeling is that -[relying on the OS](https://internals.rust-lang.org/t/crate-evaluation-for-2017-07-25-rand/5505/37) -where there are strong crypto requirements is the best choice, and that where a -user-space crypto RNG is required, the best design would be something like as -follows (but I have little experience with cryptography, and this does not allow -error on cycle detection): +### Extension traits +The `SeedableRng` trait should remain as is: + +```rust +pub trait SeedableRng: Rng { + fn reseed(&mut self, Seed); + fn from_seed(seed: Seed) -> Self; +} ``` -let mut seed_buf = [u8; LEN]; -rand::try_fill_from_os(&mut buf)?; // this may fail -let mut rng = SomeRng::from_seed(seed_buf); -// after creation, rng can be assumed not to fail + +Another trait could be added to allow jump-ahead: + +```rust +pub trait JumpableRng: Rng { + /// Return a copy of self, mutated as if `next_u32` had been called `steps` + /// times + fn jump_u32(&self, steps: usize) -> Self; + + // Also jump_u64() ? For most 64-bit generators they should be the same; + // for 32-bit generators, jump_u64 should jump twice as fast. +} ``` -Besides the issue of `next_u32` etc. and `fill_bytes` potentially failing, -another advantage of separate crypto and numeric `rand` crates would be absolute -simplicity of the crypto API and crate. Presumably in this case the numeric -crate would still depend on the crypto crate for correct initialisation. +And a trait could allow entropy injection. (Does the type `T` need to be a +template parameter?) + +```rust +pub trait InjectableRng: Rng { + /// Add entropy to the RNG, adjusting the sequence of numbers produced. + fn inject(&mut self, entropy: T); +} +``` -Further, should the `Rng` trait allow entropy injection and estimation of -available entropy? Obviously many RNGs won't be able to do the latter. -Entropy injection might be a viable alternative to periodic reseeding. +These traits can be added in the future without breaking compatibility, however +they may be worth discussing now. ## Generators -This section concerns implementations of `Rng`. +This section concerns implementations of `Rng`; +[the API is discussed above](generation-API). -`OsRng` currently implements `Rng` by directly sampling from whatever OS -functionality is available. It might be preferable to implement a -platform-specific `try_fill_from_os(buf: &mut [u8]) -> Result<()>` function, -and (possibly) implement `OsRng` as a wrapper around this. -This approach might be slightly less performant when pulling random numbers -directly from the OS, but the overhead is probably insignificant relative to -the system call, and may often be zero. +In no particular order, this section attempts to cover: + +* which generators should be provided by this library +* the story for generators in other crates +* requirements of generators included in this library +* seeding PRNGs + +### Current state of the `rand` crate + +Rand currently provides three PRNG algorithms, and a wrapper (`OsRng`) to +OS-provided numbers. It also provides a couple of simple wrappers, and two +traits, [`Rng` and `SeedableRng`](generation-API). Three user-space RNGs are currently provided. Should this change? And should the aim be to build a selection of high quality generators or keep the list short? @@ -162,6 +288,8 @@ the concept is good, the name is weird. that seeds from `OsRng`. Possibly this should be replaced with two wrapper structs or simply re-bound names: `FastRng` and `CryptoRng`. +TODO: `ReadRng` + `thread_rng()` current constructs a reference-counted periodically-reseeding `StdRng` per thread on first use. TODO: allow user override via dynamic dispatch? Rename to `crypto_rng()`? @@ -172,6 +300,49 @@ What about `random()`, should for example the documentation point out that creating a `weak_rng()` may be useful for performance where crypto-strength generation is not needed? +#### OS provision + +`OsRng` currently implements `Rng` by directly sampling from whatever OS +functionality is available. +The `OsRng` struct is useful in that it separates initialisation from generation +and that it stores any state required; this is not however as important as it +might appear. + +Initialisation is trivial on most platforms; the exceptions are: + +* Linux: implementation tests whether the `getrandom()` system call is + available by calling it, once, using synchronisation primitives; + on failure the implementation tries to construct a reader on `/dev/urandom`; + the latter can in theory fail but is present since Linux 1.3.30. +* Redox constructs a reader on `rand:`; in theory this can fail +* NaCl queries an interface, then either explicitly returns an `Err` or, on + success, asserts that it has a function pointer then succeeds + +After initialisation, panics are possible if: + +* (Linux) getrandom returns an unexpected error +* (Linux via urandom, Redox): file read fails +* (IOS, FreeBSD, OpenBSD, Fuchsia, Windows, NaCl): system call returns an + error + +It may be worth investigating which of these panics could conceivably happen, +and add appropriate testing during initialisation. + +On the other hand, since the primary use of `OsRng` is to seed another RNG +(single use), and since all possible platforms can in theory cause an error +after initialisation, it might be preferable to replace `OsRng` with a simple +`try_fill_bytes` function. This would entail doing all synchronisation on first +use (via a simple synchronisation primitive or thread-local memory), and +somehow adapting this and/or each `Rng`'s `new_from_rng` function to work +together (possibly replacing `new_from_rng(other_rng)` with a simple +`new_from_os()`). + +Contrary to the above, an implementation of `Rng` using only the OS may be +exactly what some users want, since this can be used just like any other `Rng`, +aside from the lower performance. + +### Traits + ### Generator adaptors `ReseedingRng` is a wrapper which periodically reseeds the enclosed RNG. @@ -531,7 +702,14 @@ sampling (`my_range.sample(&mut rng)`). Lots and lots; perhaps the biggest ones: -* TODO: generators +* Remove `next_f32` and `next_f64` from `Rng`? +* Add `next_u128` to `Rng`? +* Add a `CryptoRng` trait or otherwise split out cryptographic generators? +* Split the `rand` crate? +* Add `JumpableRng` trait? +* Add `InjectableRng` trait? +* Replace `OsRng` with an `os::try_fill_bytes` function? +* TODO: `thread_rng()`, `weak_rng()`, `StdRng` * Rename the `distributions` module? * Rename `Uniform`, `Default`, etc.? * Keep the parameterised `Rand`, replace with the simple `Rand` or From 593066aa7fe621ae81fe920f382bc8aa78bbd72d Mon Sep 17 00:00:00 2001 From: Diggory Hardy Date: Fri, 11 Aug 2017 13:00:58 +0100 Subject: [PATCH 07/18] Rand crate RFC: notes on testing and benchmarking, doc, and a few other bits --- text/0000-rand-crate-redesign.md | 81 +++++++++++++++++++++++++++++++- 1 file changed, 80 insertions(+), 1 deletion(-) diff --git a/text/0000-rand-crate-redesign.md b/text/0000-rand-crate-redesign.md index 0adae59a075..e53cc21562c 100644 --- a/text/0000-rand-crate-redesign.md +++ b/text/0000-rand-crate-redesign.md @@ -82,6 +82,10 @@ Since this concerns one (or more) crates outside the standard library, it is assumed that these crates should be self-documenting. Much of this documentation still needs writing, but must wait on design decisions. +In my opinion, [the extensive examples](https://docs.rs/rand/0.3.16/rand/#examples) +in the crate API documentation should be moved to a book or example project. +They are well written, but do not really belong in API documentation. + # Reference-level explanation [reference-level-explanation]: #reference-level-explanation @@ -360,6 +364,31 @@ pub trait SeedableRng: Rng { } ``` +### Testing generators + +Since most of the generators discussed are *deterministic*, this determinism +should be tested. Each generator should be required to have at least one test +setting a specific seed, then reproducing a sequence of values obtained either +from a specification of the PRNG algorithm or generated with a reference +implementation. + +The ChaCha and ISAAC generators already have such tests +(`test_rng*_true_values`), however the ISAAC variant does not document the +source. The XorShift generator currently has no such test. + +### Benchmarking generators + +It would be nice if the crate included the following information on each +generator in documentation and/or benchmarks: + +* state size in bytes +* initialisation time +* amortised time to generate 1 word of randomness (where "word" means the + native size of the generator, not of the machine) + +Currently there are benchmarks of generation time, but this *might* not truely +represent the amortised time due to infrequent update cycles. + ## Random values This section concerns creating random values of various types and with various @@ -651,6 +680,25 @@ The current `rand` has a sub-crate, [`rand_derive`] Probably something similar could be designed for [`Default`] or `Rand`, but due to the current author's lack of interest this has not been investigated. +### Testing distributions + +Each distribution should have a test feeding in a known sequence of "generated" +numbers, then test the exact output for each input, where the output may be +generated by the library itself but must be tested to be at least approximately +correct via external calculation. + +A generator returning a sequence of evenly spaced `u64` values should be +sufficient; output should include zero, `u64::MAX`, and +several values in between. + +### Benchmarking distributions + +Most distributions currently only apply to `f64`; for this type a baseline +sampling from `[0, 1)` and each distribution available should be tested, +each generating the same number of values. + +Benchmarks for other types could be added. + ## `no_std` support This was requested, and has been implemented in the refactor by hiding many @@ -700,7 +748,9 @@ sampling (`my_range.sample(&mut rng)`). # Unresolved questions [unresolved]: #unresolved-questions -Lots and lots; perhaps the biggest ones: +## Summary of above + +There are many questions above; perhaps the biggest ones are: * Remove `next_f32` and `next_f64` from `Rng`? * Add `next_u128` to `Rng`? @@ -716,17 +766,46 @@ Lots and lots; perhaps the biggest ones: remove both? * Replace [`range`] with [`range2`] ? +## Derive rand + The `derive_rand` sub-crate of the current `rand` provides another method to generate random values of the current type. This could probably be adjusted to derive `Rand` or maybe even support custom distributions. In the strawman design I simply deleted this sub-crate since I have no interest in creating random values this way. +## Misc. ranges + Should we introduce `RangeTo::new(high)` (same as `Range::new(0, high)`) to save extra add/subtract? No...? Maybe we should replace `Range` with `RangeInt`, `RangeFloat` etc.? +## Mapping to floats + +This article points out that the common method of generating floats in the +range `[0, 1)` or `(0, 1)` is wrong. It is worth pointing out that our existing +code *does not use this method*, however it may still be worth reading the +article: [Generating Pseudo-random Floating-Point +Values](https://readings.owlfolio.org/2007/generating-pseudorandom-floating-point-values/). + +## Only crypto-PRNGs + +[@zackw points out that the default random number generator should be secure](https://internals.rust-lang.org/t/crate-evaluation-for-2017-07-25-rand/5505/68): + +> I think it’s vital that the *default* primitive random number generator, the +> one you get if you don’t do anything special, is an automatically-seeded +> CSPRNG. This is because people will reach for the equivalent of C’s +> [rand(3)](http://man7.org/linux/man-pages/man3/rand.3.html) in situations +> where they *need* a CSPRNG but don’t realize it, such as generation of HTTP +> session cookies. Yes, there are *better* ways to generate HTTP session +> cookies, but if the rand-equivalent is an auto-seeded CSPRNG, the low bar +> goes from “catastrophically insecure” to “nontrivial to exploit,” and that’s +> valuable. + + +------------------------------------------------------------------------------- + [Crate evaluation thread]: https://internals.rust-lang.org/t/crate-evaluation-for-2017-07-25-rand/5505 [Strawman design PR]: https://github.com/rust-lang-nursery/rand/pull/161 [Strawman design doc]: https://dhardy.github.io/rand/rand/index.html From 06e302e896393f8023b1a29b29d1e368ed04130b Mon Sep 17 00:00:00 2001 From: Diggory Hardy Date: Fri, 11 Aug 2017 13:16:04 +0100 Subject: [PATCH 08/18] Rand crate RFC: text on creation of RNGs Unfortunately this section has as many questions as answers --- text/0000-rand-crate-redesign.md | 22 +++++++++++++++++++++- 1 file changed, 21 insertions(+), 1 deletion(-) diff --git a/text/0000-rand-crate-redesign.md b/text/0000-rand-crate-redesign.md index e53cc21562c..dc624d91652 100644 --- a/text/0000-rand-crate-redesign.md +++ b/text/0000-rand-crate-redesign.md @@ -102,7 +102,7 @@ covering deterministic seeding. It has been proposed to rename `Rng` to `Generator`; this proposal has not seen a lot of support. -### `Rng` +### `Rng` trait The `Rng` trait governs what types can be generated directly, and there appears to be consensus on removing all convenience functions not concerned with @@ -259,6 +259,26 @@ pub trait InjectableRng: Rng { These traits can be added in the future without breaking compatibility, however they may be worth discussing now. +### Creation of RNGs + +The `Rng` trait does not cover creation of new RNG objects. It is recommended +(but not required) that each RNG implement: + +* `pub fn new() -> Self`, setting a new seed obtained from... where? +* `pub fn new_from_rng(rng: &mut R) -> Self` +* `SeedableRng` for some type `Seed` + +Other constructors are discouraged; e.g. all current generators have a +`new_unseeded` function; realistically no one should use this except certain +tests, where `SeedableRng::from_seed(seed) -> Self` could be used instead. + +TODO: is `new_from_rng` a good idea? E.g. naively seeding from another +generator of the same type can unwittingly create a clone. + +TODO: what should be the usual story for creating a new RNG: should the seed +come from `thread_rng()` allowing deterministic operation or from `OsRng`, +ensuring "safe" seeding? + ## Generators This section concerns implementations of `Rng`; From 27965e557fcba3b3f76419751d3d1fb834e9529c Mon Sep 17 00:00:00 2001 From: Diggory Hardy Date: Fri, 11 Aug 2017 16:27:44 +0100 Subject: [PATCH 09/18] Rand crate RFC: various updates concerning generators --- text/0000-rand-crate-redesign.md | 147 +++++++++++++++++++------------ 1 file changed, 91 insertions(+), 56 deletions(-) diff --git a/text/0000-rand-crate-redesign.md b/text/0000-rand-crate-redesign.md index dc624d91652..cbe22ee4bb7 100644 --- a/text/0000-rand-crate-redesign.md +++ b/text/0000-rand-crate-redesign.md @@ -264,18 +264,21 @@ they may be worth discussing now. The `Rng` trait does not cover creation of new RNG objects. It is recommended (but not required) that each RNG implement: -* `pub fn new() -> Self`, setting a new seed obtained from... where? +* `pub fn new() -> Self`, taking a seed from... where? (See below) * `pub fn new_from_rng(rng: &mut R) -> Self` * `SeedableRng` for some type `Seed` +Note that the above won't be applicable to all `Rng` implementations; e.g. +`ReadRng` can only be constructed with a readable object. + Other constructors are discouraged; e.g. all current generators have a `new_unseeded` function; realistically no one should use this except certain tests, where `SeedableRng::from_seed(seed) -> Self` could be used instead. -TODO: is `new_from_rng` a good idea? E.g. naively seeding from another +**Question**: is `new_from_rng` a good idea? E.g. naively seeding from another generator of the same type can unwittingly create a clone. -TODO: what should be the usual story for creating a new RNG: should the seed +**Question**: what should be the usual story for creating a new RNG: should the seed come from `thread_rng()` allowing deterministic operation or from `OsRng`, ensuring "safe" seeding? @@ -291,40 +294,62 @@ In no particular order, this section attempts to cover: * requirements of generators included in this library * seeding PRNGs -### Current state of the `rand` crate +### PRNG algorithms Rand currently provides three PRNG algorithms, and a wrapper (`OsRng`) to OS-provided numbers. It also provides a couple of simple wrappers, and two traits, [`Rng` and `SeedableRng`](generation-API). -Three user-space RNGs are currently provided. Should this change? And should the -aim be to build a selection of high quality generators or keep the list short? -Are there any other RNGs worth adding now? - -* `IsaacRng` (32-bit) and `Isaac64Rng` -* `ChaChaRng` -* `XorShiftRng` - -The appropriate 32 or 64 variant of Isaac is exposed as `IsaacWordRng`. While -the concept is good, the name is weird. - -`StdRng` is currently a wrapper around `IsaacWordRng`, with a `new()` method -that seeds from `OsRng`. Possibly this should be replaced with two wrapper structs -or simply re-bound names: `FastRng` and `CryptoRng`. - -TODO: `ReadRng` - -`thread_rng()` current constructs a reference-counted periodically-reseeding -`StdRng` per thread on first use. TODO: allow user override via dynamic dispatch? -Rename to `crypto_rng()`? - -`weak_rng()` currently constructs a new `XorShiftRng` seeded via `OsRng` each -time it is called. Rename to `fast_rng()` and make it use a `FastRng` type? -What about `random()`, should for example the documentation point out that -creating a `weak_rng()` may be useful for performance where crypto-strength -generation is not needed? - -#### OS provision +* [`IsaacRng`] (32-bit) and [`Isaac64Rng`] +* [`ChaChaRng`] +* [`XorShiftRng`] + +**Question:** which of these should be kept in `rand`? Likely [`XorShiftRng`] +should be removed. + +**Question:** what other PRNGs should we consider adding — should we keep the +crate minimal or add several good generators? Should we add [`Xoroshiro128+`] +([Wikipedia article](https://en.wikipedia.org/wiki/Xoroshiro128%2B))? +Should we allow historical generators like MT19937? (Note that the question of +whether `rand` should be split into multiple crates affects this question.) + +There are a couple of wrapper/renamed implementations: + +* [`IsaacWordRng`] is [`Isaac64Rng`] on 64-bit pointer machines, [`IsaacRng`] on + 32-bit pointer machines [other name suggestions welcome] +* [`StdRng`] is currently a wrapper around [`IsaacWordRng`], with a `new()` + method that seeds from [`OsRng`]. Ideally this `new` behaviour should be + moved to the PRNG itself so that [`StdRng`] does not need the wrapping struct + (TODO). + +**Question:** should we rename `StdRng` to `FastRng` or `CryptoRng`? Should +we introduce another name? My understanding is that ISAAC is faster than ChaCha +and mostly secure, but has some weak states, and is therefore effectively a +compromise between a speedy generator like `XorShift` and a strong cryptographic +generator. Several people have suggested that even simulations not requiring +cryptographic generators should be using them for their stronger statistical +indepedence guarantees. This does not imply that non-cryptographic PRNGs have +no good uses (e.g. games usually do not require good quality generators). + +### Special `Rng` implementations + +[`ReadRng`] is an adaptor implementing `Rng` by returning data from any source +supporting `Read`, e.g. files. It is used by `OsRng` to read from +`/dev/urandom`. + +[`ReseedingRng`] is a wrapper which counts the number of bytes returned, and +after some threshold reseeds the enclosed generator. Its only real use would be +to periodically adjust a random number stream to recover entropy after loss +(e.g. a process fork) or where there was insufficient entropy to begin with +(seeding from the OS generator too soon after boot — but probably all OSs +deal with this problem anyway; e.g. Linux saves a "seed file", and Intel's +`RDRAND` instruction can be used as an extra source of entropy, even if not +trusted). + +Note that if an "entropy injection" trait can be added, we should use +that instead of reseeding from scratch. + +### OS provision `OsRng` currently implements `Rng` by directly sampling from whatever OS functionality is available. @@ -365,24 +390,15 @@ Contrary to the above, an implementation of `Rng` using only the OS may be exactly what some users want, since this can be used just like any other `Rng`, aside from the lower performance. -### Traits - -### Generator adaptors - -`ReseedingRng` is a wrapper which periodically reseeds the enclosed RNG. +### Convenience functions -Should a similar wrapper to periodically inject entropy from the OS be added? -Of course this shouldn't be necessary normally, but it might help when (a) the -initial OS-provided seed had little entropy and (b) cycles might otherwise occur. - -The `SeedableRng` trait is an optional extra allowing reseeding: +`thread_rng()` currently constructs a reference-counted periodically-reseeding +`StdRng` per thread on first use. TODO: allow user override via dynamic dispatch? +Rename to `crypto_rng()`? -``` -pub trait SeedableRng: Rng { - fn reseed(&mut self, _: Seed); - fn from_seed(seed: Seed) -> Self; -} -``` +`weak_rng()` currently constructs a new `XorShiftRng` seeded via `OsRng` each +time it is called. **Question:** can we replace this with `XorShiftRng::new()`, +or usage via a wrapper/new name: `FastRng::new()`? ### Testing generators @@ -409,6 +425,11 @@ generator in documentation and/or benchmarks: Currently there are benchmarks of generation time, but this *might* not truely represent the amortised time due to infrequent update cycles. +### Notes + +Should we worry about generator algorithms not *exactly* matching the domain +(`u32` or `u64`)? For example, Xoroshiro apparently never generates zero. + ## Random values This section concerns creating random values of various types and with various @@ -570,6 +591,17 @@ data access). Most distributions are implemented in public sub-modules, then *also* imported into `distributions` via `pub use`. Possibly the sub-modules should be hidden. +#### Notes on conversion to floating point + +This article points out that the common method of generating floats in the +range `[0, 1)` or `(0, 1)` is wrong. It is worth pointing out that our existing +code *does not use this method*, however it may still be worth reading the +article: [Generating Pseudo-random Floating-Point +Values](https://readings.owlfolio.org/2007/generating-pseudorandom-floating-point-values/). + +[More on the topic here](http://xoroshiro.di.unimi.it/), under the heading +"Generating uniform doubles ...". + ### `Rand` vs `Distribution` As suggested above, both `Rand` traits are basically wrappers around @@ -778,6 +810,8 @@ There are many questions above; perhaps the biggest ones are: * Split the `rand` crate? * Add `JumpableRng` trait? * Add `InjectableRng` trait? +* Which constructors should `Rng` impls have? +* Which `Rng` implementations should `rand` contain? * Replace `OsRng` with an `os::try_fill_bytes` function? * TODO: `thread_rng()`, `weak_rng()`, `StdRng` * Rename the `distributions` module? @@ -801,14 +835,6 @@ extra add/subtract? No...? Maybe we should replace `Range` with `RangeInt`, `RangeFloat` etc.? -## Mapping to floats - -This article points out that the common method of generating floats in the -range `[0, 1)` or `(0, 1)` is wrong. It is worth pointing out that our existing -code *does not use this method*, however it may still be worth reading the -article: [Generating Pseudo-random Floating-Point -Values](https://readings.owlfolio.org/2007/generating-pseudorandom-floating-point-values/). - ## Only crypto-PRNGs [@zackw points out that the default random number generator should be secure](https://internals.rust-lang.org/t/crate-evaluation-for-2017-07-25-rand/5505/68): @@ -848,3 +874,12 @@ Values](https://readings.owlfolio.org/2007/generating-pseudorandom-floating-poin [`range`]: https://dhardy.github.io/rand/rand/distributions/range/index.html [`range2`]: https://dhardy.github.io/rand/rand/distributions/range2/index.html [`distributions`]: (https://dhardy.github.io/rand/rand/distributions/index.html) +[`ReadRng`]: https://dhardy.github.io/rand/rand/struct.ReadRng.html +[`ReseedingRng`]: https://dhardy.github.io/rand/rand/reseeding/struct.ReseedingRng.html +[`IsaacRng`]: https://dhardy.github.io/rand/rand/prng/struct.IsaacRng.html +[`Isaac64Rng`]: https://dhardy.github.io/rand/rand/prng/struct.Isaac64Rng.html +[`IsaacWordRng`]: https://dhardy.github.io/rand/rand/prng/struct.IsaacWordRng.html +[`ChaChaRng`]: https://dhardy.github.io/rand/rand/prng/struct.ChaChaRng.html +[`XorShiftRng`]: https://dhardy.github.io/rand/rand/prng/struct.XorShiftRng.html +[`Xoroshiro128+`]: http://xoroshiro.di.unimi.it/ +[`StdRng`]: https://dhardy.github.io/rand/rand/struct.StdRng.html From 182cc06d49e1e83d10938b85b0d9a52f67caf282 Mon Sep 17 00:00:00 2001 From: Diggory Hardy Date: Mon, 14 Aug 2017 12:57:45 +0100 Subject: [PATCH 10/18] Rand crate RFC: updates, especially concerning generators --- text/0000-rand-crate-redesign.md | 129 +++++++++++++++++++++++++------ 1 file changed, 107 insertions(+), 22 deletions(-) diff --git a/text/0000-rand-crate-redesign.md b/text/0000-rand-crate-redesign.md index cbe22ee4bb7..7138a525d4f 100644 --- a/text/0000-rand-crate-redesign.md +++ b/text/0000-rand-crate-redesign.md @@ -189,6 +189,7 @@ generators](https://internals.rust-lang.org/t/crate-evaluation-for-2017-07-25-ra where possible. Further, keeping cryptography-related crates small may be useful for security reviews. +**Question:** If `rand`/`Rng` is split into multiple parts, several approaches are possible; what to do here is very much undecided. @@ -221,6 +222,14 @@ Second, the `rand` crate could: **Current preference:** undecided +### Debug + +The strawman design now specifies `pub trait Rng: Debug {...}`, i.e. requires +all implementations to implement `Debug`. In many cases this requires only +that implementing types are prefixed with `#[derive(Debug)]`, although in some +cases it adds requirements on internal values. + +Is this useful? ### Extension traits @@ -260,6 +269,7 @@ These traits can be added in the future without breaking compatibility, however they may be worth discussing now. ### Creation of RNGs +[creation-of-rngs]: #creation-of-rngs The `Rng` trait does not cover creation of new RNG objects. It is recommended (but not required) that each RNG implement: @@ -298,32 +308,48 @@ In no particular order, this section attempts to cover: Rand currently provides three PRNG algorithms, and a wrapper (`OsRng`) to OS-provided numbers. It also provides a couple of simple wrappers, and two -traits, [`Rng` and `SeedableRng`](generation-API). +traits, [`Rng` and `SeedableRng`](generation-API). The included PRNGs are: -* [`IsaacRng`] (32-bit) and [`Isaac64Rng`] -* [`ChaChaRng`] -* [`XorShiftRng`] +* [`IsaacRng`] (32-bit) and [`Isaac64Rng`]; a very fast cryptographic + generator, but with potential attacks on weak states +* [`ChaChaRng`]; a cryptographic generator used by Linux and by Google for TLS, + among other uses +* [`XorShiftRng`]; a very fast generator, generally inferior to Xoroshiro **Question:** which of these should be kept in `rand`? Likely [`XorShiftRng`] -should be removed. +should be removed, since `Xoroshiro` is generally superior. **Question:** what other PRNGs should we consider adding — should we keep the -crate minimal or add several good generators? Should we add [`Xoroshiro128+`] -([Wikipedia article](https://en.wikipedia.org/wiki/Xoroshiro128%2B))? -Should we allow historical generators like MT19937? (Note that the question of +crate minimal or add several good generators? (Note that the question of whether `rand` should be split into multiple crates affects this question.) +* Should we add [`Xoroshiro128+`] as a replacement for XorShift? + ([Wikipedia article](https://en.wikipedia.org/wiki/Xoroshiro128%2B)) +* Should we add implementations for other promising crypto-PRNGs, e.g. + Fortuna/Yarrow (apparently used by *BSD, OSX, and iOS)? +* Should we add an implementation of [`RDRAND`]? This is supposed to be + secure and fast, but not everyone trusts closed-source hardware, and it may + not be the fastest. If we do, how should we handle platforms without this + feature? +* Wikipedia [mentions an improved `ISAAC+` variant](https://en.wikipedia.org/wiki/ISAAC_(cipher)#Cryptanalysis) of ISAAC; what is this? +* The [eSTREAM project](http://www.ecrypt.eu.org/stream/) + ([Wiki article](https://en.wikipedia.org/wiki/ESTREAM)) selected several + new stream cipher algorithms; these should all be usable as crypto PRNGs. +* If the rand crate is split somehow or a "rand_extra" crate added, should + this accept good quality implementations of any known PRNG? + There are a couple of wrapper/renamed implementations: * [`IsaacWordRng`] is [`Isaac64Rng`] on 64-bit pointer machines, [`IsaacRng`] on 32-bit pointer machines [other name suggestions welcome] * [`StdRng`] is currently a wrapper around [`IsaacWordRng`], with a `new()` method that seeds from [`OsRng`]. Ideally this `new` behaviour should be - moved to the PRNG itself so that [`StdRng`] does not need the wrapping struct - (TODO). + moved to the PRNG itself (see [creation-of-rngs]); in this case `SndRng` + could just be a new type name, not a wrapper -**Question:** should we rename `StdRng` to `FastRng` or `CryptoRng`? Should -we introduce another name? My understanding is that ISAAC is faster than ChaCha +**Question:** should we rename `StdRng` to `FastRng` or `CryptoRng`, or perhaps +have `CryptoRng` for the ChaCha generator and `FastRng` for either the +Xoroshift or ISAAC generators? My understanding is that ISAAC is faster than ChaCha and mostly secure, but has some weak states, and is therefore effectively a compromise between a speedy generator like `XorShift` and a strong cryptographic generator. Several people have suggested that even simulations not requiring @@ -333,6 +359,13 @@ no good uses (e.g. games usually do not require good quality generators). ### Special `Rng` implementations +[`ConstRng`] is a special implementation yielding a given constant repeatedly. +It is sometimes useful for testing. + +**Question:** are there any good reasons *not* to include [`ConstRng`]? Or +perhaps `rand` should also include a looping-sequence generator or a counting +generator, or a generator based on an iterator? + [`ReadRng`] is an adaptor implementing `Rng` by returning data from any source supporting `Read`, e.g. files. It is used by `OsRng` to read from `/dev/urandom`. @@ -392,13 +425,44 @@ aside from the lower performance. ### Convenience functions -`thread_rng()` currently constructs a reference-counted periodically-reseeding -`StdRng` per thread on first use. TODO: allow user override via dynamic dispatch? -Rename to `crypto_rng()`? - -`weak_rng()` currently constructs a new `XorShiftRng` seeded via `OsRng` each -time it is called. **Question:** can we replace this with `XorShiftRng::new()`, -or usage via a wrapper/new name: `FastRng::new()`? +`thread_rng` is a function returning a reference to an automatically +initialised thread-local generator. + +In the current `rand` crate, [`thread_rng`](https://docs.rs/rand/0.3.16/rand/fn.thread_rng.html) constructs a reference-counted +periodically-reseeding [`StdRng`] (ISAAC) per thread on first use. This is +"reasonably fast" and "reasonably secure", which can be viewed either as a good +compromise or as combining the worst aspects of both options — is this a good +default generator? + +One school of thought is that the default generator should be [`OsRng`], thus +aiming for maximal security and letting users deal with performance if and only +if that is a problem. In light of this, the strawman revision uses [`OsRng`] as +the default, but allowing the generator used by [`thread_rng`] to be replaced +on a per-thread basis: + +* [`thread_rng`] returns a reference to a boxed `Rng` (using dynamic dispatch) +* [`set_thread_rng`] replaces the `Rng` used by the current thread +* [`set_new_thread_rng`] changes how new thread-RNGs are created + +Note that the ability to override the generator used by `thread_rng` has certain +uses in testing, limited uses in improving performance (for ultimate +performance and for reproducibility, and may be preferable to avoid using +`thread_rng` at all), and has security implications, in that libraries cannot +rely on `thread_rng` being secure due to binaries and other libraries having +the option to replace the generator. It may therefore be better not to allow +override and possibly to use a "fast/secure compromise" PRNG like the current +`rand` crate. + +In the current `rand` crate, a second convenience generator is available: +[`weak_rng`] constructs a new `XorShiftRng` seeded via `OsRng` each +time it is called. (The `SomeRng::new()` syntax can replace the need for +this type of function; see [creation-of-rngs].) The strawman revision simply +removes `weak_rng`. + +Two functions using [`thread_rng`] are included: + +* [`random`], generating random values via the default distribution +* [`random_with`], generating random values via a specified distribution ### Testing generators @@ -602,6 +666,11 @@ Values](https://readings.owlfolio.org/2007/generating-pseudorandom-floating-poin [More on the topic here](http://xoroshiro.di.unimi.it/), under the heading "Generating uniform doubles ...". +#### Testing distributions + +Distributions should test exact output with several specified inputs, via usage +of [`ConstRng`] or similar. (TODO: implement such tests.) + ### `Rand` vs `Distribution` As suggested above, both `Rand` traits are basically wrappers around @@ -802,7 +871,9 @@ sampling (`my_range.sample(&mut rng)`). ## Summary of above -There are many questions above; perhaps the biggest ones are: +There are many questions above; perhaps the biggest ones are as follows. + +API: * Remove `next_f32` and `next_f64` from `Rng`? * Add `next_u128` to `Rng`? @@ -811,9 +882,17 @@ There are many questions above; perhaps the biggest ones are: * Add `JumpableRng` trait? * Add `InjectableRng` trait? * Which constructors should `Rng` impls have? -* Which `Rng` implementations should `rand` contain? + +Generators: + +* Which PRNGs `rand` contain? +* Which testing `Rng` impls, if any, should `rand` contain? * Replace `OsRng` with an `os::try_fill_bytes` function? -* TODO: `thread_rng()`, `weak_rng()`, `StdRng` +* Allow generator behind `thread_rng` to be switched? +* Include `StdRng` and `weak_rng` wrappers? + +Distributions: + * Rename the `distributions` module? * Rename `Uniform`, `Default`, etc.? * Keep the parameterised `Rand`, replace with the simple `Rand` or @@ -883,3 +962,9 @@ Maybe we should replace `Range` with `RangeInt`, `RangeFloat` etc.? [`XorShiftRng`]: https://dhardy.github.io/rand/rand/prng/struct.XorShiftRng.html [`Xoroshiro128+`]: http://xoroshiro.di.unimi.it/ [`StdRng`]: https://dhardy.github.io/rand/rand/struct.StdRng.html +[`RDRAND`]: https://en.wikipedia.org/wiki/RdRand +[`ConstRng`]: https://dhardy.github.io/rand/rand/struct.ConstRng.html +[`thread_rng`]: https://dhardy.github.io/rand/rand/fn.thread_rng.html +[`set_thread_rng`]: https://dhardy.github.io/rand/rand/fn.set_thread_rng.html +[`set_new_thread_rng`]: https://dhardy.github.io/rand/rand/fn.set_new_thread_rng.html +[`OsRng`]: https://dhardy.github.io/rand/rand/struct.OsRng.html From cd4fd36c45964e9137e8ed7c5a54648eafdd9b14 Mon Sep 17 00:00:00 2001 From: Diggory Hardy Date: Mon, 14 Aug 2017 13:12:01 +0100 Subject: [PATCH 11/18] Rand crate RFC: tidy up a few misc points --- text/0000-rand-crate-redesign.md | 74 ++++++++++++++++++-------------- 1 file changed, 41 insertions(+), 33 deletions(-) diff --git a/text/0000-rand-crate-redesign.md b/text/0000-rand-crate-redesign.md index 7138a525d4f..bc6238d6746 100644 --- a/text/0000-rand-crate-redesign.md +++ b/text/0000-rand-crate-redesign.md @@ -316,13 +316,13 @@ traits, [`Rng` and `SeedableRng`](generation-API). The included PRNGs are: among other uses * [`XorShiftRng`]; a very fast generator, generally inferior to Xoroshiro -**Question:** which of these should be kept in `rand`? Likely [`XorShiftRng`] -should be removed, since `Xoroshiro` is generally superior. - -**Question:** what other PRNGs should we consider adding — should we keep the -crate minimal or add several good generators? (Note that the question of -whether `rand` should be split into multiple crates affects this question.) +**Question:** should any of the above be removed? What other PRNGs should we +consider adding — should we keep the crate minimal or add several good +generators? (Note that the questions of whether `rand` should be split into +multiple crates and whether a separate crypto trait should be added affects +this question.) [bhickey has some thoughts on this.](https://github.com/rust-lang-nursery/rand/pull/161#issuecomment-320483055). +* Likely [`XorShiftRng`] should be removed, since `Xoroshiro` is generally superior. * Should we add [`Xoroshiro128+`] as a replacement for XorShift? ([Wikipedia article](https://en.wikipedia.org/wiki/Xoroshiro128%2B)) * Should we add implementations for other promising crypto-PRNGs, e.g. @@ -434,6 +434,18 @@ periodically-reseeding [`StdRng`] (ISAAC) per thread on first use. This is compromise or as combining the worst aspects of both options — is this a good default generator? +[@zackw points out that the default random number generator should be secure](https://internals.rust-lang.org/t/crate-evaluation-for-2017-07-25-rand/5505/68): + +> I think it’s vital that the *default* primitive random number generator, the +> one you get if you don’t do anything special, is an automatically-seeded +> CSPRNG. This is because people will reach for the equivalent of C’s +> [rand(3)](http://man7.org/linux/man-pages/man3/rand.3.html) in situations +> where they *need* a CSPRNG but don’t realize it, such as generation of HTTP +> session cookies. Yes, there are *better* ways to generate HTTP session +> cookies, but if the rand-equivalent is an auto-seeded CSPRNG, the low bar +> goes from “catastrophically insecure” to “nontrivial to exploit,” and that’s +> valuable. + One school of thought is that the default generator should be [`OsRng`], thus aiming for maximal security and letting users deal with performance if and only if that is a problem. In light of this, the strawman revision uses [`OsRng`] as @@ -453,6 +465,9 @@ the option to replace the generator. It may therefore be better not to allow override and possibly to use a "fast/secure compromise" PRNG like the current `rand` crate. +Another school of thought is that [`thread_rng` and `random` etc. should not be +included at all](https://internals.rust-lang.org/t/crate-evaluation-for-2017-07-25-rand/5505/82). + In the current `rand` crate, a second convenience generator is available: [`weak_rng`] constructs a new `XorShiftRng` seeded via `OsRng` each time it is called. (The `SomeRng::new()` syntax can replace the need for @@ -624,6 +639,8 @@ leaves two semantic issues: the range differs by type, and some possible type-dependent implementations (such as for `Option`) cannot practically have uniform distribution. +#### Range + There is one further uniform distribution: * [`Range`] specifies uniform distribution over a range `[a, b)` and supports @@ -637,7 +654,12 @@ but has some drawbacks, perhaps most notably that `Range` is parameterised so that `Range::new(low, high)` must be replaced with `new_range(low, high)` or `Range::::new(low, high)`. -Finally, there are several more [`distributions`], +Possibly the current `range` function should be removed, then `new_range` from +[`range2`] and an equivalent for [`Range`] could be named `range`. + +#### Non-uniform distributions + +Finally, there are several [`distributions`] unchanged from the current `rand`: * `Exp` @@ -655,9 +677,17 @@ data access). Most distributions are implemented in public sub-modules, then *also* imported into `distributions` via `pub use`. Possibly the sub-modules should be hidden. -#### Notes on conversion to floating point +#### Conversion to floating point + +Currently this is implemented via `impl Distribution for Uniform01` and +the `f64` equivalent in the strawman refactor, and within the `Rng` trait in +the current `rand`. It has been suggested that this should be implemented in +a simple function (used by `Rand` / `Uniform01`) so that users only wanting to +use a small subset of the library for cryptography do not need to use the +distributions code. This is only really useful if the `rand` crate is split +into a minimal crypto sub-set and the rest building on that. -This article points out that the common method of generating floats in the +The following article points out that the common method of generating floats in the range `[0, 1)` or `(0, 1)` is wrong. It is worth pointing out that our existing code *does not use this method*, however it may still be worth reading the article: [Generating Pseudo-random Floating-Point @@ -724,14 +754,14 @@ let a: i64 = rand::random(); ### Convenience functions and more distributions -The above examples all get randomness from `thread_rng()`. For this case, two +The above examples all get randomness from [`thread_rng`]. For this case, two convenience functions are available: * [`random`], essentially `fn random() { Default.sample(&mut thread_rng()) }` * [`random_with`], essentially `fn random_with(distr: D) { distr.sample(&mut thread_rng()) }` -These do not require a `Rand` trait. Since calling `thread_rng()` has a little +These do not require a [`Rand`] trait. Since calling [`thread_rng`] has a little overhead, these functions are slightly inefficient when called multiple times. Additionally, within the `distributions` module, some more convenience functions @@ -907,28 +937,6 @@ derive `Rand` or maybe even support custom distributions. In the strawman design I simply deleted this sub-crate since I have no interest in creating random values this way. -## Misc. ranges - -Should we introduce `RangeTo::new(high)` (same as `Range::new(0, high)`) to save -extra add/subtract? No...? - -Maybe we should replace `Range` with `RangeInt`, `RangeFloat` etc.? - -## Only crypto-PRNGs - -[@zackw points out that the default random number generator should be secure](https://internals.rust-lang.org/t/crate-evaluation-for-2017-07-25-rand/5505/68): - -> I think it’s vital that the *default* primitive random number generator, the -> one you get if you don’t do anything special, is an automatically-seeded -> CSPRNG. This is because people will reach for the equivalent of C’s -> [rand(3)](http://man7.org/linux/man-pages/man3/rand.3.html) in situations -> where they *need* a CSPRNG but don’t realize it, such as generation of HTTP -> session cookies. Yes, there are *better* ways to generate HTTP session -> cookies, but if the rand-equivalent is an auto-seeded CSPRNG, the low bar -> goes from “catastrophically insecure” to “nontrivial to exploit,” and that’s -> valuable. - - ------------------------------------------------------------------------------- [Crate evaluation thread]: https://internals.rust-lang.org/t/crate-evaluation-for-2017-07-25-rand/5505 From 82ff63476e7c51011ab6a6b51d846ab98a334e73 Mon Sep 17 00:00:00 2001 From: Diggory Hardy Date: Mon, 14 Aug 2017 16:20:03 +0100 Subject: [PATCH 12/18] Rand crate RFC: update motivation --- text/0000-rand-crate-redesign.md | 21 +++++++++++++++------ 1 file changed, 15 insertions(+), 6 deletions(-) diff --git a/text/0000-rand-crate-redesign.md b/text/0000-rand-crate-redesign.md index bc6238d6746..6e2c1cdf62d 100644 --- a/text/0000-rand-crate-redesign.md +++ b/text/0000-rand-crate-redesign.md @@ -9,7 +9,7 @@ Evaluate options for the future of `rand` regarding both cryptographic and non-cryptographic uses. -There is a strawman design which implements or demonstrates many of the changes +There is a strawman revision which implements or demonstrates many of the changes suggested in this document, but is not entirely in-sync with the suggestions here. @@ -34,6 +34,15 @@ approach to random number generation is sufficient (*good enough*) or whether splitting the crate into two is the better option: one focussed on cryptography, the other on numerical applications. +At the finer level, there are many questions, e.g. which members the `Rng` +trait needs, what the `Rand` trait should look like, which PRNG algorithms +should be included, what `thread_rng` should do, and so on. + +Once some progress has been made to answering these questions, I will update +the strawman revision accordingly, and if necessary split the corresponding PR +into reviewable chunks. Once this is done, we can talk about stabilising parts +of `rand`. + ## Background [background]: #background @@ -224,7 +233,7 @@ Second, the `rand` crate could: ### Debug -The strawman design now specifies `pub trait Rng: Debug {...}`, i.e. requires +The strawman revision now specifies `pub trait Rng: Debug {...}`, i.e. requires all implementations to implement `Debug`. In many cases this requires only that implementing types are prefixed with `#[derive(Debug)]`, although in some cases it adds requirements on internal values. @@ -518,7 +527,7 @@ Most of this functionality is contained in the [`distributions`] module. (For a time this module was renamed `dist` for brevity, but renamed back to avoid confusion. `distr` might be another possibility.) -The strawman design showcases two traits for generating random values of the +The strawman revision showcases two traits for generating random values of the current type, the [`Rand`] trait and [`SimpleRand`]. It is the intention to only keep one of these, and name whichever remains `Rand`. The first, (currently named) [`Rand`], supports parameterisation by a distribution, thus giving @@ -680,7 +689,7 @@ into `distributions` via `pub use`. Possibly the sub-modules should be hidden. #### Conversion to floating point Currently this is implemented via `impl Distribution for Uniform01` and -the `f64` equivalent in the strawman refactor, and within the `Rng` trait in +the `f64` equivalent in the strawman revision, and within the `Rng` trait in the current `rand`. It has been suggested that this should be implemented in a simple function (used by `Rand` / `Uniform01`) so that users only wanting to use a small subset of the library for cryptography do not need to use the @@ -934,8 +943,8 @@ Distributions: The `derive_rand` sub-crate of the current `rand` provides another method to generate random values of the current type. This could probably be adjusted to derive `Rand` or maybe even support custom distributions. In the -strawman design I simply deleted this sub-crate since I have no interest in -creating random values this way. +strawman revision I simply deleted this sub-crate since I have no personal +interest in creating random values this way. ------------------------------------------------------------------------------- From 08fc4c4b122a195038143c1951cfe9cb35935810 Mon Sep 17 00:00:00 2001 From: Diggory Hardy Date: Mon, 14 Aug 2017 16:24:45 +0100 Subject: [PATCH 13/18] Rand crate RFC: fix links and tweak text --- text/0000-rand-crate-redesign.md | 43 +++++++++++++++++++------------- 1 file changed, 26 insertions(+), 17 deletions(-) diff --git a/text/0000-rand-crate-redesign.md b/text/0000-rand-crate-redesign.md index 6e2c1cdf62d..6a3b1cd74f8 100644 --- a/text/0000-rand-crate-redesign.md +++ b/text/0000-rand-crate-redesign.md @@ -16,8 +16,9 @@ here. See also: * [Crate evaluation thread] -* [Strawman design PR] -* [Strawman design doc] +* [Strawman revision PR] +* [Strawman revision code] +* [Strawman revision doc] # Introduction [introduction]: #introduction @@ -26,7 +27,7 @@ See also: [motivation]: #motivation The [crate evaluation thread] brought up the issue of stabilisation of the `rand` -crate, however there appears to be widespread dissatisfaction with the current +crate, however there appears to be significant dissatisfaction with the current design. This RFC looks at a number of ways that this crate can be improved. The most fundamental question still to answer is whether a one-size-fits-all @@ -40,8 +41,9 @@ should be included, what `thread_rng` should do, and so on. Once some progress has been made to answering these questions, I will update the strawman revision accordingly, and if necessary split the corresponding PR -into reviewable chunks. Once this is done, we can talk about stabilising parts -of `rand`. +into reviewable chunks. Each chunk can have its own PR discussion as necessary. +Once this is done, we can talk about stabilising parts of `rand` in a separate +PR or RFC. ## Background [background]: #background @@ -214,8 +216,6 @@ First, "strong RNGs" designed for cryptographic usage could: crypto libraries, if at all) [practically, this is equivalent to 3 aside from choice of generators to include] -**Current preference:** undecided, possibly 3 - Second, the `rand` crate could: 1. Remain a single crate @@ -229,7 +229,9 @@ Second, the `rand` crate could: `rand` crate names) 5. Split into even more crates... -**Current preference:** undecided +*Personally,* I feel that we should stick to a single `Rng` trait as above. I +am not against splitting the implementation into multiple crates, but see +little benefit (a split *might* facilitate review of cryptographic code). ### Debug @@ -283,23 +285,28 @@ they may be worth discussing now. The `Rng` trait does not cover creation of new RNG objects. It is recommended (but not required) that each RNG implement: -* `pub fn new() -> Self`, taking a seed from... where? (See below) +* `pub fn new() -> Self`, taking a seed from [`OsRng`], see below * `pub fn new_from_rng(rng: &mut R) -> Self` * `SeedableRng` for some type `Seed` Note that the above won't be applicable to all `Rng` implementations; e.g. `ReadRng` can only be constructed with a readable object. -Other constructors are discouraged; e.g. all current generators have a +Other constructors should be discouraged; e.g. all current generators have a `new_unseeded` function; realistically no one should use this except certain tests, where `SeedableRng::from_seed(seed) -> Self` could be used instead. **Question**: is `new_from_rng` a good idea? E.g. naively seeding from another generator of the same type can unwittingly create a clone. -**Question**: what should be the usual story for creating a new RNG: should the seed -come from `thread_rng()` allowing deterministic operation or from `OsRng`, -ensuring "safe" seeding? +**Question**: should be the usual story for creating a new RNG be to use +`OsRng`? This has two advantages: (1) avoids the possibility of issues where +constructing sub-generators compromises the security of generation (or even +leads to multiple generators returning the same values), and (2) allows new +generators to be well-seeded even when the parent has been replaced via +`set_thread_rng` or lost entropy due to entropy-deprevation-attack; however +this also has a disadvantage: it doesn't use `thread_rng` so cannot be made +deterministic for testing purposes. ## Generators @@ -311,13 +318,14 @@ In no particular order, this section attempts to cover: * which generators should be provided by this library * the story for generators in other crates * requirements of generators included in this library -* seeding PRNGs +* benchmarks, testing and documentation on included generators +* convenience new types and functions ### PRNG algorithms Rand currently provides three PRNG algorithms, and a wrapper (`OsRng`) to OS-provided numbers. It also provides a couple of simple wrappers, and two -traits, [`Rng` and `SeedableRng`](generation-API). The included PRNGs are: +traits, [`Rng` and `SeedableRng`](#generation-API). The included PRNGs are: * [`IsaacRng`] (32-bit) and [`Isaac64Rng`]; a very fast cryptographic generator, but with potential attacks on weak states @@ -949,8 +957,9 @@ interest in creating random values this way. ------------------------------------------------------------------------------- [Crate evaluation thread]: https://internals.rust-lang.org/t/crate-evaluation-for-2017-07-25-rand/5505 -[Strawman design PR]: https://github.com/rust-lang-nursery/rand/pull/161 -[Strawman design doc]: https://dhardy.github.io/rand/rand/index.html +[Strawman revision PR]: https://github.com/rust-lang-nursery/rand/pull/161 +[Strawman revision code]: https://github.com/dhardy/rand +[Strawman revision doc]: https://dhardy.github.io/rand/rand/index.html [`Rand`]: https://dhardy.github.io/rand/rand/distributions/trait.Rand.html [`SimpleRand`]: https://dhardy.github.io/rand/rand/distributions/trait.SimpleRand.html [`Distribution`]: https://dhardy.github.io/rand/rand/distributions/trait.Distribution.html From 5d3dc1762fa9ebd8bde5d8207782fcc37688706c Mon Sep 17 00:00:00 2001 From: Diggory Hardy Date: Mon, 14 Aug 2017 17:33:08 +0100 Subject: [PATCH 14/18] Rand crate RFC: minor tweaks and link fixes --- text/0000-rand-crate-redesign.md | 23 +++++++++-------------- 1 file changed, 9 insertions(+), 14 deletions(-) diff --git a/text/0000-rand-crate-redesign.md b/text/0000-rand-crate-redesign.md index 6a3b1cd74f8..b12e66f45bf 100644 --- a/text/0000-rand-crate-redesign.md +++ b/text/0000-rand-crate-redesign.md @@ -4,7 +4,6 @@ - Rust Issue: (leave this empty) # Summary -[summary]: #summary Evaluate options for the future of `rand` regarding both cryptographic and non-cryptographic uses. @@ -21,10 +20,8 @@ See also: * [Strawman revision doc] # Introduction -[introduction]: #introduction ## Motivation -[motivation]: #motivation The [crate evaluation thread] brought up the issue of stabilisation of the `rand` crate, however there appears to be significant dissatisfaction with the current @@ -46,7 +43,6 @@ Once this is done, we can talk about stabilising parts of `rand` in a separate PR or RFC. ## Background -[background]: #background A *Pseudo-Random Number Generator*, abbreviated PRNG or simply RNG, is a deterministic algorithm for generating *random-like* numbers. Such algorithms @@ -87,7 +83,6 @@ L'Ecuyer provides some more background on PRNGs in [Random Number Generation](https://scholar.google.com/citations?view_op=view_citation&hl=en&user=gDxovowAAAAJ&citation_for_view=gDxovowAAAAJ:d1gkVwhDpl0C) (Springer, 2012). ## Guide-level explanation -[guide-level-explanation]: #guide-level-explanation Since this concerns one (or more) crates outside the standard library, it is assumed that these crates should be self-documenting. Much of this documentation @@ -98,10 +93,8 @@ in the crate API documentation should be moved to a book or example project. They are well written, but do not really belong in API documentation. # Reference-level explanation -[reference-level-explanation]: #reference-level-explanation ## Generation API -[generation-API]: #generation-API This section concerns the `Rng` trait and extension traits, but not specifically implementations or generation of values of other types. @@ -280,7 +273,6 @@ These traits can be added in the future without breaking compatibility, however they may be worth discussing now. ### Creation of RNGs -[creation-of-rngs]: #creation-of-rngs The `Rng` trait does not cover creation of new RNG objects. It is recommended (but not required) that each RNG implement: @@ -308,10 +300,15 @@ generators to be well-seeded even when the parent has been replaced via this also has a disadvantage: it doesn't use `thread_rng` so cannot be made deterministic for testing purposes. +**Question**: should `new` return a `Result`? In theory `OsRng` can fail to be +created, but on most platforms creation will always succeed; on the other hand +a panic during usage is a little more likely, but a panic is more difficult to +capture. + ## Generators This section concerns implementations of `Rng`; -[the API is discussed above](generation-API). +[the API is discussed above](#generation-api). In no particular order, this section attempts to cover: @@ -325,7 +322,7 @@ In no particular order, this section attempts to cover: Rand currently provides three PRNG algorithms, and a wrapper (`OsRng`) to OS-provided numbers. It also provides a couple of simple wrappers, and two -traits, [`Rng` and `SeedableRng`](#generation-API). The included PRNGs are: +traits, [`Rng` and `SeedableRng`](#generation-api). The included PRNGs are: * [`IsaacRng`] (32-bit) and [`Isaac64Rng`]; a very fast cryptographic generator, but with potential attacks on weak states @@ -432,13 +429,11 @@ On the other hand, since the primary use of `OsRng` is to seed another RNG after initialisation, it might be preferable to replace `OsRng` with a simple `try_fill_bytes` function. This would entail doing all synchronisation on first use (via a simple synchronisation primitive or thread-local memory), and -somehow adapting this and/or each `Rng`'s `new_from_rng` function to work -together (possibly replacing `new_from_rng(other_rng)` with a simple -`new_from_os()`). +adapting each `Rng`'s `fn new() -> Self` function. Contrary to the above, an implementation of `Rng` using only the OS may be exactly what some users want, since this can be used just like any other `Rng`, -aside from the lower performance. +aside from the lower performance; **probably [`OsRng`] will stay as-is**. ### Convenience functions From 996fad3bdc5a3160bc88f950d31ad09af7bfab1d Mon Sep 17 00:00:00 2001 From: Diggory Hardy Date: Thu, 17 Aug 2017 10:30:08 +0100 Subject: [PATCH 15/18] Rand crate RFC: updates regarding creation and seeding of PRNGs --- text/0000-rand-crate-redesign.md | 47 ++++++++++++++------------------ 1 file changed, 21 insertions(+), 26 deletions(-) diff --git a/text/0000-rand-crate-redesign.md b/text/0000-rand-crate-redesign.md index b12e66f45bf..d97d2710688 100644 --- a/text/0000-rand-crate-redesign.md +++ b/text/0000-rand-crate-redesign.md @@ -92,6 +92,13 @@ In my opinion, [the extensive examples](https://docs.rs/rand/0.3.16/rand/#exampl in the crate API documentation should be moved to a book or example project. They are well written, but do not really belong in API documentation. +## Note on type parameters + +Very often we make use of type parameters with a restriction like +`R: Rng + ?Sized`. This is *almost* the same as just `R: Rng`, except for one +thing: `R: Rng` doesn't work for dynamic dispatch (where the parameter is a +trait object like `let mut rng: &mut Rng = &mut thread_rng();`). + # Reference-level explanation ## Generation API @@ -259,15 +266,9 @@ pub trait JumpableRng: Rng { } ``` -And a trait could allow entropy injection. (Does the type `T` need to be a -template parameter?) - -```rust -pub trait InjectableRng: Rng { - /// Add entropy to the RNG, adjusting the sequence of numbers produced. - fn inject(&mut self, entropy: T); -} -``` +And a trait could allow entropy injection, however I don't believe this belongs +in `rand`. See [suggested trait](https://github.com/rust-lang/rfcs/pull/2106#issuecomment-322414869) +and [my thoughts](https://github.com/rust-lang/rfcs/pull/2106#issuecomment-322705482). These traits can be added in the future without breaking compatibility, however they may be worth discussing now. @@ -278,32 +279,26 @@ The `Rng` trait does not cover creation of new RNG objects. It is recommended (but not required) that each RNG implement: * `pub fn new() -> Self`, taking a seed from [`OsRng`], see below -* `pub fn new_from_rng(rng: &mut R) -> Self` +* `pub fn from_rng(rng: &mut R) -> Self` * `SeedableRng` for some type `Seed` Note that the above won't be applicable to all `Rng` implementations; e.g. `ReadRng` can only be constructed with a readable object. +`from_rng` is a bit special: in some cases a naive implementation used to seed +a PRNG from another of the same type could effectively make a clone. +Implementations should be careful to prevent this from happening by sampling +extra values and/or mutating the state. It should also be clear that this method +does not add entropy, therefore should not be used for cryptography. + Other constructors should be discouraged; e.g. all current generators have a `new_unseeded` function; realistically no one should use this except certain tests, where `SeedableRng::from_seed(seed) -> Self` could be used instead. -**Question**: is `new_from_rng` a good idea? E.g. naively seeding from another -generator of the same type can unwittingly create a clone. - -**Question**: should be the usual story for creating a new RNG be to use -`OsRng`? This has two advantages: (1) avoids the possibility of issues where -constructing sub-generators compromises the security of generation (or even -leads to multiple generators returning the same values), and (2) allows new -generators to be well-seeded even when the parent has been replaced via -`set_thread_rng` or lost entropy due to entropy-deprevation-attack; however -this also has a disadvantage: it doesn't use `thread_rng` so cannot be made -deterministic for testing purposes. - -**Question**: should `new` return a `Result`? In theory `OsRng` can fail to be -created, but on most platforms creation will always succeed; on the other hand -a panic during usage is a little more likely, but a panic is more difficult to -capture. +Alternatively, `new` could seed from `thread_rng` or similar, or not exist +forcing users to use `from_rng` or the `SeedableRng` trait. However, I believe +`new` should exist and seed from `OsRng`, since this makes the easiest way to +create an RNG secure and well seeded. ## Generators From 31f31960aecd49dd0a92780344f05490867b7692 Mon Sep 17 00:00:00 2001 From: Diggory Hardy Date: Thu, 17 Aug 2017 10:31:09 +0100 Subject: [PATCH 16/18] Rand crate RFC: updates to distributions --- text/0000-rand-crate-redesign.md | 148 ++++++++++++++++++++++--------- 1 file changed, 108 insertions(+), 40 deletions(-) diff --git a/text/0000-rand-crate-redesign.md b/text/0000-rand-crate-redesign.md index d97d2710688..35ff0b8265f 100644 --- a/text/0000-rand-crate-redesign.md +++ b/text/0000-rand-crate-redesign.md @@ -525,19 +525,49 @@ Most of this functionality is contained in the [`distributions`] module. (For a time this module was renamed `dist` for brevity, but renamed back to avoid confusion. `distr` might be another possibility.) -The strawman revision showcases two traits for generating random values of the -current type, the [`Rand`] trait and [`SimpleRand`]. It is the intention to only -keep one of these, and name whichever remains `Rand`. The first, (currently -named) [`Rand`], supports parameterisation by a distribution, thus giving -explicit control over how values are generated. The second, [`SimpleRand`] lacks this -parameterisation, making simple usage simpler but requiring usage of -distributions directly for other cases. - -Both "Rand" traits work in concert with the [`Distribution`] trait; more on that -below. For these examples we'll use two implementations: the "best-for-the-type" -[`Default`] distribution and the [`Range`] distribution. - -Now to some [`Rand`] examples: +### Traits governing generation + +The current `rand` crate has a [`Rand`](https://docs.rs/rand/0.3.16/rand/trait.Rand.html) +trait implementable by any type which can +be "randomly constructed", with quite a few (but not exhaustive) implementations +for various Rust types, one for a standard library type (`Option`), and even +implementations for `Rng` types. + +The strawman revision showcases two variations on `Rand`: a [`Rand`] trait +parameterisable by distributions and a [`SimpleRand`] trait, similar to the +current `Rand`. The intention was to keep only one of these, however my current +preference is to remove both (more on this in a moment). + +Current `rand` has two traits for generating values from distributions, +[`Sample`](https://docs.rs/rand/0.3.16/rand/distributions/trait.Sample.html) and +[`IndependentSample`]. I believe the purpose of `Sample` was to allow +implementation for random processes (e.g. random walks and Lévy flight); +however such processes are usually better interacted with via `advance_state()` +and `get_state()` functions, and are in any case beyond the scope of the `rand` +crate. The strawman revision therefore removes `Sample` and renames +[`IndependentSample`] to [`Distribution`], better reflecting the trait's +purpose. + +The strawman revision adds several new distributions, namely `Uniform`, +`Uniform01`, and `Default` (see below). These distributions allow creation of +random values for arbitrary types, replacing the need for a `Rand` trait. The +[`Rand`] and [`SimpleRand`] traits in the strawman revision are simply +wrappers around distributions, specifically `Default`. + +Additionally, the strawman revision adds a trait called [`Sample`]. This is +simply an extension to [`Rng`], adding some convenience functions to access +functionality from the `Default` and `Range` distributions, as well as +iterators. + +So the strawman revision currently has *three* convenience wrappers around +distributions: [`Rand`], [`SimpleRand`] and [`Sample`]. My feeling is that the +most convenient of these is [`Sample`] and the other two serve no real purpose +(note that implementing random value creation for user-defined types is +now handled via `impl Distribution for Default`). + +#### Examples + +Some [`Rand`] examples: ```rust use rand::distributions::{Rand, Default, Range}; @@ -551,7 +581,7 @@ let byte = u8::rand(&mut rng, Default); let ranged = Rand::rand(&mut rng, Range::new(-99, 100)); ``` -And some [`SimpleRand`] examples: +Some [`SimpleRand`] examples: ```rust use rand::distributions::{SimpleRand, Distribution, Range}; @@ -566,21 +596,36 @@ let byte = u8::simple_rand(&mut rng); let ranged = Range::new(-99, 100).sample(&mut rng); ``` -Note that the `Default` distribution also supports direct sampling, so we don't -need *either* version of `Rand`: +Some [`Sample`] examples: + +```rust +use rand::distributions::{Sample, Rand, Default, Range}; +let mut rng = rand::thread_rng(); + +// Type annotation needed: +let byte: u8 = rng.gen(); + +// For ranges, the generated type is the same as the parameter type: +let ranged = rng.gen_range(-99, 100); +``` + +Equivalent code without using any of the wrappers: ``` -use rand::distributions::{Distribution, Default}; +use rand::distributions::{Distribution, Default, Range}; let mut rng = rand::thread_rng(); let byte: u8 = Default.sample(&mut rng); + +let ranged = Range::new(-99, 100).sample(&mut rng); ``` #### Pass by copy? -Currently [`Rand::rand`] takes the distribution parameter by value. This is the -best option for zero-size distribution types like [`Default`] and [`Open01`], since -it allows call syntax like `Rand::rand(&mut rng, Default)` (second parameter +Currently [`Rand::rand`] and [`Sample::sample`] take the distribution parameter +by value. This is the best option for zero-size distribution types like +[`Default`] and [`Open01`], since it allows call syntax like +`Rand::rand(&mut rng, Default)` (second parameter does not need to be referenced). Most distribution types are fairly small, e.g. `Range` is two or three values @@ -594,15 +639,10 @@ Does this add overhead? Note that currently `rand` is implemented using be required to support `Copy` or, at least, should `sample` take `self` by value? -### Distributions +### Distributions trait The [`Distribution`] trait replaces `rand`'s current [`IndependentSample`] -trait. The `Sample` trait is removed; I believe it was originally intended for use -in random processes like random walks; these are discrete-time (stochastic) -models, thus `advance_state()` and `get_state()` functions are more applicable -than `sample()`; in any case this is beyond the scope of `rand`. - -The surviving trait is quite simple: +trait. It is quite simple: ```rust /// Types (distributions) that can be used to create a random instance of `T`. @@ -618,9 +658,11 @@ This could be extended with other functions such as see a good rationale. Any implementation, such as [`Default`], supports usage via `sample`: -`Default.sample(&mut rng)`. (Note that `struct Default;` is valueless; Rust +`Default.sample(&mut rng)`. (Note that `struct Default;` is a unit type (like `()`); Rust allows objects to be created without any extra syntax: `let x = Default;`.) +#### Simple distributions + Several zero-size structs implementing [`Distribution`] specify simple distributions: * [`Uniform`] specifies uniform distribution over the entire range available, and @@ -631,20 +673,31 @@ Several zero-size structs implementing [`Distribution`] specify simple distribut * [`Open01`] is like [`Uniform01`] but for `(0, 1)` (thus excluding 0.0) * [`Default`] uses [`Uniform`] or [`Uniform01`] depending on type (and can be extended for other types) +* [`AsciiWordChar`] samples uniformly from the ASCII characters 0-9, A-Z and a-z -[`Default`] has roughly the same capabilities as the -[old `Rand`](https://docs.rs/rand/0.3.15/rand/trait.Rand.html); currently it doesn't -support arrays, tuples, `Option`, etc., but it could conceivably, and probably -also `derive(Rand)` or something similar. +[`Default`] has roughly the same capabilities as the the current `rand` crate's +[`Rand`](https://docs.rs/rand/0.3.15/rand/trait.Rand.html); currently it doesn't +support arrays, tuples, `Option`, etc., but support for thes could conceivably +be added, and probably also an equivalent to `derive_rand`. It should be noted that there is no agreement on using the name `Default`. In particular, there is a naming conflict with `std::default::Default`, which can lead to surprising compiler messages if the user forgets to -`use rand::Default;`. Similarly, `Uniform` and `Uniform01` are open to -adjustment. All three could be replaced with a single `Uniform`; this just -leaves two semantic issues: the range differs by type, and some possible +`use rand::Default;`. +Potentially `Default` could be renamed to `Rand`; my personal feeling is that +the name `Default` works well. + +Similarly, `Uniform` and `Uniform01` are open to +adjustment. All three (including `Default`) could be replaced with a single +`Uniform`; but using three names does solve +two semantic issues: (1) the range of sampled values differs by type, especially +between integer and floating-point types, and (2) some possible type-dependent implementations (such as for `Option`) cannot practically have -uniform distribution. +a uniform distribution. + +[`AsciiWordChar`] is currently an oddity, used in many tests but with no hard +requirements on form or function. This could be renamed and augmented in line +with [Regex character classes](https://en.wikipedia.org/wiki/Regular_expression#Character_classes). #### Range @@ -654,15 +707,27 @@ There is one further uniform distribution: integer and floating-point types This [`Range`] is minimally changed from the current `rand`, and supports -extension to user-defined types by exposing its internal fields. An alternative +extension to user-defined types by exposing its internal fields. + +An alternative implementation, [`range2`], has been written in an attempt to improve extension to other types and avoid the need for an unused `zone` field with float types, but has some drawbacks, perhaps most notably that `Range` is parameterised so -that `Range::new(low, high)` must be replaced with `new_range(low, high)` or +that `Range::new(low, high)` must be replaced with `range(low, high)` or `Range::::new(low, high)`. -Possibly the current `range` function should be removed, then `new_range` from -[`range2`] and an equivalent for [`Range`] could be named `range`. +**Question:** which `range` implementation should we choose? + +Unfortunately `range2` exposes more public types like `RangeInt` and +`RangeFloat`, and must retain the `SampleRange` trait but *only* to identify +types for which a `Range` implementation is available. (Suggestions to improve +this code welcome.) + +Note that while `Range` is open to implementation for user-defined types, its +API with a "low" and "high" may not be appropriate for many types; e.g. a +complex type may want "low_real", "high_real", "low_complex" and "high_complex" +parameters. For such uses, it is suggested the user create a new distribution +(e.g. `ComplexRange`) and not try to extend `Range`. #### Non-uniform distributions @@ -984,3 +1049,6 @@ interest in creating random values this way. [`set_thread_rng`]: https://dhardy.github.io/rand/rand/fn.set_thread_rng.html [`set_new_thread_rng`]: https://dhardy.github.io/rand/rand/fn.set_new_thread_rng.html [`OsRng`]: https://dhardy.github.io/rand/rand/struct.OsRng.html +[`Sample`]: https://dhardy.github.io/rand/rand/trait.Sample.html +[`Sample::sample`]: https://dhardy.github.io/rand/rand/trait.Sample.html#tymethod.sample +[`AsciiWordChar`]: https://dhardy.github.io/rand/rand/distributions/struct.AsciiWordChar.html From f9055adff6c51073f693ef124202dbde217806a2 Mon Sep 17 00:00:00 2001 From: Diggory Hardy Date: Fri, 18 Aug 2017 17:59:31 +0100 Subject: [PATCH 17/18] Rand crate rfc: note on Serde --- text/0000-rand-crate-redesign.md | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/text/0000-rand-crate-redesign.md b/text/0000-rand-crate-redesign.md index 35ff0b8265f..cc73387184d 100644 --- a/text/0000-rand-crate-redesign.md +++ b/text/0000-rand-crate-redesign.md @@ -270,6 +270,14 @@ And a trait could allow entropy injection, however I don't believe this belongs in `rand`. See [suggested trait](https://github.com/rust-lang/rfcs/pull/2106#issuecomment-322414869) and [my thoughts](https://github.com/rust-lang/rfcs/pull/2106#issuecomment-322705482). +Further, serialisation could be allowed by conditionally supporting [Serde]. +This should be an optional feature to avoid depending on Serde where not +needed. Serialisation has been requested +[in this PR](https://github.com/rust-lang-nursery/rand/pull/119). +For now this has not been implemented since [`serde_derive` does not support +`Wrapping`](https://github.com/serde-rs/serde/issues/1020) (and I'm too lazy to +write a workaround). + These traits can be added in the future without breaking compatibility, however they may be worth discussing now. @@ -1052,3 +1060,4 @@ interest in creating random values this way. [`Sample`]: https://dhardy.github.io/rand/rand/trait.Sample.html [`Sample::sample`]: https://dhardy.github.io/rand/rand/trait.Sample.html#tymethod.sample [`AsciiWordChar`]: https://dhardy.github.io/rand/rand/distributions/struct.AsciiWordChar.html +[Serde]: https://serde.rs/ From 8753c810b8e0734a5ffc25df439e0669b9af4ab2 Mon Sep 17 00:00:00 2001 From: Diggory Hardy Date: Tue, 29 Aug 2017 16:05:16 +0100 Subject: [PATCH 18/18] Rand design RFC: almost complete rewrite of Generation API section --- text/0000-rand-crate-redesign.md | 416 ++++++++++++++++++++++--------- 1 file changed, 298 insertions(+), 118 deletions(-) diff --git a/text/0000-rand-crate-redesign.md b/text/0000-rand-crate-redesign.md index cc73387184d..e5760f0e7e8 100644 --- a/text/0000-rand-crate-redesign.md +++ b/text/0000-rand-crate-redesign.md @@ -14,6 +14,7 @@ here. See also: +* [RFC comments](https://github.com/rust-lang/rfcs/pull/2106) * [Crate evaluation thread] * [Strawman revision PR] * [Strawman revision code] @@ -92,6 +93,23 @@ In my opinion, [the extensive examples](https://docs.rs/rand/0.3.16/rand/#exampl in the crate API documentation should be moved to a book or example project. They are well written, but do not really belong in API documentation. +What we should provide is clear guidance (with the usual disclaimers) on choice +of RNG for at least the following cases: + +1. Cryptography should normally be handled by a crypto-library, but where + entropy is needed, users should prefer to use the OS generator directly + where possible. Where a user-space RNG is needed, ChaCha may be the best + choice. +2. Where fast random numbers are needed which should still be cryptographically + secure, but where speed is more important than absolute security (e.g. to + initialise hashes in the std library), a generator like ISAAC+ (?) should be + used; this is exposed via `thread_rng()`. +3. Where a fast and uniform generator is wanted for numeric applications, a + generator like PCG (?) should be preferred. +4. Where determistic operation / reproducibility is desired, any PRNG algorithm + may be used, but the implementation should be selected by specific name and + not a generic name (`StdRng` or `IsaacWordRng`) and be seeded manually. + ## Note on type parameters Very often we make use of type parameters with a restriction like @@ -110,137 +128,294 @@ The `Rng` trait covers *generation* of values. It is not specific to [deterministic] PRNGs; instead there is an extension trait `SeedableRng: Rng` covering deterministic seeding. -It has been proposed to rename `Rng` to `Generator`; this proposal has not -seen a lot of support. +### The `Rng` trait -### `Rng` trait +The [current `Rng` trait](https://docs.rs/rand/0.3.16/rand/trait.Rng.html) is +quite large and complex: -The `Rng` trait governs what types can be generated directly, and there appears -to be consensus on removing all convenience functions not concerned with -value generation. Doing so would leave: +```rust +pub trait Rng { + fn next_u32(&mut self) -> u32; -``` -trait Rng { - fn next_u32(&mut self) -> u32 - fn next_u64(&mut self) -> u64 + fn next_u64(&mut self) -> u64 { ... } + fn next_f32(&mut self) -> f32 { ... } + fn next_f64(&mut self) -> f64 { ... } + fn fill_bytes(&mut self, dest: &mut [u8]) { ... } - fn next_f32(&mut self) -> f32 - fn next_f64(&mut self) -> f64 + fn gen(&mut self) -> T + where Self: Sized { ... } + fn gen_iter<'a, T: Rand>(&'a mut self) -> Generator<'a, T, Self> + where Self: Sized { ... } + fn gen_range(&mut self, low: T, high: T) -> T + where Self: Sized { ... } + fn gen_weighted_bool(&mut self, n: u32) -> bool + where Self: Sized { ... } + fn gen_ascii_chars<'a>(&'a mut self) -> AsciiGenerator<'a, Self> + where Self: Sized { ... } + fn choose<'a, T>(&mut self, values: &'a [T]) -> Option<&'a T> + where Self: Sized { ... } + fn choose_mut<'a, T>(&mut self, values: &'a mut [T]) -> Option<&'a mut T> + where Self: Sized { ... } + fn shuffle(&mut self, values: &mut [T]) + where Self: Sized { ... } +} +``` + +#### Desired properties + +The above trait is **long and complex**, and does not cleanly separate core +functionality (`next_*` / `fill_bytes`) from derived functionality (`gen`, +`choose`, etc.). This design pattern is successfully used elsewhere, e.g. by +[`Iterator`](https://doc.rust-lang.org/std/iter/trait.Iterator.html), but is not +ideal, especially for cryptographic applications where clear, easily-reviewable +code is of huge importance. + +On the other hand, **ease of use** is also important. Simplifying the `Rng` +trait does not, however, imply that usage must be impaired; see the [`Sample`] +trait in the next section for more on this topic. + +**Determinism** is important for many use-cases, including scientific simulations +where it enables third parties to reproduce results, games wishing to reproduce +random creations from a given seed, and cryptography. To quote Joshua +Liebow-Feeser +[@joshlf](https://github.com/rust-lang/rfcs/pull/2106#issuecomment-323546147): + +> CSPRNGs are also used deterministically in cryptographic applications. For +> example, stream ciphers use CSPRNGs to allow multiple parties to all compute +> the same pseudorandom stream based on a secret shared seed. Determinism is +> very important for a CSPRNG even if it isn't used in all applications. + +**Performance** can be quite important for many uses of RNGs. This is not +given top priority, but some applications require *many* random numbers, and +performance is in many cases the main reason to use a user-space RNG instead of +system calls to access OS-provided randomness. The design therefore considers +performance an important goal for most functionality, although system calls to +access OS randomness are assumed to be relatively slow regardless. + +Algorithmic random number generators tend to be infallible, but external +sources of random numbers may not be, for example some operating-system +generators will fail early in the boot process, and hardware generators can +fail (e.g. if the hardware device is removed). These **failures** can only be +handled via `panic` with the current `Rng` trait, but an interface exposing this +possibility of error may be desirable (on the other hand, wrapping all return +values with `Result` is undesirable both for performance and style reasons). + +#### Proposed `Rng` trait + +``` +// (Ignore CryptoRng base trait for now; see next section.) +trait Rng: CryptoRng { + fn next_u32(&mut self) -> u32; + fn next_u64(&mut self) -> u64; + #[cfg(feature = "i128_support")] + fn next_u128(&mut self) -> u128; - fn fill_bytes(&mut self, dest: &mut [u8]) + fn fill_bytes(&mut self, dest: &mut [u8]); } ``` -Further, although [direct generation of floating-point random values is -possible](http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/SFMT/#dSFMT), it is -proposed to remove `next_f32` and `next_f64`. This simplifies the API and removes -non-trivial integer to float conversions from the trait, but is not truely -necessary. - -It has been suggested that `next_u8`, `next_u16` and (where supported) -`next_u128` functions should be added. However, provided that default -implementations are included, these can be added to the trait in the future if -desired. It seems unlikely that PRNGs would offer faster generation of `u8` and -`u16` values than simply casting (`next_u32() as u8`), and the only curent -alternative, OS-based-generation, has high overhead; therefore there appears -little use for `next_u8` and `next_u16` functions. `next_u128` on the other -hand may have some utility: [native 128-bit generators already +All derived (convenience) methods have been removed from this trait; this +functionality is moved to the [`Sample`] trait or elsewhere. + +Further, the `next_f32` and `next_f64` functions have been removed. +Although [direct generation of floating-point random values is +possible](http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/SFMT/#dSFMT), I do +not believe this is often done in practice (such a generator would not be able +to directly produce random values of integer types efficiently, and would likely +have little performance advantage). + +Finally, `next_u128` has been added. [Native 128-bit generators already exist](http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/SFMT/). This may provide -some performance benefit to applications requiring many pairs of 64-bit values, -but I suggest not adding `next_u128` for now unless specifically requested. +some performance benefit to applications requiring many pairs of 64-bit values. + +It has been suggested that `next_u8`, `next_u16` also be added. However, most +fast generators operate on 32-bit or 64-bit integers natively, and attempts to +extract less bits usually only impair performance. This RFC therefore does not +add these methods, but notes that they could be added in the future without any +breakage (using default implementations on `next_u32`). + +In order to assure cross-platform reproducibility of results from deterministic +generators, it is important that *endianness* is specified when converting between +numbers of different sizes, especially to/from `u8` types. This RFC specifies +that all conversions should be *little endian*, including `u32` → `u64` (least +significant first) and `u64` → `u32` (use least significant part). + +Default implementations of most of these functions in terms of `next_u32` and +`next_u64` could be added; in my personal opinion there should be no default +methods since it is not uncommon (at least within the `rand` crate) to write +wrapper types around a generator re-implementing the `Rng` trait; if default +implementations exist then a wrapper implementation may miss some functions +and "work", but behave incorrectly (different random number stream, worse +performance). Instead, example code can be shown for `next_*` impls (e.g. +`self.next_u64() as u32` and +`(self.next_u32() as u64) | ((self.next_u32() as u64) << 32)`), and a function +to implement `fill_bytes`: +`fn fill_bytes_from_u64(rng: &mut R, dest: &mut [u8])`. + +#### Proposed `CryptoRng` trait + +For cryptographic purposes, it has been suggested that `fill_bytes` alone would be +sufficient. Further, the function should return a `Result`, to properly +accomodate external randomness sources which can fail, such as direct OS system +calls or file reads. + +The following trait is proposed, intended to be as simple as possible. It is +likely that details of any failure will not be useful, hence the empty +`CryptoError` struct (arguments to the contrary welcome). + +```rust +use std::result::Result; + +pub struct CryptoError; +pub trait CryptoRng { + fn try_fill(&mut self, dest: &mut [u8]) -> Result<(), CryptoError>; +} ``` -trait Rng { - fn next_u32(&mut self) -> u32 - fn next_u64(&mut self) -> u64 - // and possibly: - fn next_u128(&mut self) -> u128 - - fn fill_bytes(&mut self, dest: &mut [u8]) + +Note: methods like `fn try_next_u32(&mut self) -> Result` can +be added if desired, but probably it's preferable to keep this trait minimal. +Function names must differ from those in the `Rng` trait (reason a little later). + +Since the trait `Rng` extends `CryptoRng`, we implicitly implement `CryptoRng` +for every `Rng` with the following code (which means that `Rng` implementations +do not have to worry about `CryptoRng`, and every `Rng` implementation is +automatically available as a `CryptoRng` implementation too): + +```rust +impl CryptoRng for R { + fn try_fill(&mut self, dest: &mut [u8]) -> Result<(), CryptoError> { + Ok(self.fill_bytes(dest)) + } } ``` -For crypto purposes, it has been suggested that `fill_bytes` alone would be -sufficient. For non-crypto purposes, at least `next_u32` and `next_u64` -are desirable for performance, since many RNGs natively -produce these values. - -Further to this, there is discussion on whether these methods should all return -a `Result`. Apparently, some crypto RNGs can estimate available entropy and -detect cycles. A properly seeded cryptographic generator should be able to -produce a very long sequence of strong cryptographic numbers, but without -sufficient entropy for initialisation the generated numbers could be guessable. -Further, on many platforms, `OsRng` could fail (presumably due to the same -initialisation problem); the current implementation can panic. -Some people therefore advocate changing the above -functions to each return a `Result`. - -From the point of view of non-cryptographic numeric random number generation, -RNGs are usually fast, deterministic functions which have no means to detect -errors. Some may be able to detect lack of initialisation, but some implementations -always initialise with a fixed seed if no custom seed is provided. PRNGs could -cycle, but usually have a very long period and no practical way of detecting -cycles (note that returning the same value twice does not imply a cycle, and -that a cycle may return to a state later than the initial state). There is -therefore little use in non-crypto PRNGs returning a `Result`; doing so -would also require extra error handling in user code or `unwrap` within the -library, as well as some performance overhead. All distributions would need to -be adapted. - -There is therefore a conflict of design here; [Brian Smith suggests separate -crypto and non-crypto APIs](https://internals.rust-lang.org/t/crate-evaluation-for-2017-07-25-rand/5505/59?u=dhardy) -(and presumably crates). This would allow a minimal crypto trait with a single -`fill_bytes(..) -> Result<..>` method, without impacting performance or -correctness (`unwrap`) of non-crypto code, while PRNGs have simple methods like -`next_u32(&mut self) -> u32`. - -In support of this split, there is a strong argument that cryptographic -applications should [relying on OS -generators](https://internals.rust-lang.org/t/crate-evaluation-for-2017-07-25-rand/5505/37) -where possible. Further, keeping cryptography-related crates small may be -useful for security reviews. - -**Question:** -If `rand`/`Rng` is split into multiple parts, several approaches are possible; -what to do here is very much undecided. - -First, "strong RNGs" designed for cryptographic usage could: - -1. Use a dedicated `CryptoRng` trait, indepedant of `Rng`, with methods - returning a `Result` on failure -2. As (1), but add a wrapper struct implementing `Rng` for a `CryptoRng` which - panics on errors -3. Use the standard `Rng` trait and be designed such that the only time - failure is possible is during creation (`fn new() -> Result`) -4. Do not include any generators which may fail in `rand` (leave to dedicated - crypto libraries, if at all) [practically, this is equivalent to 3 aside - from choice of generators to include] - -Second, the `rand` crate could: - -1. Remain a single crate -2. Have a sub-crate `os_rand` covering only the `OsRng` functionality (via - simple functions), and have `rand` depend on `os_rand` for initialisation -3. Have a `rand_core` sub-crate including `OsRng` functionality, the `Rng` - and/or `CryptoRng` trait(s), and a crypto-approved PRNG; `rand` would - depend on `rand_core` -4. Keep all generators in `rand` and move all distributions (including - `random()`) to a `distributions` crate (or same split but with `rng` and - `rand` crate names) -5. Split into even more crates... - -*Personally,* I feel that we should stick to a single `Rng` trait as above. I -am not against splitting the implementation into multiple crates, but see -little benefit (a split *might* facilitate review of cryptographic code). - -### Debug - -The strawman revision now specifies `pub trait Rng: Debug {...}`, i.e. requires -all implementations to implement `Debug`. In many cases this requires only -that implementing types are prefixed with `#[derive(Debug)]`, although in some -cases it adds requirements on internal values. - -Is this useful? +Since users may wish to use `CryptoRng` generators in code expecting an `Rng` +trait implementation, we also provide two wrapper functions to create an `Rng` +from a `CryptoRng`: + +```rust +/// Consume any CryptoRng, returning a type implementing `Rng`: +fn as_rng(rng: CR) -> AsRng; +/// Given any `&mut crng` where `crng: CryptoRng`, return a type implementing `Rng`: +fn as_rng_ref<'a, CR: 'a+CryptoRng+?Sized>(rng: &'a mut CR) -> AsRngRef<'a, CR>; +``` + +(Hopefully there will be little need to use the above `as_rng*` functions in practice. +There are three potential issues: (1) cryptographic generators may not be +uniform, (2) cryptographic generators may be relatively slow, (3) there is an +unlikely possibility of a panic. Because of this, code expecting to be used with +a `CryptoRng`, such as an RNG constructor seeding from another RNG, should use +randomness from a `CryptoRng` not an `Rng`.) + +#### Alternative `Rng` / `CryptoRng` designs + +Several alternative designs for `Rng` and `CryptoRng` are possible. + +We could have a single trait. In such a case, there is a strong desire to keep +`next_u32` and `next_u64` methods returning an integer directly: any `Result` +would for practical reasons probably get `unwrap()`-ed since e.g. propegating +`Result` errors through all random-distribution code when this code would +normally be used with infallible generators anyway makes no sense. `fill_bytes` +could be modified to return a `Result`, but the result would be a hybrid monster +where fallible generators would have to implement `next_u32` etc. methods with +`unwrap` / `panic`. Further, there has been a call to have a cryptographic +generator trait with minimal complexity, as [for example in the `ring` crate](https://briansmith.org/rustdoc/ring/rand/trait.SecureRandom.html). + +Using two traits, `Rng` and `CryptoRng`, they could have a different +relationship. We could have `CryptoRng: Rng` but in practice this makes no +sense (any `CryptoRng` must implement all `Rng` functions using `unwrap`, and +this doesn't meet the desire for a minimal-complexity `CryptoRng`). We could +use two unrelated traits (no trait extension), but this doesn't seem to offer +any advantages. + +We could even use a [base `RawRng` trait](https://github.com/dhardy/rand_design/blob/master/traits/raw_rng.rs) where `E` is either +`CryptoError` or `!` and all functions return `Result`; this has been +demonstrated [not to reduce performance](https://github.com/rust-lang/rfcs/pull/2106#issuecomment-323769203) and adds a "super trait", but since +`RawRng` is not object-safe I see little advantage. There is a long discussion +around this design [starting here](https://github.com/rust-lang/rfcs/pull/2106#issuecomment-323388931). + +For an overview of various two-trait solutions, [see this repository](https://github.com/dhardy/rand_design/tree/master/traits). +[Note: these use functions `next_u32` and `try_next_u32`; since we don't need +`try_next_u32` imagine the examples use `fill_bytes` and `try_fill` instead.] + +One point of note is [@Lokathor's suggestion](https://github.com/rust-lang/rfcs/pull/2106#issuecomment-325604852) +to use type-safety to prevent insecure generators being used where a secure +generator is required. *Personally* I do not see `CryptoRng` as a +*promise of secure random numbers*, but as a way of handling +potentially-fallible generators; for example `OsRng` and the `RDRAND` +instruction are potentially fallible but whether they are secure is another +question (OS design issues and limitations of embedded systems may make +`OsRng` less secure and most OSes have decided not to trust `RDRAND` fully but +instead use it only as an extra source of entropy). Of course, there is still +something to the type-safety idea: it *might* prevent use of very insecure +generators in some situations where security matters. + +Of all the above, the most reasonable three options are in my opinion: + +1. Use the design above where `Rng` extends `CryptoRng` +2. Use two unrelated traits, requiring explicit wrapping to convert objects of + either type to the other ([like this](https://github.com/dhardy/rand_design/blob/master/traits/separate_explicit_Rng.rs)) +3. Use a single trait and accept that fallible generators may panic + +#### Bikeshedding names + +The names `Rng` and `CryptoRng` could be adjusted. Both types could uses the +same name in different namespaces. But, in my personal opinion, the guidance in +[RFC #356](https://github.com/rust-lang/rfcs/pull/356) should be taken with a +pinch of salt: names of items commonly used or discussed together should be +distinct to avoid the need for path prefixes. Brian Smith advocates using the +name `Generator` since the "R" in "RNG" is redundant with the crate name and +the "N" may be inappropriate (not all results are numbers), however there are a +couple of reasons to use RNG even so: (1) *RNG* is a well known and easy term to +search for, (2) "generator" is long, especially for use in suffixes like `OsRng` +or `OsGenerator`, (3) *generator* is highly inspecific. + +The functions `as_rng` and `as_rng_ref`, and their return types, could be +renamed. (Note that two separate functions and return types are needed for +technical reasons unless an `impl Rng for &mut CryptoRng` rule is added; +compare [extends_CryptoRng2](https://github.com/dhardy/rand_design/blob/master/traits/extends_CryptoRng2.rs) and [extends_CryptoRng3](https://github.com/dhardy/rand_design/blob/master/traits/extends_CryptoRng3.rs).) + +The function names `fill_bytes` and `try_fill` are inconsistent; we could instead +use `try_fill_bytes`. + +### Rand crates + +Currently, `rand` exists as a single crate. There has been a call to expose +`CryptoRng` and `OsRng` in separate crates. It has also been suggested that +no PRNGs remain in `rand` and separate crates be used for things like +`thread_rng`. Following this, the following is suggested: + +* `rand_crypto` contains `CryptoRng` and `CryptoError` but nothing else and + has no dependencies on other crates +* `rand_os` contains `OsRng` implemented via `CryptoRng`, depends on + `rand_cypto` but nothing else +* `rand` depends on `rand_crypto` and provides `Rng`, adaptors, distributions, + etc. +* `rand_chacha`, `rand_isaac`, `rand_xorshift`, `rand_pcg` etc. contain + families of generators +* `rand_thread` provides `thread_rng` via one of the above crates + +In this case, crates implementing an RNG would depend on `rand_crypto` or +`rand`. Numeric users would depend on `rand` and `rand_thread` or some other +RNG crate. + +(TODO: should RNG crates depend on `rand_os` for secure initialisation, or not +handle this internally? If not, should users import both crates and do something +like `MyRng::from(OsRng::new())`, or maybe something like +`OsRng::new_seeded::()`?) + +#### Alternatives + +**Names:** +we could break with tradition and name crates with `rng` instead of `rand`. +We could use different crate name patterns/prefixes for cryptographic RNGs vs +numeric RNGs, or for RNG implementations vs traits & derived functionality. +(Note: owner of `rng` crate has offered its use.) + +**Crates:** +we could use fewer crates, or even more crates (e.g. by putting the `Rng` trait +and its impl rules in a separate crate). ### Extension traits @@ -278,6 +453,9 @@ For now this has not been implemented since [`serde_derive` does not support `Wrapping`](https://github.com/serde-rs/serde/issues/1020) (and I'm too lazy to write a workaround). +Alternatively, serialisation could be enabled via a simple custom trait instead +of depending on serde. + These traits can be added in the future without breaking compatibility, however they may be worth discussing now. @@ -306,7 +484,9 @@ tests, where `SeedableRng::from_seed(seed) -> Self` could be used instead. Alternatively, `new` could seed from `thread_rng` or similar, or not exist forcing users to use `from_rng` or the `SeedableRng` trait. However, I believe `new` should exist and seed from `OsRng`, since this makes the easiest way to -create an RNG secure and well seeded. +create an RNG secure and well seeded. *However*, if `OsRng` is in a separate +`rand_os` crate, should RNG crates depend on this just to provide a `new` +function? See the *Rand crates* sub-section above. ## Generators