From 272de1ff273dc5723ab6d8b849603a78f22e9e78 Mon Sep 17 00:00:00 2001 From: Ludwig Stecher Date: Fri, 14 Apr 2023 23:15:44 +0200 Subject: [PATCH 1/7] add lossy conversions RFC --- 0000-lossy-conversions.md | 319 ++++++++++++++++++++++++++++++++++++++ 1 file changed, 319 insertions(+) create mode 100644 0000-lossy-conversions.md diff --git a/0000-lossy-conversions.md b/0000-lossy-conversions.md new file mode 100644 index 00000000000..76844e552be --- /dev/null +++ b/0000-lossy-conversions.md @@ -0,0 +1,319 @@ +- Feature Name: `lossy_conversions` +- Start Date: (fill me in with today's date, YYYY-MM-DD) +- RFC PR: [rust-lang/rfcs#0000](https://github.com/rust-lang/rfcs/pull/0000) +- Rust Issue: [rust-lang/rust#0000](https://github.com/rust-lang/rust/issues/0000) + +# Summary + +[summary]: #summary + +Add traits for lossy numeric conversions as an alternative to the `as` operator, and deprecate `as` for lossy numeric casts in a future edition, so + +```rust +let n = f64::PI as usize; +``` + +becomes + +```rust +let n: usize = f64::PI.lossy_into(); +``` + +# Motivation + +[motivation]: #motivation + +The `as` operator is a footgun when used to convert between number types. For example, converting an `i64` to an `i32` may silently truncate digits. Other conversions may wrap around, saturate, or lose numerical precision. + +The problem is that this is not obvious when reading the code; the `as` operator looks innocuous, so the risk of getting a wrong result is easily overlooked. This goes against Rust's design philosophy of highlighting potential problems with explicit syntax. + +# Guide-level explanation + +[guide-level-explanation]: #guide-level-explanation + +Conversions between number types can sometimes be lossy. For example, when converting a `f64` to `f32`, the number gets less precise, and very large numbers become _Infinity_. 
Rust offers the following traits for converting between numbers: + +- `From`/`Into`: These cover _lossless_ conversions, where the output is the exact same number as the input. For example, converting a `u8` to a `u32` is lossless, because every value of `u8` can be represented by a `u32`. You may also use the `as` operator for lossness numeric conversions. + +- `TryFrom`/`TryInto`: These cover _fallible_ conversions, which return an error when the input can't be represented by the output type. + +- `TruncatingFrom`/`TruncatingInto`: These traits are used for lossy integer conversions. When using these traits, leading bits that don't fit into the output type are cut off; the remaining bits are reinterpreted as the output type. This can change the value completely, even turn a negative number into a positive number or vice versa. This conversion is very fast, but should be used with care. + +- `SaturatingFrom`/`SaturatingInto`: Like the `Saturating*` traits, these traits are used for lossy integer conversions. They check if the input fits into the output type, and if not, the closest possible value is used instead. For example, converting `258` to a `u8` with this strategy results in `255`, which is the highest `u8`. + +- `LossyFrom`/`LossyInto`: These traits cover conversions involving floats (`f32` and `f64`). The converted value may be both rounded and saturated. When converting from a float to an integer, `NaN` is converted to `0`. + +Although the `as` operator can also be used for _truncating_ and _lossy_ numeric conversions, this is discouraged and will be deprecated in the future. The `cast_lossy` lint warns against this, and will become an error in a future edition. 
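The three lossy strategies in the list above can already be emulated in today's stable Rust, which may help when comparing them. The sketch below is only an illustration: the function names `truncate_to_u8`, `saturate_to_u8`, and `approx_to_i32` are hypothetical and are not part of the proposal or of the standard library. They rely on the documented behaviour of `as` and on `Ord::clamp`.

```rust
/// Truncating: `as` between integers keeps only the low bits of the input.
/// (Hypothetical helper name, for illustration only.)
fn truncate_to_u8(value: i16) -> u8 {
    value as u8
}

/// Saturating: clamp into the target range first, then cast losslessly.
/// (Hypothetical helper name, for illustration only.)
fn saturate_to_u8(value: i16) -> u8 {
    value.clamp(0, i16::from(u8::MAX)) as u8
}

/// Float to integer: `as` drops the fractional part, saturates at the
/// bounds of the target type, and maps NaN to 0.
/// (Hypothetical helper name, for illustration only.)
fn approx_to_i32(value: f32) -> i32 {
    value as i32
}
```

Note that only the saturating variant needs an explicit range check; the other two are exactly what `as` does today, which is why the cast is easy to write without noticing the loss.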
+ +## Examples + +```rust +42_u8 as i16 == 42 // for lossless conversions, `as` is ok + +i16::from(42_u8) == 42 + +i8::try_from( 42_u8) == Ok(42) +i8::try_from(-24_u8).is_err() + +i8::truncating_from( 42_u8) == 42 +u8::truncating_from(-24_i8) == 232 // u8::MAX + 1 - 14 +i8::truncating_from(232_u8) == -24 // u8::MAX + 1 - 232 +u8::truncating_from(280_i16) == 24 // 280 % u8::MAX = 24 + +u8::saturating_from( 42_i8) == 42 +u8::saturating_from(-14_i8) == 0 // u8::MIN +i8::saturating_from(169_u8) == 127 // i8::MAX +u8::saturating_from(280_i16) == 255 // u8::MAX + +f32::lossy_from(42_i32) == 42.0 +f32::lossy_from(1073741827_i32) == 1073741800.0 // rounded +i32::lossy_from(f32::PI) == 3 // rounded +i32::lossy_from(f32::INFINITY) == i32::MAX // saturated +``` + +## How does this impact writing code? + +This way of doing conversions is more verbose than in other languages. However, it is also very flexible, since you can choose _how_ a value should be converted. And since the behavior is explicit, you can't choose a truncating conversion instead of a lossless one by accident. Method names such as `truncating_from` alert the reader to the possibility of a bug. + +The `as` operator can be used instead of `truncating_from` or `lossy_from`. However, this is discouraged, and will become a warning and then an error in the future. `as` does not guard against logic bugs, and may even encourage sloppy code. That's why it should no longer be used for conversions between numbers. + +# Reference-level explanation + +[reference-level-explanation]: #reference-level-explanation + +The `as` operator has more than one purpose. Besides numeric casts, it is also used for + +- enum to discriminant casts +- casts involving raw pointers, addresses, and function items +- [type coercions](https://doc.rust-lang.org/stable/reference/type-coercions.html) (e.g. 
from `&mut T` to `&T`, or from `[T; N]` to `[T]`) + +These are _not_ affected by this RFC; this proposal only concerns itself with casts between numbers. + +To be able to deprecate `as` for lossy numeric casts, any numeric conversion must be achievable by other means. The most promising solution for this is to use traits with the same design as `From`/`Into`. + +To make potential errors explicit, we can distinguish between these numeric errors: + +1. **Truncation**: Digits from the beginning of the number are cut off +2. **Wrapping**: The bits of a signed integer are reinterpreted as an unsigned integer, or vice versa +3. **Saturation**: If the number is too high or too low, the closest possible number is selected +4. **Precision loss**: The number is rounded, resulting in fewer significant digits + +Truncation and Wrapping often occur together; for example, an `i32 → u16` conversion can both truncate and wrap around. To keep the complexity to a minimum, we treat wrapping as a special case of truncation, so we arrive at the following 6 new traits: + +- `Truncating{From,Into}` — truncating conversions between integers +- `Saturating{From,Into}` — saturating conversions between integers +- `Lossy{From,Into}` — lossy conversions that involve floats + +Note that the word "lossy" means any conversion that doesn't preserve the input value (including includes truncation and saturation), but the `Lossy*` traits have a narrower scope. + +`TruncatingFrom` and `LossyFrom` can be implemented in the standard library using `as` by silencing the lint. 
For example: + +```rust +#![allow(cast_lossy)] + +impl TruncatingFrom for i8 { + fn truncating_from(value: i16) -> i8 { + value as i8 + } +} + +impl LossyFrom for f32 { + fn lossy_from(value: f64) -> f32 { + value as f32 + } +} +``` + +`SaturatingFrom` must be implemented manually, but is straightforward: + +```rust +#![allow(cast_lossy)] + +impl SaturatingFrom for i8 { + fn saturating_from(value: u8) -> i8 { + if value < i8::MIN { + i8::MIN + } else if value > i8::MAX { + i8::MAX + } else { + value as i8 + } + } +} +``` + +The `*Into` traits are implemented with blanket implementations: + +```rust +impl TruncatingInto for T +where + U: TruncatingFrom, +{ + fn truncating_into(self) -> U { + U::truncating_from(self) + } +} + +impl SaturatingInto for T +where + U: SaturatingFrom, +{ + fn saturating_into(self) -> U { + U::saturating_from(self) + } +} + +impl LossyInto for T +where + U: LossyFrom, +{ + fn lossy_into(self) -> U { + U::lossy_from(self) + } +} +``` + +The traits will be added to the standard library prelude in a future edition. + +This list of conversions should be implemented: + +- `Truncating*` and `Saturating*`: + - all **signed** to **unsigned** integers + - all **signed** to **smaller signed** integers (e.g. `i16 → i8`) + - all **unsigned** to **smaller or equal-sized** integers (e.g. 
`u32 → u16` or `u32 → i32`) + - specifically, for `isize` and `usize` (we assume they have 16 to 128 bits): + - `u32`, `u64`, `u128`, `i8`, `i16`, `i32`, `i64`, `i128` into `usize` + - `u16`, `u32`, `u64`, `u128`, `i32`, `i64`, `i128` into `isize` + - `isize` into `usize`, `u8`, `u16`, `u32`, `u64`, `u128`, `i8`, `i16`, `i32`, `i64` + - `usize` into `isize`, `u8`, `u16`, `u32`, `u64`, `i8`, `i16`, `i32`, `i64`, `i128` +- `Lossy*`: + - `f64` into `f32` + - `f64`/`f32` into any integer + - any integer with more than 32 bits (including `isize`/`usize`) into `f64` + - any integer with more than 16 bits (including `isize`/`usize`) into `f32` + +## The lint + +A `cast_lossy` lint is added to rustc that lints against using the `as` operator for lossy conversions. + +This lint is allow-by-default, and can be enabled with `#[warn(cast_lossy)]`. The lint is later enabled as a warning, either after a certain time has passed, or at an edition boundary. Eventually, it will become an error at an edition boundary. + +# Drawbacks + +[drawbacks]: #drawbacks + +1. The API surface of this change is pretty big: It has 3 new traits and over 200 impls. These make the documentation for primitive types less clear. + +2. The traits make the language more complex, as there is one more thing to learn. + +3. When the traits are added to the default prelude, more things are implicitly in scope. + +4. This may change the overall character of the language. However, I believe it would make the language feel more consistent, since Rust already leans towards explicitness in most other situations. + +5. This may negatively impact compile times _(to be verified)_. + +# Rationale and alternatives + +[rationale-and-alternatives]: #rationale-and-alternatives + +1. The `Saturating*` traits aren't needed to deprecate `as` for lossy numeric conversions, so we could add only the `Truncating*` and `Lossy*` traits. 
+ + However, the standard library already contains saturating math operations, so adding saturating conversions makes sense. + +2. We could require to import the traits explicitly, instead of putting them in the standard library prelude. + + However, I believe that not having the traits in scope by default would make the feature much less ergonomic. `TryFrom`/`TryInto` were added to the standard library prelude for the same reason. + + **NOTE**: [Future possibilites: Inherent methods][inherent-methods] describes a solution that doesn't require changing the prelude. + +3. Instead of deprecating `as` only for lossy numeric casts, it could be deprecated for all numeric casts, so `From`/`Into` is required in these situations. + + This feels like overkill. If people really want to forbid `as` for lossless conversions, they can use clippy's `cast_lossless` lint. + +4. Instead of adding traits, the conversions could be added as inherent methods. + + However, then the output type must be part of the name, so there would be `i32::saturating_into_i16()`, `i32::saturating_into_i8()`, and so on. I prefer the comparatively shorter `i32::saturating_into()`. + +5. The `Lossy*` traits could have a more descriptive name, since the term "lossy" seems to include truncation and saturation. The only name I could find that kind of describes the behavior of `LossyFrom` is `Approximate` + +6. The traits could be implemented in an external crate, but then the traits couldn't be added to the standard library prelude. Furthermore, to deprecate `as` for numeric conversions, the APIs to replace it should be available in the standard library, so they can be recommended in compiler warnings/errors. + +7. Of course we could do nothing about this. Rust's increasing popularity means that this change would impact millions of developers, so we should be sure that the benefits justify the churn. 
This feature isn't _required_; Rust has worked well until now without it, and Rustaceans have learned to be extra careful when using `as` for numeric conversions. + + However, I am convinced that removing this papercut will make Rust safer and prevent more bugs. + +# Prior art + +[prior-art]: #prior-art + +This proposal was previously discussed in [this internals thread](https://internals.rust-lang.org/t/lets-deprecate-as-for-lossy-numeric-casts/16283). + +For the proposed lint, there exists prior art in clippy: + +- `cast_possible_truncation` +- `cast_possible_wrap` +- `cast_precision_loss` +- `cast_sign_loss` + +These lints show that lossy numeric casts can pose enough of a problem to forbid them, even though there is currently no alternative. Another data point is , which received + +# Unresolved questions + +[unresolved-questions]: #unresolved-questions + +- Are there better trait and method names? +- Does this impact compile times? + +# Future possibilities + +[future-possibilities]: #future-possibilities + +## Inherent methods + +[inherent-methods]: #inherent-methods + +Inherent methods similar to [`str::parse`](https://doc.rust-lang.org/std/primitive.str.html#method.parse) can be added to make usage more ergonomic, e.g. + +```rust +impl i32 { + pub fn truncate<T: TruncatingFrom<i32>>(self) -> T { + T::truncating_from(self) + } + + pub fn saturate<T: SaturatingFrom<i32>>(self) -> T { + T::saturating_from(self) + } + + pub fn approx<T: LossyFrom<i32>>(self) -> T { + T::lossy_from(self) + } +} +``` + +Usage: + +```rust +value.truncate::<u8>() +// instead of +u8::truncating_from(value) +``` + +Benefits are: + +- it is shorter +- unlike `value.truncating_into()` it allows specifying the output type +- unlike `T::truncating_from(value)`, it is chainable +- it doesn't require an import, so the proposed traits don't need to be added to the standard library prelude + +## NonZero types + +Conversions could also be implemented for `NonZero{U,I}{8,16,32,64,128}`. 
+ +## Pattern types + +If [pattern types](https://github.com/rust-lang/rust/pull/107606) (e.g. `u32 is 1..`) are added, the compiler can often verify when an `as` cast is lossless: + +```rust +let x: u32 is 0..=1000 = 42; +let y = x as i32; // no warning; the cast is lossless +``` From 0426a2a9ce84c165ae0918c3af04ad4a0ad79dc2 Mon Sep 17 00:00:00 2001 From: Ludwig Stecher Date: Fri, 14 Apr 2023 23:43:24 +0200 Subject: [PATCH 2/7] typos, add question --- 0000-lossy-conversions.md | 25 ++++++++++++++++++++----- 1 file changed, 20 insertions(+), 5 deletions(-) diff --git a/0000-lossy-conversions.md b/0000-lossy-conversions.md index 76844e552be..5cf9e7a9ae1 100644 --- a/0000-lossy-conversions.md +++ b/0000-lossy-conversions.md @@ -39,7 +39,7 @@ Conversions between number types can sometimes be lossy. For example, when conve - `TruncatingFrom`/`TruncatingInto`: These traits are used for lossy integer conversions. When using these traits, leading bits that don't fit into the output type are cut off; the remaining bits are reinterpreted as the output type. This can change the value completely, even turn a negative number into a positive number or vice versa. This conversion is very fast, but should be used with care. -- `SaturatingFrom`/`SaturatingInto`: Like the `Saturating*` traits, these traits are used for lossy integer conversions. They check if the input fits into the output type, and if not, the closest possible value is used instead. For example, converting `258` to a `u8` with this strategy results in `255`, which is the highest `u8`. +- `SaturatingFrom`/`SaturatingInto`: Like the `Truncating*` traits, these traits are used for lossy integer conversions. They check if the input fits into the output type, and if not, the closest possible value is used instead. For example, converting `258` to a `u8` with this strategy results in `255`, which is the highest `u8`. - `LossyFrom`/`LossyInto`: These traits cover conversions involving floats (`f32` and `f64`). 
The converted value may be both rounded and saturated. When converting from a float to an integer, `NaN` is converted to `0`. @@ -56,7 +56,7 @@ i8::try_from( 42_u8) == Ok(42) i8::try_from(-24_u8).is_err() i8::truncating_from( 42_u8) == 42 -u8::truncating_from(-24_i8) == 232 // u8::MAX + 1 - 14 +u8::truncating_from(-24_i8) == 232 // u8::MAX + 1 - 24 i8::truncating_from(232_u8) == -24 // u8::MAX + 1 - 232 u8::truncating_from(280_i16) == 24 // 280 % u8::MAX = 24 @@ -104,7 +104,21 @@ Truncation and Wrapping often occur together; for example, an `i32 → u16` conv - `Saturating{From,Into}` — saturating conversions between integers - `Lossy{From,Into}` — lossy conversions that involve floats -Note that the word "lossy" means any conversion that doesn't preserve the input value (including includes truncation and saturation), but the `Lossy*` traits have a narrower scope. +Note that the word "lossy" means any conversion that doesn't preserve the input value (including truncation and saturation), but the `Lossy*` traits have a narrower scope. + +```rust +pub trait TruncatingFrom { + fn truncating_from(value: T) -> Self; +} + +pub trait SaturatingFrom { + fn saturating_from(value: T) -> Self; +} + +pub trait LossyFrom { + fn lossy_from(value: T) -> Self; +} +``` `TruncatingFrom` and `LossyFrom` can be implemented in the standard library using `as` by silencing the lint. For example: @@ -131,9 +145,9 @@ impl LossyFrom for f32 { impl SaturatingFrom for i8 { fn saturating_from(value: u8) -> i8 { - if value < i8::MIN { + if value < i8::MIN as i16 { i8::MIN - } else if value > i8::MAX { + } else if value > i8::MAX as i16 { i8::MAX } else { value as i8 @@ -263,6 +277,7 @@ These lints show that lossy numeric casts can pose enough of a problem to forbid - Are there better trait and method names? - Does this impact compile times? +- Should the traits remain perma-unstable, so they can be used, but not implemented outside of the standard library? 
+ # Future possibilities From 02f8df9c97fd9d0eb83251b8f017d161057ce94a Mon Sep 17 00:00:00 2001 From: Ludwig Stecher Date: Sun, 16 Apr 2023 00:12:20 +0200 Subject: [PATCH 3/7] Suggest inherent methods, don't expose traits, remove *From traits, more explanations --- 0000-lossy-conversions.md | 200 ++++++++++++++++---------------------- 1 file changed, 83 insertions(+), 117 deletions(-) diff --git a/0000-lossy-conversions.md b/0000-lossy-conversions.md index 5cf9e7a9ae1..f95e658fea0 100644 --- a/0000-lossy-conversions.md +++ b/0000-lossy-conversions.md @@ -16,7 +16,9 @@ let n = f64::PI as usize; becomes ```rust -let n: usize = f64::PI.lossy_into(); +let n: usize = f64::PI.approx(); +// or +let n = f64::PI.approx::<usize>(); ``` # Motivation @@ -25,25 +27,33 @@ let n: usize = f64::PI.lossy_into(); The `as` operator is a footgun when used to convert between number types. For example, converting an `i64` to an `i32` may silently truncate digits. Other conversions may wrap around, saturate, or lose numerical precision. -The problem is that this is not obvious when reading the code; the `as` operator looks innocuous, so the risk of getting a wrong result is easily overlooked. This goes against Rust's design philosophy of highlighting potential problems with explicit syntax. +The problem is that this is not obvious when reading the code; the `as` operator looks innocuous, so the risk of getting a wrong result is easily overlooked. This goes against Rust's design philosophy of highlighting potential problems with explicit syntax. This is similar to the `unsafe` keyword, which makes unsafe Rust more verbose, but also highlights code that could cause Undefined Behaviour. `as` cannot introduce UB, but it can cause logic errors. Rust also tries to prevent logic errors, e.g. by requiring that a `match` covers all possible values, and that errors aren't silently ignored. 
# Guide-level explanation [guide-level-explanation]: #guide-level-explanation -Conversions between number types can sometimes be lossy. For example, when converting a `f64` to `f32`, the number gets less precise, and very large numbers become _Infinity_. Rust offers the following traits for converting between numbers: +Conversions between number types can sometimes be lossy. For example, when converting an `f64` to `f32`, the number gets less precise, and very large numbers become _Infinity_. Rust offers the following methods for converting between numbers: -- `From`/`Into`: These cover _lossless_ conversions, where the output is the exact same number as the input. For example, converting a `u8` to a `u32` is lossless, because every value of `u8` can be represented by a `u32`. You may also use the `as` operator for lossness numeric conversions. +- `into()`: This covers _lossless_ conversions, where the output is the exact same number as the input. For example, converting a `u8` to a `u32` is lossless, because every value of `u8` can be represented by a `u32`. You may also use the `as` operator for lossless numeric conversions. -- `TryFrom`/`TryInto`: These cover _fallible_ conversions, which return an error when the input can't be represented by the output type. +- `try_into()`: This covers _fallible_ conversions, which return an error when the input can't be represented by the output type. -- `TruncatingFrom`/`TruncatingInto`: These traits are used for lossy integer conversions. When using these traits, leading bits that don't fit into the output type are cut off; the remaining bits are reinterpreted as the output type. This can change the value completely, even turn a negative number into a positive number or vice versa. This conversion is very fast, but should be used with care. +- `truncate()`: Used for **lossy integer conversions**. This truncates leading bits that don't fit into the output type; the remaining bits are reinterpreted as the output type. 
This can change the value completely, even turn a negative number into a positive number or vice versa. This conversion is very fast, but should be used with care. -- `SaturatingFrom`/`SaturatingInto`: Like the `Truncating*` traits, these traits are used for lossy integer conversions. They check if the input fits into the output type, and if not, the closest possible value is used instead. For example, converting `258` to a `u8` with this strategy results in `255`, which is the highest `u8`. +- `saturate()`: Like `truncate()`, this is used for **lossy integer conversions**. It checks if the input fits into the output type, and if not, the closest possible value is used instead. For example, converting `258` to a `u8` with this method results in `255`, which is the highest `u8`. -- `LossyFrom`/`LossyInto`: These traits cover conversions involving floats (`f32` and `f64`). The converted value may be both rounded and saturated. When converting from a float to an integer, `NaN` is converted to `0`. +- `approx()`: This must be used when the input or output type is a **float** (`f32` and `f64`). The value may be both rounded and saturated. When converting from a float to an integer, `NaN` is turned into `0`. -Although the `as` operator can also be used for _truncating_ and _lossy_ numeric conversions, this is discouraged and will be deprecated in the future. The `cast_lossy` lint warns against this, and will become an error in a future edition. +Although the `as` operator can also be used instead of `truncate()` or `approx()`, this is discouraged and will be deprecated in the future. The `cast_lossy` lint warns against this, and will become an error in a future edition. + +## Conversion traits + +`into()` and `try_into()` are trait methods from the `Into`/`TryInto` traits, and exist on many types besides numbers. On the other hand, `truncate()`, `saturate()`, and `approx()` are inherent methods that only exist on numeric types in the standard library. 
They are generic and can convert into any type implementing any of these traits: + +- `TruncatingFrom` for `truncate()` +- `SaturatingFrom` for `saturate()` +- `ApproxFrom` for `approx()` ## Examples @@ -55,27 +65,27 @@ i16::from(42_u8) == 42 i8::try_from( 42_u8) == Ok(42) i8::try_from(-24_u8).is_err() -i8::truncating_from( 42_u8) == 42 -u8::truncating_from(-24_i8) == 232 // u8::MAX + 1 - 24 -i8::truncating_from(232_u8) == -24 // u8::MAX + 1 - 232 -u8::truncating_from(280_i16) == 24 // 280 % u8::MAX = 24 + 42_u8.truncate::<i8>() == 42 +-24_i8.truncate::<u8>() == 232 // u8::MAX + 1 - 24 +232_u8.truncate::<i8>() == -24 // 232 - (u8::MAX + 1) +280_i16.truncate::<u8>() == 24 // 280 % (u8::MAX + 1) = 24 -u8::saturating_from( 42_i8) == 42 -u8::saturating_from(-14_i8) == 0 // u8::MIN -i8::saturating_from(169_u8) == 127 // i8::MAX -u8::saturating_from(280_i16) == 255 // u8::MAX + 42_i8.saturate::<u8>() == 42 +-14_i8.saturate::<u8>() == 0 // u8::MIN +169_u8.saturate::<i8>() == 127 // i8::MAX +280_i16.saturate::<u8>() == 255 // u8::MAX -f32::lossy_from(42_i32) == 42.0 -f32::lossy_from(1073741827_i32) == 1073741800.0 // rounded -i32::lossy_from(f32::PI) == 3 // rounded -i32::lossy_from(f32::INFINITY) == i32::MAX // saturated + 42_i32.approx::<f32>() == 42.0 +1073741827_i32.approx::<f32>() == 1073741800.0 // rounded + f32::PI.approx::<i32>() == 3 // rounded + f32::INFINITY.approx::<i32>() == i32::MAX // saturated ``` ## How does this impact writing code? -This way of doing conversions is more verbose than in other languages. However, it is also very flexible, since you can choose _how_ a value should be converted. And since the behavior is explicit, you can't choose a truncating conversion instead of a lossless one by accident. Method names such as `truncating_from` alert the reader to the possibility of a bug. +This way of doing conversions is more verbose than in other languages. However, it is also very flexible, since you can choose _how_ a value should be converted. 
And since the behavior is explicit, you can't choose a truncating conversion instead of a lossless one by accident. Method names such as `truncate` alert the reader to the possibility of a bug. -The `as` operator can be used instead of `truncating_from` or `lossy_from`. However, this is discouraged, and will become a warning and then an error in the future. `as` does not guard against logic bugs, and may even encourage sloppy code. That's why it should no longer be used for conversions between numbers. +The `as` operator can be used instead of `truncate` or `approx`. However, this is discouraged, and will become a warning and then an error in the future. `as` does not guard against logic bugs, and may even encourage sloppy code. That's why it should no longer be used for conversions between numbers. # Reference-level explanation @@ -98,29 +108,27 @@ To make potential errors explicit, we can distinguish between these numeric erro 3. **Saturation**: If the number is too high or too low, the closest possible number is selected 4. **Precision loss**: The number is rounded, resulting in fewer significant digits -Truncation and Wrapping often occur together; for example, an `i32 → u16` conversion can both truncate and wrap around. To keep the complexity to a minimum, we treat wrapping as a special case of truncation, so we arrive at the following 6 new traits: +Truncation and Wrapping often occur together; for example, an `i32 → u16` conversion can both truncate and wrap around. 
To keep the complexity to a minimum, we treat wrapping as a special case of truncation, so we arrive at the following 3 new traits: -- `Truncating{From,Into}` — truncating conversions between integers -- `Saturating{From,Into}` — saturating conversions between integers -- `Lossy{From,Into}` — lossy conversions that involve floats - -Note that the word "lossy" means any conversion that doesn't preserve the input value (including truncation and saturation), but the `Lossy*` traits have a narrower scope. +- `TruncatingFrom` — truncating conversions between integers +- `SaturatingFrom` — saturating conversions between integers +- `ApproxFrom` — lossy conversions that involve floats ```rust -pub trait TruncatingFrom<T> { +trait TruncatingFrom<T> { fn truncating_from(value: T) -> Self; } -pub trait SaturatingFrom<T> { +trait SaturatingFrom<T> { fn saturating_from(value: T) -> Self; } -pub trait LossyFrom<T> { - fn lossy_from(value: T) -> Self; +trait ApproxFrom<T> { + fn approx_from(value: T) -> Self; } ``` -`TruncatingFrom` and `LossyFrom` can be implemented in the standard library using `as` by silencing the lint. For example: +`TruncatingFrom` and `ApproxFrom` can be implemented in the standard library using `as` by silencing the lint. For example: ```rust #![allow(cast_lossy)] @@ -131,8 +139,8 @@ impl TruncatingFrom<i16> for i8 { } } -impl LossyFrom<f64> for f32 { - fn lossy_from(value: f64) -> f32 { +impl ApproxFrom<f64> for f32 { + fn approx_from(value: f64) -> f32 { value as f32 } } @@ -156,42 +164,35 @@ impl SaturatingFrom for i8 { } ``` -The `*Into` traits are implemented with blanket implementations: +## Inherent methods + +[inherent-methods]: #inherent-methods + +Inherent methods similar to [`str::parse`](https://doc.rust-lang.org/std/primitive.str.html#method.parse) are added to make usage more ergonomic, e.g. 
+ ```rust -impl<T, U> TruncatingInto<U> for T -where - U: TruncatingFrom<T>, -{ - fn truncating_into(self) -> U { - U::truncating_from(self) +impl i32 { + pub fn truncate<T: TruncatingFrom<i32>>(self) -> T { + T::truncating_from(self) } -} -impl<T, U> SaturatingInto<U> for T -where - U: SaturatingFrom<T>, -{ - fn saturating_into(self) -> U { - U::saturating_from(self) + pub fn saturate<T: SaturatingFrom<i32>>(self) -> T { + T::saturating_from(self) } -} -impl<T, U> LossyInto<U> for T -where - U: LossyFrom<T>, -{ - fn lossy_into(self) -> U { - U::lossy_from(self) + pub fn approx<T: ApproxFrom<i32>>(self) -> T { + T::approx_from(self) } } ``` -The traits will be added to the standard library prelude in a future edition. +This has several benefits. Unlike `value.truncating_into()` it allows specifying the output type, and unlike `T::truncating_from(value)`, it is chainable. Furthermore, inherent methods are always in scope and don't require importing a trait. + +## List of conversions This list of conversions should be implemented: -- `Truncating*` and `Saturating*`: +- `TruncatingFrom` and `SaturatingFrom`: - all **signed** to **unsigned** integers - all **signed** to **smaller signed** integers (e.g. `i16 → i8`) - all **unsigned** to **smaller or equal-sized** integers (e.g. `u32 → u16` or `u32 → i32`) @@ -200,7 +201,7 @@ This list of conversions should be implemented: - `u16`, `u32`, `u64`, `u128`, `i32`, `i64`, `i128` into `isize` - `isize` into `usize`, `u8`, `u16`, `u32`, `u64`, `u128`, `i8`, `i16`, `i32`, `i64` - `usize` into `isize`, `u8`, `u16`, `u32`, `u64`, `i8`, `i16`, `i32`, `i64`, `i128` -- `Lossy*`: +- `ApproxFrom`: - `f64` into `f32` - `f64`/`f32` into any integer - any integer with more than 32 bits (including `isize`/`usize`) into `f64` @@ -216,45 +217,45 @@ This lint is allow-by-default, and can be enabled with `#[warn(cast_lossy)]`. Th [drawbacks]: #drawbacks -1. The API surface of this change is pretty big: It has 3 new traits and over 200 impls. These make the documentation for primitive types less clear. - -2. 
The traits make the language more complex, as there is one more thing to learn. +1. The API surface of this change is rather big: Most numeric types get 3 new methods. -3. When the traits are added to the default prelude, more things are implicitly in scope. +2. This may change the overall character of the language. -4. This may change the overall character of the language. However, I believe it would make the language feel more consistent, since Rust already leans towards explicitness in most other situations. + However, I believe it would make the language feel more consistent, since Rust already leans towards explicitness in most other situations. -5. This may negatively impact compile times _(to be verified)_. +3. This may negatively impact compile times _(to be verified)_. # Rationale and alternatives [rationale-and-alternatives]: #rationale-and-alternatives -1. The `Saturating*` traits aren't needed to deprecate `as` for lossy numeric conversions, so we could add only the `Truncating*` and `Lossy*` traits. +First, I'd like to compare this with integer arithmetic. An expression like `x + 200_u8` can overflow, and this is implicit, much like integer casts. However, there are some crucial differences: - However, the standard library already contains saturating math operations, so adding saturating conversions makes sense. +- Arithmetic is more common than conversions, so there is a bigger need to be concise +- For arithmetic, there already exist checked, saturating, and wrapping methods. For example, you can write `200_u8.saturating_add(x)` if you want; an equivalent for conversions does not exist. +- By default, overflow on arithmetic operations wraps around in release builds and panics in debug builds. This means that bugs due to integer overflow can be caught with tests. Integer conversions, on the other hand, are _always_ unchecked. This means that bugs due to truncation are easier to miss. -2. 
We could require to import the traits explicitly, instead of putting them in the standard library prelude. +Therefore, I believe that making lossy conversions more explicit would go a long way toward avoiding bugs, making you more confident that your code is correct, and saving you time debugging and writing tests. But is the proposal outlined above the best possible solution? Let's look at a few alternatives: - However, I believe that not having the traits in scope by default would make the feature much less ergonomic. `TryFrom`/`TryInto` were added to the standard library prelude for the same reason. +1. We could add the conversion methods and the `cast_lossy` lint, but never turn it into an error. It would remain a warning, which people are free to ignore or disable. This makes sense if avoiding the `as` operator is seen more as a stylistic preference than a correctness issue. - **NOTE**: [Future possibilites: Inherent methods][inherent-methods] describes a solution that doesn't require changing the prelude. +2. `saturate()` isn't needed to deprecate `as` for lossy numeric conversions, so we could add only `truncate()` and `approx()`. + + However, the standard library already contains saturating math operations, so adding saturating conversions makes sense. 3. Instead of deprecating `as` only for lossy numeric casts, it could be deprecated for all numeric casts, so `From`/`Into` is required in these situations. This feels like overkill. If people really want to forbid `as` for lossless conversions, they can use clippy's `cast_lossless` lint. -4. Instead of adding traits, the conversions could be added as inherent methods. - - However, then the output type must be part of the name, so there would be `i32::saturating_into_i16()`, `i32::saturating_into_i8()`, and so on. I prefer the comparatively shorter `i32::saturating_into()`. +4. The `approx()` method could have a more descriptive name. Likewise, `truncate()` isn't ideal since it sometimes wraps around. 
I am open to better suggestions (though bear in mind that having multiple methods, like `truncate()`, `wrap()` and `truncate_and_wrap()` will make the feature more complicated and harder to learn). -5. The `Lossy*` traits could have a more descriptive name, since the term "lossy" seems to include truncation and saturation. The only name I could find that kind of describes the behavior of `LossyFrom` is `Approximate` +5. `truncate` and `saturate` could be abbreviated as `trunc` and `sat`. This would make it more concise, which is appealing to people who convert between numbers a lot. -6. The traits could be implemented in an external crate, but then the traits couldn't be added to the standard library prelude. Furthermore, to deprecate `as` for numeric conversions, the APIs to replace it should be available in the standard library, so they can be recommended in compiler warnings/errors. +6. This could be implemented in an external crate as extension traits, but then the traits must be imported everywhere they are used. Furthermore, to deprecate `as` for numeric conversions, the APIs to replace it should be available in the standard library, so they can be recommended in compiler warnings/errors. 7. Of course we could do nothing about this. Rust's increasing popularity means that this change would impact millions of developers, so we should be sure that the benefits justify the churn. This feature isn't _required_; Rust has worked well until now without it, and Rustaceans have learned to be extra careful when using `as` for numeric conversions. - However, I am convinced that removing this papercut will make Rust safer and prevent more bugs. + However, I am convinced that removing this papercut will make Rust safer and prevent more bugs. This is similar in spirit to the `unsafe` keyword, which makes Rust more verbose, but also more explicit about potential problems. 
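To make the comparison with arithmetic concrete, here is a minimal sketch that uses only APIs available in the standard library today. The function names (`add_200_checked`, `narrow_with_as`, and so on) are made up for illustration and are not part of this proposal. The point is that arithmetic already offers explicit variants, while a narrowing `as` cast truncates without any signal.

```rust
/// Adds 200 with checked arithmetic: overflow is reported as `None`.
fn add_200_checked(x: u8) -> Option<u8> {
    x.checked_add(200)
}

/// Adds 200 with saturating arithmetic: overflow clamps to `u8::MAX`.
fn add_200_saturating(x: u8) -> u8 {
    200_u8.saturating_add(x)
}

/// Narrows with `as`: bits that do not fit are silently dropped.
fn narrow_with_as(x: i64) -> i32 {
    x as i32
}

/// Narrows with `TryFrom`: out-of-range inputs are reported as `None`.
fn narrow_checked(x: i64) -> Option<i32> {
    i32::try_from(x).ok()
}
```

For the input `4_294_967_297` (which is 2³² + 1), `narrow_with_as` yields `1` while `narrow_checked` yields `None`; only the second makes the loss visible at the call site.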
# Prior art @@ -269,57 +270,22 @@ For the proposed lint, there exists prior art in clippy: - `cast_precision_loss` - `cast_sign_loss` -These lints show that lossy numeric casts can pose enough of a problem to forbid them, even though there is currently no alternative. Another data point is , which received +These lints show that lossy numeric casts can pose enough of a problem to forbid them, even though there is currently no alternative in the cases where truncation/saturation/rounding is desired. # Unresolved questions [unresolved-questions]: #unresolved-questions -- Are there better trait and method names? +- Are there better method names? + - Does this impact compile times? -- Should the traits remain perma-unstable, so they can be used, but not implemented outside of the standard library? + +- Should the traits be private? Or if not, should they remain perma-unstable, so they cannot be implemented outside the standard library? # Future possibilities [future-possibilities]: #future-possibilities -## Inherent methods - -[inherent-methods]: #inherent-methods - -Inherent methods similar to [`str::parse`](https://doc.rust-lang.org/std/primitive.str.html#method.parse) can be added to make usage more ergonomic, e.g. - -```rust -impl i32 { - pub fn truncate<T: TruncatingFrom<i32>>(self) -> T { - T::truncating_from(self) - } - - pub fn saturate<T: SaturatingFrom<i32>>(self) -> T { - T::saturating_from(self) - } - - pub fn approx<T: LossyFrom<i32>>(self) -> T { - T::lossy_from(self) - } -} -``` - -Usage: - -```rust -value.truncate::<u8>() -// instead of -u8::truncating_from(value) -``` - -Benefits are: - -- it is shorter -- unlike `value.truncating_into()` it allows specifying the output type -- unlike `T::truncating_from(value)`, it is chainable -- it doesn't require an import, so the proposed traits don't need to be added to the standard library prelude - ## NonZero types Conversions could also be implemented for `NonZero{U,I}{8,16,32,64,128}`. 
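Inherent methods on primitive types can only be defined by the standard library, but the shape of the API can be approximated in user code with an extension trait. The following is a rough sketch under that assumption, not the proposed implementation: `TruncatingFrom` here is a local stand-in for the RFC's trait, `TruncateExt` is a hypothetical helper name, and only two target types are implemented to keep it short.

```rust
/// Local stand-in for the RFC's `TruncatingFrom` trait (not a std item).
pub trait TruncatingFrom<T> {
    fn truncating_from(value: T) -> Self;
}

impl TruncatingFrom<i32> for u8 {
    fn truncating_from(value: i32) -> Self {
        value as u8 // keeps the low 8 bits
    }
}

impl TruncatingFrom<i32> for i16 {
    fn truncating_from(value: i32) -> Self {
        value as i16 // keeps the low 16 bits
    }
}

/// Hypothetical extension trait that plays the role of the inherent
/// `truncate()` method, which user code cannot add to `i32` directly.
pub trait TruncateExt: Sized {
    fn truncate<T: TruncatingFrom<Self>>(self) -> T {
        T::truncating_from(self)
    }
}

impl TruncateExt for i32 {}
```

With both traits in scope, `536_i32.truncate::<u8>()` names the output type at the call site and can be chained, which is the ergonomic benefit the RFC attributes to the method form. The drawback is the one listed among the alternatives: an extension trait has to be imported wherever it is used.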
From 2347d757e2c0fd268638434ee3619aba0bdff4aa Mon Sep 17 00:00:00 2001 From: Ludwig Stecher Date: Sun, 16 Apr 2023 00:18:51 +0200 Subject: [PATCH 4/7] add metadata --- 0000-lossy-conversions.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/0000-lossy-conversions.md b/0000-lossy-conversions.md index f95e658fea0..72a060baed0 100644 --- a/0000-lossy-conversions.md +++ b/0000-lossy-conversions.md @@ -1,7 +1,7 @@ - Feature Name: `lossy_conversions` -- Start Date: (fill me in with today's date, YYYY-MM-DD) -- RFC PR: [rust-lang/rfcs#0000](https://github.com/rust-lang/rfcs/pull/0000) -- Rust Issue: [rust-lang/rust#0000](https://github.com/rust-lang/rust/issues/0000) +- Start Date: 2023-04-14 +- RFC PR: [rust-lang/rfcs#3415](https://github.com/rust-lang/rfcs/pull/3415) +- Rust Issue: TBD # Summary From 7d7735d1e0681830a59bfa98c8710ecfc37b21ae Mon Sep 17 00:00:00 2001 From: Ludwig Stecher Date: Sun, 16 Apr 2023 00:56:46 +0200 Subject: [PATCH 5/7] Talk about "conciseness" drawback --- 0000-lossy-conversions.md | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/0000-lossy-conversions.md b/0000-lossy-conversions.md index 72a060baed0..cd26473dd24 100644 --- a/0000-lossy-conversions.md +++ b/0000-lossy-conversions.md @@ -217,13 +217,17 @@ This lint is allow-by-default, and can be enabled with `#[warn(cast_lossy)]`. Th [drawbacks]: #drawbacks -1. The API surface of this change is rather big: Most numeric types get 3 new methods. +1. This makes code more verbose. -2. This may change the overall character of the language. + I do not think that this is a deal-breaker. Rustaceans have come to accept that you need `.unwrap()` to access an optional value, and `Box::new()` to allocate heap memory: things that many popular languages do automatically. But since Rust had a more concise way of converting integers, and may now abandon it, people might be unhappy because they will have to change their coding habits. 
Furthermore, the new way isn't _obviously_ better than the old one in every way. Probably only those who have had to deal with integer truncation bugs will fully appreciate this change. + +2. The API surface of this change is rather big: Most numeric types get 3 new methods. + +3. This may change the overall character of the language. However, I believe it would make the language feel more consistent, since Rust already leans towards explicitness in most other situations. -3. This may negatively impact compile times _(to be verified)_. +4. This may negatively impact compile times _(to be verified)_. # Rationale and alternatives From 821dc26c5f63dc09137ea775159fc88b74064ef7 Mon Sep 17 00:00:00 2001 From: Ludwig Stecher Date: Sun, 16 Apr 2023 01:12:29 +0200 Subject: [PATCH 6/7] Present another alternative: Deprecate `as` only for int-to-int casts --- 0000-lossy-conversions.md | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/0000-lossy-conversions.md b/0000-lossy-conversions.md index cd26473dd24..7f5ac2631f9 100644 --- a/0000-lossy-conversions.md +++ b/0000-lossy-conversions.md @@ -257,9 +257,11 @@ Therefore, I believe that making lossy conversions more explicit would go a long 6. This could be implemented in an external crate as extension traits, but then the traits must be imported everywhere they are used. Furthermore, to deprecate `as` for numeric conversions, the APIs to replace it should be available in the standard library, so they can be recommended in compiler warnings/errors. -7. Of course we could do nothing about this. Rust's increasing popularity means that this change would impact millions of developers, so we should be sure that the benefits justify the churn. This feature isn't _required_; Rust has worked well until now without it, and Rustaceans have learned to be extra careful when using `as` for numeric conversions. +7. Another option is to deprecate `as` only for lossy integer-to-integer casts. 
From what I understand, conversions involving floats are more common, and the implied rounding behaviour is usually desired. Having to spell `.approx()` instead of ` as _` is not a huge deal, but the ecosystem migration may be considered too much of a hassle. - However, I am convinced that removing this papercut will make Rust safer and prevent more bugs. This is similar in spirit to the `unsafe` keyword, which makes Rust more verbose, but also more explicit about potential problems. +8. Of course we could do nothing about this. Rust's increasing popularity means that this change would impact millions of developers, so we should be sure that the benefits justify the churn. This feature isn't _required_; Rust has worked well until now without it, and Rustaceans have learned to be extra careful when using `as` for numeric conversions. + + However, I am convinced that removing (or at least reducing) this papercut will make Rust safer and prevent more bugs. This is similar in spirit to the `unsafe` keyword, which makes Rust more verbose, but also more explicit about potential problems. # Prior art From 8281c7d22630d0daf11909a987c902b15bd3c99d Mon Sep 17 00:00:00 2001 From: Ludwig Stecher Date: Sun, 16 Apr 2023 17:53:08 +0200 Subject: [PATCH 7/7] Explain truncation more, define traits as unstable --- 0000-lossy-conversions.md | 32 ++++++++++++++++++-------------- 1 file changed, 18 insertions(+), 14 deletions(-) diff --git a/0000-lossy-conversions.md b/0000-lossy-conversions.md index 7f5ac2631f9..33adf8b0d2b 100644 --- a/0000-lossy-conversions.md +++ b/0000-lossy-conversions.md @@ -41,6 +41,8 @@ Conversions between number types can sometimes be lossy. For example, when conve - `truncate()`: Used for **lossy integer conversions**. This truncates leading bits that don't fit into the output type; the remaining bits are reinterpreted as the output type. This can change the value completely, even turn a negative number into a positive number or vice versa. 
This conversion is very fast, but should be used with care. + In mathematical terms, this returns the only result for _value_ (mod 2<sup>_n_</sup>) that lies in the output type's range, where _n_ is the output type's number of bits. + - `saturate()`: Like `truncate()`, this is used for **lossy integer conversions**. It checks if the input fits into the output type, and if not, the closest possible value is used instead. For example, converting `258` to a `u8` with this method results in `255`, which is the highest `u8`. - `approx()`: This must be used when the input or output type is a **float** (`f32` and `f64`). The value may be both rounded and saturated. When converting from a float to an integer, `NaN` is turned into `0`. @@ -49,11 +51,7 @@ Although the `as` operator can also be used instead of `truncate()` or `approx() ## Conversion traits -`into()` and `try_into()` are trait methods from the `Into`/`TryInto` traits, and exist on many types besides numbers. On the other hand, `truncate()`, `saturate()`, and `approx()` are inherent methods that only exist on numeric types in the standard library. They are generic and can convert into any type implementing any of these traits: - -- `TruncatingFrom` for `truncate()` -- `SaturatingFrom` for `saturate()` -- `ApproxFrom` for `approx()` +`into()` and `try_into()` are trait methods from the `Into`/`TryInto` traits, and exist on many types besides numbers. On the other hand, `truncate()`, `saturate()`, and `approx()` are inherent methods that only exist on numeric types in the standard library. Their traits are unstable for now, so they cannot be implemented for custom types. 
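The three behaviours described above can be reproduced with what the standard library offers today, which may help when checking the intended semantics. This is only an illustrative sketch: the helper names are invented here, the target type is fixed to `u8` or `i32` for brevity, and the proposed `truncate()`/`saturate()`/`approx()` methods themselves do not exist yet.

```rust
/// Truncation: keep the low 8 bits, which is what `as` does today.
fn truncate_to_u8(value: i32) -> u8 {
    value as u8
}

/// The same result computed as `value mod 2^8` with a non-negative remainder.
fn truncate_to_u8_modular(value: i32) -> u8 {
    u8::try_from(value.rem_euclid(1 << 8)).expect("remainder is always in 0..256")
}

/// Saturation: clamp into the target range first, so the cast loses nothing.
fn saturate_to_u8(value: i32) -> u8 {
    value.clamp(i32::from(u8::MIN), i32::from(u8::MAX)) as u8
}

/// Float-to-integer `as` casts round toward zero, saturate at the
/// integer bounds, and map NaN to 0.
fn approx_to_i32(value: f64) -> i32 {
    value as i32
}
```

For example, `truncate_to_u8(536)` and `truncate_to_u8_modular(536)` both give `24`, `saturate_to_u8(258)` gives `255`, and `approx_to_i32(f64::NAN)` gives `0`.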
## Examples @@ -66,9 +64,9 @@ i8::try_from( 42_u8) == Ok(42) i8::try_from(-24_u8).is_err() 42_u8.truncate::<i8>() == 42 --24_i8.truncate::<u8>() == 232 // u8::MAX + 1 - 24 -232_u8.truncate::<i8>() == -24 // u8::MAX + 1 - 232 -280_i16.truncate::<u8>() == 24 // 280 % u8::MAX = 24 +-24_i8.truncate::<u8>() == 232 // 2⁸ - 24 +232_u8.truncate::<i8>() == -24 // 232 - 2⁸ +536_i16.truncate::<u8>() == 24 // 536 mod 2⁸ 42_i8.saturate::<u8>() == 42 -14_i8.saturate::<u8>() == 0 // u8::MIN @@ -110,20 +108,20 @@ To make potential errors explicit, we can distinguish between these numeric erro Truncation and Wrapping often occur together; for example, an `i32 → u16` conversion can both truncate and wrap around. To keep the complexity to a minimum, we treat wrapping as a special case of truncation, so we arrive at the following 3 new traits: -- `TruncatingFrom` — truncating conversions between integers -- `SaturatingFrom` — saturating conversions between integers -- `ApproxFrom` — lossy conversions that involve floats +- `TruncatingFrom` — truncating conversions between integers +- `SaturatingFrom` — saturating conversions between integers +- `ApproxFrom` — lossy conversions that involve floats ```rust -trait TruncatingFrom<T> { +pub trait TruncatingFrom<T> { fn truncating_from(value: T) -> Self; } -trait SaturatingFrom<T> { +pub trait SaturatingFrom<T> { fn saturating_from(value: T) -> Self; } -trait ApproxFrom<T> { +pub trait ApproxFrom<T> { fn approx_from(value: T) -> Self; } ``` @@ -164,6 +162,8 @@ impl SaturatingFrom for i8 { } ``` +These traits are **unstable** for now. Before stabilizing them, we should consider adding `*Into` traits as well, but that discussion is left for the future. + ## Inherent methods [inherent-methods]: #inherent-methods @@ -269,6 +269,8 @@ Therefore, I believe that making lossy conversions more explicit would go a long This proposal was previously discussed in [this internals thread](https://internals.rust-lang.org/t/lets-deprecate-as-for-lossy-numeric-casts/16283). 
+I'm not aware of a language with explicit integer or float casting methods that distinguish between different numerical errors. + For the proposed lint, there exists prior art in clippy: - `cast_possible_truncation` @@ -278,6 +280,8 @@ For the proposed lint, there exists prior art in clippy: These lints show that lossy numeric casts can pose enough of a problem to forbid them, even though there is currently no alternative in the cases where truncation/saturation/rounding is desired. +API-wise, the most similar features are the [`FromIterator`](https://doc.rust-lang.org/std/iter/trait.FromIterator.html)/[`IntoIterator`](https://doc.rust-lang.org/std/iter/trait.IntoIterator.html) traits used by [`collect()`](https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.collect), and the [`FromStr`](https://doc.rust-lang.org/std/str/trait.FromStr.html) trait used by [`parse()`](https://doc.rust-lang.org/std/primitive.str.html#method.parse). + # Unresolved questions [unresolved-questions]: #unresolved-questions