From 288b1b522b048f821b447ce496e38f1276615a74 Mon Sep 17 00:00:00 2001 From: David Turner Date: Sun, 15 Mar 2015 15:52:26 -0400 Subject: [PATCH 01/10] RFC for read_all --- text/0000-read-all.md | 50 +++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 50 insertions(+) create mode 100644 text/0000-read-all.md diff --git a/text/0000-read-all.md b/text/0000-read-all.md new file mode 100644 index 00000000000..47c82a104a8 --- /dev/null +++ b/text/0000-read-all.md @@ -0,0 +1,50 @@ +- Feature Name: read_all +- Start Date: 2015-03-15 +- RFC PR: (leave this empty) +- Rust Issue: (leave this empty) + +# Summary + +Rust's Write trait has write_all, which attempts to write an entire +buffer. This proposal adds read_all, which attempts to read a fixed +number of bytes into a given buffer. + +# Motivation + +The new read_all method will allow programs to read from disk without +having to write their own read loops. Most Rust programs which need +to read from disk will prefer this to the plain read function. Many C +programs have the same need, and solve it the same way (e.g. git has +read_in_full). Here's one example of a Rust library doing this: +https://github.com/BurntSushi/byteorder/blob/master/src/new.rs#L184 + +# Detailed design + +The read_all function will take a mutable, borrowed slice of u8 to +read into, and will attempt to fill that entire slice with data. + +It will loop, calling read() once per iteration and attempting to read +the remaining amount of data. If read returns EINTR, the loop will +retry. If there are no more bytes to read (as signalled by a return +of Ok(0) from read()), a new error type, ErrorKind::ReadZero, will be +returned. In the event of another error, that error will be +returned. After a read call returns having successfully read some +bytes, the total number of bytes read will be updated. If that +total is equal to the size of the buffer, read will return +successfully. + +# Drawbacks + +The major weakness of this API (shared with write_all) is that in the +event of an error, there is no way to return the number of bytes that +were successfully read before the error. But since that is the design +of write_all, it makes sense to mimic that design decision for read_all. + +# Alternatives + +One alternative design would return some new kind of Result which +could report the number of bytes sucessfully read before an error. +This would be inconsistent with write_all, but arguably more correct. + +Or we could leave this out, and let every Rust user write their own +read_all function -- like savages. From 6566a1d5a17e4e38dd2c176b7325c599d34f0d93 Mon Sep 17 00:00:00 2001 From: David Turner Date: Wed, 18 Mar 2015 00:52:52 -0400 Subject: [PATCH 02/10] address review comments --- text/0000-read-all.md | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/text/0000-read-all.md b/text/0000-read-all.md index 47c82a104a8..c676ec747e7 100644 --- a/text/0000-read-all.md +++ b/text/0000-read-all.md @@ -26,7 +26,7 @@ read into, and will attempt to fill that entire slice with data. It will loop, calling read() once per iteration and attempting to read the remaining amount of data. If read returns EINTR, the loop will retry. If there are no more bytes to read (as signalled by a return -of Ok(0) from read()), a new error type, ErrorKind::ReadZero, will be +of Ok(0) from read()), a new error type, ErrorKind::ShortRead, will be returned. In the event of another error, that error will be returned. After a read call returns having successfully read some bytes, the total number of bytes read will be updated. If that @@ -46,5 +46,14 @@ One alternative design would return some new kind of Result which could report the number of bytes sucessfully read before an error. This would be inconsistent with write_all, but arguably more correct. +Another would be that ErrorKind::ShortRead would be parameterized by +the number of bytes read before EOF. The downside of this is that it +bloats the size of io::Error. + +Finally, in the event of a short read, we could return Ok(number of +bytes read before EOF) instead of an error. But then every user would +have to check for this case. And it would be inconsistent with +write_all. + Or we could leave this out, and let every Rust user write their own read_all function -- like savages. From 1e457ade4b98336f6d29a437ffc8856343793516 Mon Sep 17 00:00:00 2001 From: David Turner Date: Thu, 19 Mar 2015 11:55:27 -0400 Subject: [PATCH 03/10] Address review feedback --- text/0000-read-all.md | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/text/0000-read-all.md b/text/0000-read-all.md index c676ec747e7..28259dfb679 100644 --- a/text/0000-read-all.md +++ b/text/0000-read-all.md @@ -26,12 +26,12 @@ read into, and will attempt to fill that entire slice with data. It will loop, calling read() once per iteration and attempting to read the remaining amount of data. If read returns EINTR, the loop will retry. If there are no more bytes to read (as signalled by a return -of Ok(0) from read()), a new error type, ErrorKind::ShortRead, will be -returned. In the event of another error, that error will be +of Ok(0) from read()), a new error type, ErrorKind::ShortRead(usize), +will be returned. ShortRead includes the number of bytes successfully +read. In the event of another error, that error will be returned. After a read call returns having successfully read some -bytes, the total number of bytes read will be updated. If that -total is equal to the size of the buffer, read will return -successfully. +bytes, the total number of bytes read will be updated. If that total +is equal to the size of the buffer, read will return successfully. # Drawbacks @@ -46,9 +46,9 @@ One alternative design would return some new kind of Result which could report the number of bytes sucessfully read before an error. This would be inconsistent with write_all, but arguably more correct. -Another would be that ErrorKind::ShortRead would be parameterized by -the number of bytes read before EOF. The downside of this is that it -bloats the size of io::Error. +If we wanted io::Error to be a smaller type, ErrorKind::ShortRead +could be unparameterized. But this would reduce the information +available to calleres. Finally, in the event of a short read, we could return Ok(number of bytes read before EOF) instead of an error. But then every user would From c7f633b78159b9c9140e2831fb83ea6fe8e2ad7d Mon Sep 17 00:00:00 2001 From: David Turner Date: Thu, 19 Mar 2015 23:25:35 -0400 Subject: [PATCH 04/10] fix wrong statement --- text/0000-read-all.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/text/0000-read-all.md b/text/0000-read-all.md index 28259dfb679..2aebad18ca9 100644 --- a/text/0000-read-all.md +++ b/text/0000-read-all.md @@ -46,7 +46,7 @@ One alternative design would return some new kind of Result which could report the number of bytes sucessfully read before an error. This would be inconsistent with write_all, but arguably more correct. -If we wanted io::Error to be a smaller type, ErrorKind::ShortRead +If we wanted io::ErrorKind to be a smaller type, ErrorKind::ShortRead could be unparameterized. But this would reduce the information available to calleres. From d5284eb212c836ddf307722d10a694cc21b112f9 Mon Sep 17 00:00:00 2001 From: David Turner Date: Wed, 17 Jun 2015 12:13:37 -0400 Subject: [PATCH 05/10] read_exact and read_full --- text/0000-read-all.md | 57 ++++++++++++++++++++++++------------------- 1 file changed, 32 insertions(+), 25 deletions(-) diff --git a/text/0000-read-all.md b/text/0000-read-all.md index 2aebad18ca9..606b644dee5 100644 --- a/text/0000-read-all.md +++ b/text/0000-read-all.md @@ -1,4 +1,4 @@ -- Feature Name: read_all +- Feature Name: read_exact and read_full - Start Date: 2015-03-15 - RFC PR: (leave this empty) - Rust Issue: (leave this empty) @@ -6,54 +6,61 @@ # Summary Rust's Write trait has write_all, which attempts to write an entire -buffer. This proposal adds read_all, which attempts to read a fixed -number of bytes into a given buffer. +buffer. This proposal adds two new methods, read_full and read_exact. +read_full attempts to read a fixed number of bytes into a given +buffer, and returns Ok(n) if it succeeds or in the event of EOF. +read_exact attempts to read a fixed number of bytes into a given +buffer, and returns Ok(n) if it succeeds and Err(ErrorKind::ShortRead) +if it fails. # Motivation -The new read_all method will allow programs to read from disk without -having to write their own read loops. Most Rust programs which need -to read from disk will prefer this to the plain read function. Many C -programs have the same need, and solve it the same way (e.g. git has -read_in_full). Here's one example of a Rust library doing this: +The new read_exact method will allow programs to read from disk +without having to write their own read loops to handle EINTR. Most +Rust programs which need to read from disk will prefer this to the +plain read function. Many C programs have the same need, and solve it +the same way (e.g. git has read_in_full). Here's one example of a +Rust library doing this: https://github.com/BurntSushi/byteorder/blob/master/src/new.rs#L184 +The read_full method is useful the common case of implementing +buffered reads from a file or socket. In this case, a short read due +to EOF is an expected outcome, and the caller must check the number of +bytes returned. + # Detailed design -The read_all function will take a mutable, borrowed slice of u8 to +The read_full function will take a mutable, borrowed slice of u8 to read into, and will attempt to fill that entire slice with data. It will loop, calling read() once per iteration and attempting to read the remaining amount of data. If read returns EINTR, the loop will retry. If there are no more bytes to read (as signalled by a return -of Ok(0) from read()), a new error type, ErrorKind::ShortRead(usize), -will be returned. ShortRead includes the number of bytes successfully -read. In the event of another error, that error will be +of Ok(0) from read()), the number of bytes read so far +will be returned. In the event of another error, that error will be returned. After a read call returns having successfully read some bytes, the total number of bytes read will be updated. If that total -is equal to the size of the buffer, read will return successfully. +is equal to the size of the buffer, read_full will return successfully. + +The read_exact method can be implemented in terms of read_full. # Drawbacks The major weakness of this API (shared with write_all) is that in the event of an error, there is no way to return the number of bytes that -were successfully read before the error. But since that is the design -of write_all, it makes sense to mimic that design decision for read_all. +were successfully read before the error. But returning that data +would require a much more complicated return type, as well as +requiring more work on the part of callers. # Alternatives One alternative design would return some new kind of Result which could report the number of bytes sucessfully read before an error. -This would be inconsistent with write_all, but arguably more correct. - -If we wanted io::ErrorKind to be a smaller type, ErrorKind::ShortRead -could be unparameterized. But this would reduce the information -available to calleres. -Finally, in the event of a short read, we could return Ok(number of -bytes read before EOF) instead of an error. But then every user would -have to check for this case. And it would be inconsistent with -write_all. +If we wanted one method instead of two, ErrorKind::ShortRead could be +parameterized with the number of bytes read before EOF. But this +would increase the size of ErrorKind. Or we could leave this out, and let every Rust user write their own -read_all function -- like savages. +read_full or read_exact function, or import a crate of stuff just for +this one function. From f65d966b3bbac8705d0962a663687478c37a2378 Mon Sep 17 00:00:00 2001 From: rkjnsn Date: Wed, 17 Jun 2015 13:14:11 -0700 Subject: [PATCH 06/10] Modify read_full/read_exact RFC Expand detailed design to include signatures and default definitions, and make a number of other modifications. --- text/0000-read-all.md | 119 +++++++++++++++++++++++++++--------------- 1 file changed, 76 insertions(+), 43 deletions(-) diff --git a/text/0000-read-all.md b/text/0000-read-all.md index 606b644dee5..e3686756335 100644 --- a/text/0000-read-all.md +++ b/text/0000-read-all.md @@ -5,62 +5,95 @@ # Summary -Rust's Write trait has write_all, which attempts to write an entire -buffer. This proposal adds two new methods, read_full and read_exact. -read_full attempts to read a fixed number of bytes into a given -buffer, and returns Ok(n) if it succeeds or in the event of EOF. -read_exact attempts to read a fixed number of bytes into a given -buffer, and returns Ok(n) if it succeeds and Err(ErrorKind::ShortRead) -if it fails. +Rust's `Write` trait has `write_all`, which is a convenience method that calls +`write` repeatedly to write an entire buffer. This proposal adds two similar +convenience methods to the `Read` trait: `read_full` and `read_exact`. +`read_full` calls `read` repeatedly until the buffer has been filled, EOF has +been reached, or an error other than `Interrupted` occurs. `read_exact` is +similar to `read_full`, except that reaching EOF before filling the buffer is +considered an error. # Motivation -The new read_exact method will allow programs to read from disk -without having to write their own read loops to handle EINTR. Most -Rust programs which need to read from disk will prefer this to the -plain read function. Many C programs have the same need, and solve it -the same way (e.g. git has read_in_full). Here's one example of a -Rust library doing this: -https://github.com/BurntSushi/byteorder/blob/master/src/new.rs#L184 - -The read_full method is useful the common case of implementing -buffered reads from a file or socket. In this case, a short read due -to EOF is an expected outcome, and the caller must check the number of -bytes returned. +The `read` method may return fewer bytes than requested, and may fail with an +`Interrupted` error if a signal is received during the call. This requires +programs wishing to fill a buffer to call `read` repeatedly in a loop. This is +a very common need, and it would be nice if this functionality were provided in +the standard library. Many C and Rust programs have the same need, and solve it +in the same way. For example, Git has [`read_in_full`][git], which behaves like +the proposed `read_full`, and the Rust byteorder crate has +[`read_full`][byteorder], which behaves like the proposed `read_exact`. +[git]: https://github.com/git/git/blob/16da57c7c6c1fe92b32645202dd19657a89dd67d/wrapper.c#L246 +[byteorder]: https://github.com/BurntSushi/byteorder/blob/2358ace61332e59f596c9006e1344c97295fdf72/src/new.rs#L184 # Detailed design -The read_full function will take a mutable, borrowed slice of u8 to -read into, and will attempt to fill that entire slice with data. +The following methods will be added to the `Read` trait: + +``` rust +fn read_full(&mut self, buf: &mut [u8]) -> Result; +fn read_exact(&mut self, buf: &mut [u8]) -> Result<()>; +``` + +Additionally, default implementations of these methods will be provided: + +``` rust +fn read_full(&mut self, mut buf: &mut [u8]) -> Result { + let mut read = 0; + while buf.len() > 0 { + match self.read(buf) { + Ok(0) => break, + Ok(n) => { read += n; let tmp = buf; buf = &mut tmp[n..]; } + Err(ref e) if e.kind() == ErrorKind::Interrupted => {} + Err(e) => return Err(e), + } + } + Ok(read) +} -It will loop, calling read() once per iteration and attempting to read -the remaining amount of data. If read returns EINTR, the loop will -retry. If there are no more bytes to read (as signalled by a return -of Ok(0) from read()), the number of bytes read so far -will be returned. In the event of another error, that error will be -returned. After a read call returns having successfully read some -bytes, the total number of bytes read will be updated. If that total -is equal to the size of the buffer, read_full will return successfully. +fn read_exact(&mut self, buf: &mut [u8]) -> Result<()> { + if try!(self.read_full(buf)) != buf.len() { + Err(Error::new(ErrorKind::UnexpectedEOF, "failed to fill whole buffer")) + } else { + Ok(()) + } +} +``` -The read_exact method can be implemented in terms of read_full. +Finally, a new `ErrorKind::UnexpectedEOF` will be introduced, which will be +returned by `read_exact` in the event of a premature EOF. # Drawbacks -The major weakness of this API (shared with write_all) is that in the -event of an error, there is no way to return the number of bytes that -were successfully read before the error. But returning that data -would require a much more complicated return type, as well as -requiring more work on the part of callers. +Like `write_all`, these APIs are lossy: in the event of an error, there is no +way to determine the number of bytes that were successfully read before the +error. However, doing so would complicate the methods, and the caller will want +to simply fail if an error occurs the vast majority of the time. Situations +that require lower level control can still use `read` directly. + +# Unanswered Questions + +Naming. Is `read_full` the best name? Should `UnexpectedEOF` instead be +`ShortRead` or `ReadZero`? # Alternatives -One alternative design would return some new kind of Result which -could report the number of bytes sucessfully read before an error. +Use a more complicated return type to allow callers to retrieve the number of +bytes successfully read before an error occurred. As explained above, this +would complicate the use of these methods for very little gain. It's worth +noting that git's `read_in_full` is similarly lossy, and just returns an error +even if some bytes have been read. + +Only provide `read_exact`, but parameterize the `UnexpectedEOF` or `ShortRead` +error kind with the number of bytes read to allow it to be used in place of +`read_full`. This would be less convenient to use in cases where EOF is not an +error. -If we wanted one method instead of two, ErrorKind::ShortRead could be -parameterized with the number of bytes read before EOF. But this -would increase the size of ErrorKind. +Only provide `read_full`. This would cover most of the convenience (callers +could avoid the read loop), but callers requiring a filled buffer would have to +manually check if all of the desired bytes were read. -Or we could leave this out, and let every Rust user write their own -read_full or read_exact function, or import a crate of stuff just for -this one function. +Finally, we could leave this out, and let every Rust user needing this +functionality continue to write their own `read_full` or `read_exact` function, +or have to track down an external crate just for one straightforward and +commonly used convenience method. From 585b289ec2db573ce995391c2563902a9fa49af1 Mon Sep 17 00:00:00 2001 From: Cesar Eduardo Barros Date: Mon, 13 Jul 2015 16:19:30 -0300 Subject: [PATCH 07/10] My proposal --- text/0000-read-all.md | 196 +++++++++++++++++++++++++++++------------- 1 file changed, 138 insertions(+), 58 deletions(-) diff --git a/text/0000-read-all.md b/text/0000-read-all.md index e3686756335..6f191524fe6 100644 --- a/text/0000-read-all.md +++ b/text/0000-read-all.md @@ -1,58 +1,74 @@ -- Feature Name: read_exact and read_full +- Feature Name: read_exact and ErrorKind::UnexpectedEOF - Start Date: 2015-03-15 - RFC PR: (leave this empty) - Rust Issue: (leave this empty) # Summary -Rust's `Write` trait has `write_all`, which is a convenience method that calls -`write` repeatedly to write an entire buffer. This proposal adds two similar -convenience methods to the `Read` trait: `read_full` and `read_exact`. -`read_full` calls `read` repeatedly until the buffer has been filled, EOF has -been reached, or an error other than `Interrupted` occurs. `read_exact` is -similar to `read_full`, except that reaching EOF before filling the buffer is -considered an error. +Rust's `Write` trait has the `write_all` method, which is a convenience +method that writes a whole buffer, failing with `ErrorKind::WriteZero` +if the buffer cannot be written in full. + +This RFC proposes adding its `Read` counterpart: a method (here called +`read_exact`) that reads a whole buffer, failing with an error (here +called `ErrorKind::UnexpectedEOF`) if the buffer cannot be read in full. # Motivation -The `read` method may return fewer bytes than requested, and may fail with an -`Interrupted` error if a signal is received during the call. This requires -programs wishing to fill a buffer to call `read` repeatedly in a loop. This is -a very common need, and it would be nice if this functionality were provided in -the standard library. Many C and Rust programs have the same need, and solve it -in the same way. For example, Git has [`read_in_full`][git], which behaves like -the proposed `read_full`, and the Rust byteorder crate has -[`read_full`][byteorder], which behaves like the proposed `read_exact`. -[git]: https://github.com/git/git/blob/16da57c7c6c1fe92b32645202dd19657a89dd67d/wrapper.c#L246 -[byteorder]: https://github.com/BurntSushi/byteorder/blob/2358ace61332e59f596c9006e1344c97295fdf72/src/new.rs#L184 +When dealing with serialization formats with fixed-length fields, +reading or writing less than the field's size is an error. For the +`Write` side, the `write_all` method does the job; for the `Read` side, +however, one has to call `read` in a loop until the buffer is completely +filled, or until a premature EOF is reached. + +This leads to a profusion of similar helper functions. For instance, the +`byteorder` crate has a `read_full` function, and the `postgres` crate +has a `read_all` function. However, their handling of the premature EOF +condition differs: the `byteorder` crate has its own `Error` enum, with +`UnexpectedEOF` and `Io` variants, while the `postgres` crate uses an +`io::Error` with an `io::ErrorKind::Other`. + +That can make it unnecessarily hard to mix uses of these helper +functions; for instance, if one wants to read a 20-byte tag (using one's +own helper function) followed by a big-endian integer, either the helper +function has to be written to use `byteorder::Error`, or the calling +code has to deal with two different ways to represent a premature EOF, +depending on which field encountered the EOF condition. + +Additionally, when reading from an in-memory buffer, looping is not +necessary; it can be replaced by a size comparison followed by a +`copy_memory` (similar to `write_all` for `&mut [u8]`). If this +non-looping implementation is `#[inline]`, and the buffer size is known +(for instance, it's a fixed-size buffer in the stack, or there was an +earlier check of the buffer size against a larger value), the compiler +could potentially turn a read from the buffer followed by an endianness +conversion into the native endianness (as can happen when using the +`byteorder` crate) into a single-instruction direct load from the buffer +into a register. # Detailed design -The following methods will be added to the `Read` trait: +First, a new variant `UnexpectedEOF` is added to the `io::ErrorKind` enum. + +The following method is added to the `Read` trait: ``` rust -fn read_full(&mut self, buf: &mut [u8]) -> Result; fn read_exact(&mut self, buf: &mut [u8]) -> Result<()>; ``` -Additionally, default implementations of these methods will be provided: +Aditionally, a default implementation of this method is provided: ``` rust -fn read_full(&mut self, mut buf: &mut [u8]) -> Result { - let mut read = 0; +fn read_exact(&mut self, mut buf: &mut [u8]) -> Result<()> { while buf.len() > 0 { match self.read(buf) { Ok(0) => break, - Ok(n) => { read += n; let tmp = buf; buf = &mut tmp[n..]; } + Ok(n) => { let tmp = buf; buf = &mut tmp[n..]; } Err(ref e) if e.kind() == ErrorKind::Interrupted => {} Err(e) => return Err(e), } } - Ok(read) -} - -fn read_exact(&mut self, buf: &mut [u8]) -> Result<()> { - if try!(self.read_full(buf)) != buf.len() { + if buf.len() > 0 { Err(Error::new(ErrorKind::UnexpectedEOF, "failed to fill whole buffer")) } else { Ok(()) @@ -60,40 +76,104 @@ fn read_exact(&mut self, buf: &mut [u8]) -> Result<()> { } ``` -Finally, a new `ErrorKind::UnexpectedEOF` will be introduced, which will be -returned by `read_exact` in the event of a premature EOF. +And an optimized implementation of this method for `&[u8]` is provided: + +```rust +#[inline] +fn read_exact(&mut self, buf: &mut [u8]) -> Result<()> { + if (buf.len() > self.len()) { + return Err(Error::new(ErrorKind::UnexpectedEOF, "failed to fill whole buffer")); + } + let (a, b) = self.split_at(buf.len()); + slice::bytes::copy_memory(a, buf); + *self = b; + Ok(()) +} +``` + +# Naming + +It's unfortunate that `write_all` used `WriteZero` for its `ErrorKind`; +were it named `UnexpectedEOF` (which is a much more intuitive name), the +same `ErrorKind` could be used for both functions. + +The initial proposal for this `read_exact` method called it `read_all`, +for symmetry with `write_all`. However, that name could also be +interpreted as "read as many bytes as you can that fit on this buffer, +and return what you could read" instead of "read enough bytes to fill +this buffer, and fail if you couldn't read them all". The previous +discussion led to `read_exact` for the later meaning, and `read_full` +for the former meaning. # Drawbacks -Like `write_all`, these APIs are lossy: in the event of an error, there is no -way to determine the number of bytes that were successfully read before the -error. However, doing so would complicate the methods, and the caller will want -to simply fail if an error occurs the vast majority of the time. Situations -that require lower level control can still use `read` directly. +If this method fails, the buffer contents are undefined; the +`read_exact' method might have partially overwritten it. If the caller +requires "all-or-nothing" semantics, it must clone the buffer. In most +use cases, this is not a problem; the caller will discard or overwrite +the buffer in case of failure. -# Unanswered Questions +In the same way, if this method fails, there is no way to determine how +many bytes were read before it determined it couldn't completely fill +the buffer. -Naming. Is `read_full` the best name? Should `UnexpectedEOF` instead be -`ShortRead` or `ReadZero`? +Situations that require lower level control can still use `read` +directly. # Alternatives -Use a more complicated return type to allow callers to retrieve the number of -bytes successfully read before an error occurred. As explained above, this -would complicate the use of these methods for very little gain. It's worth -noting that git's `read_in_full` is similarly lossy, and just returns an error -even if some bytes have been read. - -Only provide `read_exact`, but parameterize the `UnexpectedEOF` or `ShortRead` -error kind with the number of bytes read to allow it to be used in place of -`read_full`. This would be less convenient to use in cases where EOF is not an -error. - -Only provide `read_full`. This would cover most of the convenience (callers -could avoid the read loop), but callers requiring a filled buffer would have to -manually check if all of the desired bytes were read. - -Finally, we could leave this out, and let every Rust user needing this -functionality continue to write their own `read_full` or `read_exact` function, -or have to track down an external crate just for one straightforward and -commonly used convenience method. +The first alternative is to do nothing. Every Rust user needing this +functionality continues to write their own read_full or read_exact +function, or have to track down an external crate just for one +straightforward and commonly used convenience method. Additionally, +unless everybody uses the same external crate, every reimplementation of +this method will have slightly different error handling, complicating +mixing users of multiple copies of this convenience method. + +The second alternative is to just add the `ErrorKind::UnexpectedEOF` or +similar. This would lead in the long run to everybody using the same +error handling for their version of this convenience method, simplifying +mixing their uses. However, it's questionable to add an `ErrorKind` +variant which is never used by the standard library. + +Another alternative is to return the number of bytes read in the error +case. That makes the buffer contents defined also in the error case, at +the cost of increasing the size of the frequently-used `io::Error` +struct, for a rarely used return value. My objections to this +alternative are: + +* If the caller has an use for the partially written buffer contents, + then it's treating the "buffer partially filled" case as an + alternative success case, not as a failure case. This is not a good + match for the semantics of an `Err` return. +* Determining that the buffer cannot be completely filled can in some + cases be much faster than doing a partial copy. Many callers are not + going to be interested in an incomplete read, meaning that all the + work of filling the buffer is wasted. +* As mentioned, it increases the size of a commonly used type in all + cases, even when the code has no mention of `read_exact`. + +The final alternative is `read_full`, which returns the number of bytes +read (`Result`) instead of failing. This means that every caller +has to check the return value against the size of the passed buffer, and +some are going to forget (or misimplement) the check. It also prevents +some optimizations (like the early return in case there will never be +enough data). There are, however, valid use cases for this alternative; +for instance, reading a file in fixed-size chunks, where the last chunk +(and only the last chunk) can be shorter. I believe this should be +discussed as a separate proposal; its pros and cons are distinct enough +from this proposal to merit its own arguments. + +I believe that the case for `read_full` is weaker than `read_exact`, for +the following reasons: + +* While `read_exact` needs an extra variant in `ErrorKind`, `read_full` + has no new error cases. This means that implementing it yourself is + easy, and multiple implementations have no drawbacks other than code + duplication. +* While `read_exact` can be optimized with an early return in cases + where the reader knows its total size (for instance, reading from a + compressed file where the uncompressed size was given in a header), + `read_full` has to always write to the output buffer, so there's not + much to gain over a generic looping implementation calling `read`. + From 4df689974c0b40207790d238254b2107baae5254 Mon Sep 17 00:00:00 2001 From: Cesar Eduardo Barros Date: Tue, 14 Jul 2015 08:35:59 -0300 Subject: [PATCH 08/10] Detailed semantics, and explanations about EINTR and the read pointer --- text/0000-read-all.md | 82 +++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 82 insertions(+) diff --git a/text/0000-read-all.md b/text/0000-read-all.md index 6f191524fe6..83b1f4aa4b7 100644 --- a/text/0000-read-all.md +++ b/text/0000-read-all.md @@ -91,6 +91,88 @@ fn read_exact(&mut self, buf: &mut [u8]) -> Result<()> { } ``` +The detailed semantics of `read_exact` are as follows: `read_exact` +reads exactly the number of bytes needed to completely fill its `buf` +parameter. If that's not possible due to an "end of file" condition +(that is, the `read` method would return 0 even when passed a buffer +with at least one byte), it returns an `ErrorKind::UnexpectedEOF` error. + +On success, the read pointer is advanced by the number of bytes read, as +if the `read` method had been called repeatedly to fill the buffer. On +any failure (including an `ErrorKind::UnexpectedEOF`), the read pointer +might have been advanced by any number between zero and the number of +bytes requested (inclusive), and the contents of its `buf` parameter +should be treated as garbage (any part of it might or might not have +been overwritten by unspecified data). + +Even if the failure was an `ErrorKind::UnexpectedEOF`, the read pointer +might have been advanced by a number of bytes less than the number of +bytes which could be read before reaching an "end of file" condition. + +The `read_exact` method will never return an `ErrorKind::Interrupted` +error, similar to the `read_to_end` method. + +Similar to the `read` method, no guarantees are provided about the +contents of `buf` when this function is called; implementations cannot +rely on any property of the contents of `buf` being true. It is +recommended that implementations only write data to `buf` instead of +reading its contents. + +# About ErrorKind::Interrupted + +Whether or not `read_exact` can return an `ErrorKind::Interrupted` error +is orthogonal to its semantics. One could imagine an alternative design +where `read_exact` could return an `ErrorKind::Interrupted` error. + +The reason `read_exact` should deal with `ErrorKind::Interrupted` itself +is its non-idempotence. On failure, it might have already partially +advanced its read pointer an unknown number of bytes, which means it +can't be easily retried after an `ErrorKind::Interrupted` error. + +One could argue that it could return an `ErrorKind::Interrupted` error +if it's interrupted before the read pointer is advanced. But that +introduces a non-orthogonality in the design, where it might either +return or retry depending on whether it was interrupted at the beginning +or in the middle. Therefore, the cleanest semantics is to always retry. + +There's precedent for this choice in the `read_to_end` method. Users who +need finer control should use the `read` method directly. + +# About the read pointer + +This RFC proposes a `read_exact` function where the read pointer +(conceptually, what would be returned by `Seek::seek` if the stream was +seekable) is unspecified on failure: it might not have advanced at all, +have advanced in full, or advanced partially. + +Two possible alternatives could be considered: never advance the read +pointer on failure, or always advance the read pointer to the "point of +error" (in the case of `ErrorKind::UnexpectedEOF`, to the end of the +stream). + +Never advancing the read pointer on failure would make it impossible to +have a default implementation (which calls `read` in a loop), unless the +stream was seekable. It would also impose extra costs (like creating a +temporary buffer) to allow "seeking back" for non-seekable streams. + +Always advancing the read pointer to the end on failure is possible; it +happens without any extra code in the default implementation. However, +it can introduce extra costs in optimized implementations. For instance, +the implementation given above for `&[u8]` would need a few more +instructions in the error case. Some implementations (for instance, +reading from a compressed stream) might have a larger extra cost. + +The utility of always advancing the read pointer to the end is +questionable; for non-seekable streams, there's not much that can be +done on an "end of file" condition, so most users would discard the +stream in both an "end of file" and an `ErrorKind::UnexpectedEOF` +situation. For seekable streams, it's easy to seek back, but most users +would treat an `ErrorKind::UnexpectedEOF` as a "corrupted file" and +discard the stream anyways. + +Users who need finer control should use the `read` method directly, or +when available use the `Seek` trait. + # Naming It's unfortunate that `write_all` used `WriteZero` for its `ErrorKind`; From 4732352c41a36a485d7b4e59f4dfea596e8f27b3 Mon Sep 17 00:00:00 2001 From: Cesar Eduardo Barros Date: Thu, 16 Jul 2015 20:20:26 -0300 Subject: [PATCH 09/10] Use .is_empty() instead of .len() Should make no difference in speed, but is slightly cleaner. --- text/0000-read-all.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/text/0000-read-all.md b/text/0000-read-all.md index 83b1f4aa4b7..c242535a275 100644 --- a/text/0000-read-all.md +++ b/text/0000-read-all.md @@ -60,7 +60,7 @@ Aditionally, a default implementation of this method is provided: ``` rust fn read_exact(&mut self, mut buf: &mut [u8]) -> Result<()> { - while buf.len() > 0 { + while !buf.is_empty() { match self.read(buf) { Ok(0) => break, Ok(n) => { let tmp = buf; buf = &mut tmp[n..]; } @@ -68,7 +68,7 @@ fn read_exact(&mut self, mut buf: &mut [u8]) -> Result<()> { Err(e) => return Err(e), } } - if buf.len() > 0 { + if !buf.is_empty() { Err(Error::new(ErrorKind::UnexpectedEOF, "failed to fill whole buffer")) } else { Ok(()) From 7cd058d07ec802983eb6c5b999b7744cc33cbeeb Mon Sep 17 00:00:00 2001 From: Cesar Eduardo Barros Date: Sun, 19 Jul 2015 23:30:45 -0300 Subject: [PATCH 10/10] Explanations about the buffer contents --- text/0000-read-all.md | 24 ++++++++++++++++++++++++ 1 file changed, 24 insertions(+) diff --git a/text/0000-read-all.md b/text/0000-read-all.md index c242535a275..d2c841813a4 100644 --- a/text/0000-read-all.md +++ b/text/0000-read-all.md @@ -173,6 +173,30 @@ discard the stream anyways. Users who need finer control should use the `read` method directly, or when available use the `Seek` trait. +# About the buffer contents + +This RFC proposes that the contents of the output buffer be undefined on +an error return. It might be untouched, partially overwritten, or +completely overwritten (even if less bytes could be read; for instance, +this method might in theory use it as a scratch space). + +Two possible alternatives could be considered: do not touch it on +failure, or overwrite it with valid data as much as possible. + +Never touching the output buffer on failure would make it much more +expensive for the default implementation (which calls `read` in a loop), +since it would have to read into a temporary buffer and copy to the +output buffer on success. Any implementation which cannot do an early +return for all failure cases would have similar extra costs. + +Overwriting as much as possible with valid data makes some sense; it +happens without any extra cost in the default implementation. However, +for optimized implementations this extra work is useless; since the +caller can't know how much is valid data and how much is garbage, it +can't make use of the valid data. + +Users who need finer control should use the `read` method directly. + # Naming It's unfortunate that `write_all` used `WriteZero` for its `ErrorKind`;