Commit 79b4ce0

collections: Stabilize String

# Rationale

When dealing with strings, many functions deal with either a `char` (unicode codepoint) or a byte (utf-8 encoding related). Method names are inconsistent about whether they contain "byte", "char", or nothing at all. There are also issues open to rename *all* methods to reflect that they operate on utf8 encodings or bytes (e.g. utf8_len() or byte_len()).

The current state of String seems to largely be what is desired, so this PR proposes the following rationale for methods dealing with bytes or characters:

> When constructing a string, the input encoding *must* be mentioned (e.g.
> from_utf8). This makes it clear what exactly the input type is expected to be
> in terms of encoding.
>
> When a method operates on anything related to an *index* within the string
> such as length, capacity, position, etc, the method *implicitly* operates on
> bytes. It is an understood fact that String is a utf-8 encoded string, and
> burdening all methods with "bytes" would be redundant.
>
> When a method operates on the *contents* of a string, such as push() or pop(),
> then "char" is the default type. A String can loosely be thought of as being a
> collection of unicode codepoints, but not all collection-related operations
> make sense because some can be woefully inefficient.

# Method stabilization

The following methods have been marked #[stable]

* The String type itself
* String::new
* String::with_capacity
* String::from_utf16_lossy
* String::into_bytes
* String::as_bytes
* String::len
* String::clear
* String::as_slice

The following methods have been marked #[unstable]

* String::from_utf8 - The error type in the returned `Result` may change to provide a nicer message when it's `unwrap()`'d
* String::from_utf8_lossy - The returned `MaybeOwned` type still needs stabilization
* String::from_utf16 - The return type may change to become a `Result` which includes more contextual information like where the error occurred.
* String::from_chars - This is equivalent to iter().collect(), but currently not as ergonomic.
* String::from_char - This method is the equivalent of Vec::from_elem, and has been marked #[unstable] because it can be seen as a duplicate of iterator-based functionality and may also be renamed.
* String::push_str - This *can* be emulated with .extend(foo.chars()), but is less efficient because of decoding/encoding. Due to the desire to minimize API surface, this may be removed in the future in favor of something generic with no loss in performance.
* String::grow - This is a duplicate of iterator-based functionality, which may become more ergonomic in the future.
* String::capacity - This function was just added.
* String::push - This function was just added.
* String::pop - This function was just added.
* String::truncate - The failure conventions around String methods and byte indices aren't totally clear at this time, so the failure semantics and return value of this method are subject to change.
* String::as_mut_vec - the naming of this method may change.
* string::raw::* - these functions are all waiting on [an RFC][2]

[2]: rust-lang/rfcs#240

The following methods have been marked #[experimental]

* String::from_str - This function only exists as it's more efficient than to_string(), but having a less ergonomic function for performance reasons isn't the greatest reason to keep it around. Like Vec::push_all, this has been marked experimental for now.

The following methods have been #[deprecated]

* String::append - This method has been deprecated to remain consistent with the deprecation of Vec::append. While convenient, it is one of the only functional-style APIs on String, and requires more thought as to whether it belongs as a first-class method or not (and how it relates to other collections).
* String::from_byte - This is fairly rare functionality and can be emulated with str::from_utf8 plus an assert plus a call to to_string(). Additionally, String::from_char could possibly be used.
* String::byte_capacity - Renamed to String::capacity due to the rationale above.
* String::push_char - Renamed to String::push due to the rationale above.
* String::pop_char - Renamed to String::pop due to the rationale above.
* String::push_bytes - There are a number of `unsafe` functions on the `String` type which allow bypassing utf-8 checks. These have all been deprecated in favor of calling `.as_mut_vec()` and then operating directly on the vector returned. These methods were deprecated because naming them with relation to other methods was difficult to rationalize and it's arguably more composable to call .as_mut_vec().
* String::as_mut_bytes - See push_bytes
* String::push_byte - See push_bytes
* String::pop_byte - See push_bytes
* String::shift_byte - See push_bytes

# Reservation methods

This commit does not yet touch the methods for reserving bytes. The methods on Vec have also not yet been modified. These methods are discussed in the upcoming [Collections reform RFC][1]

[1]: https://github.com/aturon/rfcs/blob/collections-conventions/active/0000-collections-conventions.md#implicit-growth
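
As a rough illustration of the three naming rules in the rationale, the sketch below uses present-day stable Rust rather than the 2014 API touched by this commit. That is an assumption made so the example compiles today: `usize` stands in for `uint` and `String::from` for `String::from_str`. The helper names `byte_and_char_counts` and `naming_rules_demo` are hypothetical and do not appear in the commit.

```rust
/// Returns (length in UTF-8 bytes, number of Unicode scalar values).
/// Hypothetical helper, not part of `String`.
fn byte_and_char_counts(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

/// Walks through the three naming rules and returns the final string.
/// Hypothetical helper, for illustration only.
fn naming_rules_demo() -> String {
    // Rule 1: constructors name the input encoding (`from_utf8`).
    // 0xC3 0xA9 is the UTF-8 encoding of U+00E9.
    let mut s = String::from_utf8(vec![0xC3, 0xA9]).expect("bytes are valid UTF-8");

    // Rule 2: index-like quantities (length, capacity) are counted in bytes.
    assert_eq!(byte_and_char_counts(&s), (2, 1)); // 2 bytes, 1 char
    assert!(s.capacity() >= s.len());

    // Rule 3: content operations such as push/pop work on `char`.
    s.push('x');
    assert_eq!(s.pop(), Some('x'));
    assert_eq!(s.pop(), Some('\u{e9}'));
    assert_eq!(s.pop(), None);

    // `push_str` copies bytes directly; `extend(chars())` decodes and
    // re-encodes each char, but both yield the same contents.
    let mut direct = String::from("foo");
    let mut via_chars = direct.clone();
    direct.push_str("bar");
    via_chars.extend("bar".chars());
    assert_eq!(direct, via_chars);
    direct
}
```

Calling `naming_rules_demo()` exercises each rule in turn. The last step mirrors the note on `String::push_str`, which says the method can be emulated with `.extend(foo.chars())` at some cost in efficiency.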
1 parent 3907a13 commit 79b4ce0

1 file changed: +77 -9 lines changed

src/libcollections/string.rs

Lines changed: 77 additions & 9 deletions
@@ -31,6 +31,7 @@ use vec::Vec;

 /// A growable string stored as a UTF-8 encoded buffer.
 #[deriving(Clone, PartialEq, PartialOrd, Eq, Ord)]
+#[stable]
 pub struct String {
     vec: Vec<u8>,
 }
@@ -44,6 +45,7 @@ impl String {
     /// let mut s = String::new();
     /// ```
     #[inline]
+    #[stable]
     pub fn new() -> String {
         String {
             vec: Vec::new(),
@@ -60,6 +62,7 @@ impl String {
     /// let mut s = String::with_capacity(10);
     /// ```
     #[inline]
+    #[stable]
     pub fn with_capacity(capacity: uint) -> String {
         String {
             vec: Vec::with_capacity(capacity),
@@ -75,6 +78,7 @@ impl String {
     /// assert_eq!(s.as_slice(), "hello");
     /// ```
     #[inline]
+    #[experimental = "needs investigation to see if to_string() can match perf"]
     pub fn from_str(string: &str) -> String {
         String { vec: string.as_bytes().to_vec() }
     }
@@ -111,6 +115,7 @@ impl String {
     /// assert_eq!(s, Err(vec![240, 144, 128]));
     /// ```
     #[inline]
+    #[unstable = "error type may change"]
     pub fn from_utf8(vec: Vec<u8>) -> Result<String, Vec<u8>> {
         if str::is_utf8(vec.as_slice()) {
             Ok(String { vec: vec })
@@ -129,6 +134,7 @@ impl String {
     /// let output = String::from_utf8_lossy(input);
     /// assert_eq!(output.as_slice(), "Hello \uFFFDWorld");
     /// ```
+    #[unstable = "return type may change"]
     pub fn from_utf8_lossy<'a>(v: &'a [u8]) -> MaybeOwned<'a> {
         if str::is_utf8(v) {
             return MaybeOwnedSlice(unsafe { mem::transmute(v) })
@@ -260,6 +266,7 @@ impl String {
     /// v[4] = 0xD800;
     /// assert_eq!(String::from_utf16(v), None);
     /// ```
+    #[unstable = "error value in return may change"]
     pub fn from_utf16(v: &[u16]) -> Option<String> {
         let mut s = String::with_capacity(v.len() / 2);
         for c in str::utf16_items(v) {
@@ -284,6 +291,7 @@ impl String {
     /// assert_eq!(String::from_utf16_lossy(v),
     /// "𝄞mus\uFFFDic\uFFFD".to_string());
     /// ```
+    #[stable]
     pub fn from_utf16_lossy(v: &[u16]) -> String {
         str::utf16_items(v).map(|c| c.to_char_lossy()).collect()
     }
@@ -298,6 +306,7 @@ impl String {
     /// assert_eq!(s.as_slice(), "hello");
     /// ```
     #[inline]
+    #[unstable = "may be removed in favor of .collect()"]
     pub fn from_chars(chs: &[char]) -> String {
         chs.iter().map(|c| *c).collect()
     }
@@ -312,6 +321,7 @@ impl String {
     /// assert_eq!(bytes, vec![104, 101, 108, 108, 111]);
     /// ```
     #[inline]
+    #[stable]
     pub fn into_bytes(self) -> Vec<u8> {
         self.vec
     }
@@ -329,6 +339,7 @@ impl String {
     /// assert_eq!(big.as_slice(), "hello world!");
     /// ```
     #[inline]
+    #[deprecated = "use .push_str() instead"]
     pub fn append(mut self, second: &str) -> String {
         self.push_str(second);
         self
@@ -343,6 +354,8 @@ impl String {
     /// assert_eq!(s.as_slice(), "aaaaa");
     /// ```
     #[inline]
+    #[unstable = "may be replaced with iterators, questionable usability, and \
+                  the name may change"]
     pub fn from_char(length: uint, ch: char) -> String {
         if length == 0 {
             return String::new()
@@ -370,6 +383,7 @@ impl String {
     /// let s = String::from_byte(104);
     /// assert_eq!(s.as_slice(), "h");
     /// ```
+    #[deprecated = "use str::from_utf8 with a slice of one byte instead"]
     pub fn from_byte(b: u8) -> String {
         assert!(b < 128u8);
         String::from_char(1, b as char)
@@ -385,6 +399,7 @@ impl String {
     /// assert_eq!(s.as_slice(), "foobar");
     /// ```
     #[inline]
+    #[unstable = "extra variants of `push`, could possibly be based on iterators"]
     pub fn push_str(&mut self, string: &str) {
         self.vec.push_all(string.as_bytes())
     }
@@ -399,6 +414,7 @@ impl String {
     /// assert_eq!(s.as_slice(), "fooZZZZZ");
     /// ```
     #[inline]
+    #[unstable = "duplicate of iterator-based functionality"]
     pub fn grow(&mut self, count: uint, ch: char) {
         for _ in range(0, count) {
             self.push_char(ch)
@@ -414,10 +430,25 @@ impl String {
     /// assert!(s.byte_capacity() >= 10);
     /// ```
     #[inline]
+    #[deprecated = "renamed to .capacity()"]
     pub fn byte_capacity(&self) -> uint {
         self.vec.capacity()
     }

+    /// Returns the number of bytes that this string buffer can hold without reallocating.
+    ///
+    /// # Example
+    ///
+    /// ```
+    /// let s = String::with_capacity(10);
+    /// assert!(s.byte_capacity() >= 10);
+    /// ```
+    #[inline]
+    #[unstable = "just implemented, needs to prove itself"]
+    pub fn capacity(&self) -> uint {
+        self.vec.capacity()
+    }
+
     /// Reserves capacity for at least `extra` additional bytes in this string buffer.
     ///
     /// # Example
@@ -477,19 +508,27 @@ impl String {
         self.vec.shrink_to_fit()
     }

+    /// Deprecated, use .push() instead.
+    #[inline]
+    #[deprecated = "renamed to .push()"]
+    pub fn push_char(&mut self, ch: char) {
+        self.push(ch)
+    }
+
     /// Adds the given character to the end of the string.
     ///
     /// # Example
     ///
     /// ```
     /// let mut s = String::from_str("abc");
-    /// s.push_char('1');
-    /// s.push_char('2');
-    /// s.push_char('3');
+    /// s.push('1');
+    /// s.push('2');
+    /// s.push('3');
     /// assert_eq!(s.as_slice(), "abc123");
     /// ```
     #[inline]
-    pub fn push_char(&mut self, ch: char) {
+    #[stable = "function just renamed from push_char"]
+    pub fn push(&mut self, ch: char) {
         let cur_len = self.len();
         // This may use up to 4 bytes.
         self.vec.reserve_additional(4);
@@ -520,6 +559,7 @@ impl String {
     /// assert_eq!(s.as_slice(), "hello");
     /// ```
     #[inline]
+    #[deprecated = "call .as_mut_vec() and push onto that"]
     pub unsafe fn push_bytes(&mut self, bytes: &[u8]) {
         self.vec.push_all(bytes)
     }
@@ -534,6 +574,7 @@ impl String {
     /// assert_eq!(s.as_bytes(), b);
     /// ```
     #[inline]
+    #[stable]
     pub fn as_bytes<'a>(&'a self) -> &'a [u8] {
         self.vec.as_slice()
     }
@@ -557,6 +598,7 @@ impl String {
     /// assert_eq!(s.as_slice(), "h3ll0")
     /// ```
     #[inline]
+    #[deprecated = "call .as_mut_vec().as_slice() instead"]
     pub unsafe fn as_mut_bytes<'a>(&'a mut self) -> &'a mut [u8] {
         self.vec.as_mut_slice()
     }
@@ -575,6 +617,7 @@ impl String {
     /// assert_eq!(s.as_slice(), "he");
     /// ```
     #[inline]
+    #[unstable = "the failure conventions for strings are under development"]
     pub fn truncate(&mut self, len: uint) {
         assert!(self.as_slice().is_char_boundary(len));
         self.vec.truncate(len)
@@ -595,6 +638,7 @@ impl String {
     /// assert_eq!(s.as_slice(), "hello");
     /// ```
     #[inline]
+    #[deprecated = "call .as_mut_vec().push() instead"]
     pub unsafe fn push_byte(&mut self, byte: u8) {
         self.vec.push(byte)
     }
@@ -617,6 +661,7 @@ impl String {
     /// }
     /// ```
     #[inline]
+    #[deprecated = "call .as_mut_vec().pop() instead"]
     pub unsafe fn pop_byte(&mut self) -> Option<u8> {
         let len = self.len();
         if len == 0 {
@@ -628,20 +673,26 @@ impl String {
         Some(byte)
     }

+    /// Deprecated. Renamed to `pop`.
+    #[inline]
+    #[deprecated = "renamed to .pop()"]
+    pub fn pop_char(&mut self) -> Option<char> { self.pop() }
+
     /// Removes the last character from the string buffer and returns it.
     /// Returns `None` if this string buffer is empty.
     ///
     /// # Example
     ///
     /// ```
     /// let mut s = String::from_str("foo");
-    /// assert_eq!(s.pop_char(), Some('o'));
-    /// assert_eq!(s.pop_char(), Some('o'));
-    /// assert_eq!(s.pop_char(), Some('f'));
-    /// assert_eq!(s.pop_char(), None);
+    /// assert_eq!(s.pop(), Some('o'));
+    /// assert_eq!(s.pop(), Some('o'));
+    /// assert_eq!(s.pop(), Some('f'));
+    /// assert_eq!(s.pop(), None);
     /// ```
     #[inline]
-    pub fn pop_char(&mut self) -> Option<char> {
+    #[unstable = "this function was just renamed from pop_char"]
+    pub fn pop(&mut self) -> Option<char> {
         let len = self.len();
         if len == 0 {
             return None
@@ -671,6 +722,7 @@ impl String {
     /// assert_eq!(s.shift_byte(), None);
     /// }
     /// ```
+    #[deprecated = "call .as_mut_rev().remove(0)"]
     pub unsafe fn shift_byte(&mut self) -> Option<u8> {
         self.vec.remove(0)
     }
@@ -722,25 +774,31 @@ impl String {
     /// }
     /// assert_eq!(s.as_slice(), "olleh");
     /// ```
+    #[unstable = "the name of this method may be changed"]
     pub unsafe fn as_mut_vec<'a>(&'a mut self) -> &'a mut Vec<u8> {
         &mut self.vec
     }
 }

+#[experimental = "collection traits will probably be removed"]
 impl Collection for String {
     #[inline]
+    #[stable]
     fn len(&self) -> uint {
         self.vec.len()
     }
 }

+#[experimental = "collection traits will probably be removed"]
 impl Mutable for String {
     #[inline]
+    #[stable]
     fn clear(&mut self) {
         self.vec.clear()
     }
 }

+#[experimental = "waiting on FromIterator stabilization"]
 impl FromIterator<char> for String {
     fn from_iter<I:Iterator<char>>(iterator: I) -> String {
         let mut buf = String::new();
@@ -749,6 +807,7 @@ impl FromIterator<char> for String {
     }
 }

+#[experimental = "waiting on Extendable stabilization"]
 impl Extendable<char> for String {
     fn extend<I:Iterator<char>>(&mut self, mut iterator: I) {
         for ch in iterator {
@@ -757,48 +816,56 @@ impl Extendable<char> for String {
     }
 }

+#[experimental = "waiting on Str stabilization"]
 impl Str for String {
     #[inline]
+    #[stable]
     fn as_slice<'a>(&'a self) -> &'a str {
         unsafe {
             mem::transmute(self.vec.as_slice())
         }
     }
 }

+#[experimental = "waiting on StrAllocating stabilization"]
 impl StrAllocating for String {
     #[inline]
     fn into_string(self) -> String {
         self
     }
 }

+#[stable]
 impl Default for String {
     fn default() -> String {
         String::new()
     }
 }

+#[experimental = "waiting on Show stabilization"]
 impl fmt::Show for String {
     fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
         self.as_slice().fmt(f)
     }
 }

+#[experimental = "waiting on Hash stabilization"]
 impl<H: hash::Writer> hash::Hash<H> for String {
     #[inline]
     fn hash(&self, hasher: &mut H) {
         self.as_slice().hash(hasher)
     }
 }

+#[experimental = "waiting on Equiv stabilization"]
 impl<'a, S: Str> Equiv<S> for String {
     #[inline]
     fn equiv(&self, other: &S) -> bool {
         self.as_slice() == other.as_slice()
     }
 }

+#[experimental = "waiting on Add stabilization"]
 impl<S: Str> Add<S, String> for String {
     fn add(&self, other: &S) -> String {
         let mut s = String::from_str(self.as_slice());
@@ -808,6 +875,7 @@ impl<S: Str> Add<S, String> for String {
 }

 /// Unsafe operations
+#[unstable = "waiting on raw module conventions"]
 pub mod raw {
     use core::mem;
     use core::ptr::RawPtr;
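
The deprecation notes in this diff point callers of `push_bytes`, `push_byte`, and `pop_byte` at `.as_mut_vec()`. A minimal sketch of that migration follows, written against present-day stable Rust rather than the 2014 API (an assumption, so that it compiles today). `push_checked_bytes` and `pop_ascii_byte` are hypothetical wrappers. The validation they perform is one possible way to keep the `unsafe` calls sound, not something this commit does.

```rust
use std::str::Utf8Error;

/// Appends raw bytes through `as_mut_vec`, the replacement named in the
/// deprecation message for `push_bytes`. Hypothetical helper: it validates
/// the bytes first so the string's UTF-8 invariant cannot be broken.
fn push_checked_bytes(s: &mut String, bytes: &[u8]) -> Result<(), Utf8Error> {
    std::str::from_utf8(bytes)?;
    // SAFETY: `bytes` was just validated, and appending valid UTF-8 to a
    // valid UTF-8 buffer yields valid UTF-8.
    unsafe { s.as_mut_vec().extend_from_slice(bytes) };
    Ok(())
}

/// Removes the last byte through `as_mut_vec`, the replacement named for
/// `pop_byte`. Hypothetical helper: it only pops when the last byte is
/// ASCII, so the remaining bytes are still valid UTF-8.
fn pop_ascii_byte(s: &mut String) -> Option<u8> {
    let last_is_ascii = s.as_bytes().last().map_or(false, |b| b.is_ascii());
    if last_is_ascii {
        // SAFETY: an ASCII byte is a complete one-byte character, so
        // removing it cannot split a multi-byte sequence.
        unsafe { s.as_mut_vec().pop() }
    } else {
        None
    }
}
```

Working on the vector directly keeps the byte-level escape hatch in one place, which is the composability argument the commit message makes for `.as_mut_vec()`.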
