Rename TryFrom's associated type and implement str::parse using TryFrom.
Per discussion on the tracking issue, naming `TryFrom`'s associated type `Error` is generally more consistent with similar traits in the Rust ecosystem, and what people seem to assume it should be called. It also helps disambiguate from `Result::Err`, the most common "Err".
See https://github.com/rust-lang/rust/issues/33417#issuecomment-269108968.
`TryFrom<&str>` and `FromStr` are equivalent, so have the latter provide the former to ensure that. Using `TryFrom` in the implementation of `str::parse` means types that implement either trait can use it. When we're ready to stabilize `TryFrom`, we should update `FromStr` to
suggest implementing `TryFrom<&str>` instead for new code.
See https://github.com/rust-lang/rust/issues/33417#issuecomment-277175994
and https://github.com/rust-lang/rust/issues/33417#issuecomment-277253827.
Refs #33417.
Their relationship is:
* `resume_from = error_len.map(|l| l + valid_up_to)`
* error_len is always one of None, Some(1), Some(2), or Some(3).
When I started using resume_from I almost always ended up subtracting
valid_up_to to obtain error_len.
Therefore the latter is what should be provided in the first place.
UTF-8 validation: Compute block end upfront
Simplify the conditional used for ensuring that the whole word loop is
only used if there are at least two whole words left to read.
This makes the function slightly smaller and simpler, a 0-5% reduction
in runtime for various test cases.
Use more specific panic message for &str slicing errors
Separate out of bounds errors from character boundary errors, and print
more details for character boundary errors.
It reports the first error it finds in:
1. begin out of bounds
2. end out of bounds
3. begin <= end violated
3. begin not char boundary
5. end not char boundary.
Example:
&"abcαβγ"[..4]
thread 'str::test_slice_fail_boundary_1' panicked at 'byte index 4 is not
a char boundary; it is inside 'α' (bytes 3..5) of `abcαβγ`'
Fixes#38052
Separate out of bounds errors from character boundary errors, and print
more details for character boundary errors.
Example:
&"abcαβγ"[..4]
thread 'str::test_slice_fail_boundary_1' panicked at 'byte index 4 is not
a char boundary; it is inside `α` (bytes 3..5) of `abcαβγ`'
Simplify the conditional used for ensuring that the whole word loop is
only used if there are at least two whole words left to read.
This makes the function slightly smaller and simpler, a 0-5% reduction
in runtime for various test cases.
Improve .chars().count()
Use a simpler loop to count the `char` of a string: count the
number of non-continuation bytes. Use `count += <conditional>` which the
compiler understands well and can apply loop optimizations to.
benchmark descriptions and results for two configurations:
- ascii: ascii text
- cy: cyrillic text
- jp: japanese text
- words ascii: counting each split_whitespace item from the ascii text
- words jp: counting each split_whitespace item from the jp text
```
x86-64 rustc -Copt-level=3
name orig_ ns/iter cmov_ ns/iter diff ns/iter diff %
count_ascii 1,453 (1755 MB/s) 1,398 (1824 MB/s) -55 -3.79%
count_cy 5,990 (856 MB/s) 2,545 (2016 MB/s) -3,445 -57.51%
count_jp 3,075 (1169 MB/s) 1,772 (2029 MB/s) -1,303 -42.37%
count_words_ascii 4,157 (521 MB/s) 1,797 (1205 MB/s) -2,360 -56.77%
count_words_jp 3,337 (1071 MB/s) 1,772 (2018 MB/s) -1,565 -46.90%
x86-64 rustc -Ctarget-feature=+avx -Copt-level=3
name orig_ ns/iter cmov_ ns/iter diff ns/iter diff %
count_ascii 1,444 (1766 MB/s) 763 (3343 MB/s) -681 -47.16%
count_cy 5,871 (874 MB/s) 1,527 (3360 MB/s) -4,344 -73.99%
count_jp 2,874 (1251 MB/s) 1,073 (3351 MB/s) -1,801 -62.67%
count_words_ascii 4,131 (524 MB/s) 1,871 (1157 MB/s) -2,260 -54.71%
count_words_jp 3,253 (1099 MB/s) 1,331 (2686 MB/s) -1,922 -59.08%
```
I briefly explored a more involved blocked algorithm (looking at 8 or more bytes at a time),
but the code in this PR was always winning `count_words_ascii` in particular (counting
many small strings); this solution is an improvement without tradeoffs.
Use a simpler loop to count the `char` of a string: count the
number of non-continuation bytes. Use `count += <conditional>` which the
compiler understands well and can apply loop optimizations to.
Use `len` instead of `size_hint` where appropiate
This makes it clearer that we're not just looking for a lower bound but
rather know that the iterator is an `ExactSizeIterator`.
* Remove the deprecated `CharRange` type which was forgotten to be removed
awhile back.
* Stabilize the `os::$platform::raw::pthread_t` type which was intended to be
stabilized as part of #32804
std: Stabilize APIs for the 1.9 release
This commit applies all stabilizations, renamings, and deprecations that the
library team has decided on for the upcoming 1.9 release. All tracking issues
have gone through a cycle-long "final comment period" and the specific APIs
stabilized/deprecated are:
Stable
* `std::panic`
* `std::panic::catch_unwind` (renamed from `recover`)
* `std::panic::resume_unwind` (renamed from `propagate`)
* `std::panic::AssertUnwindSafe` (renamed from `AssertRecoverSafe`)
* `std::panic::UnwindSafe` (renamed from `RecoverSafe`)
* `str::is_char_boundary`
* `<*const T>::as_ref`
* `<*mut T>::as_ref`
* `<*mut T>::as_mut`
* `AsciiExt::make_ascii_uppercase`
* `AsciiExt::make_ascii_lowercase`
* `char::decode_utf16`
* `char::DecodeUtf16`
* `char::DecodeUtf16Error`
* `char::DecodeUtf16Error::unpaired_surrogate`
* `BTreeSet::take`
* `BTreeSet::replace`
* `BTreeSet::get`
* `HashSet::take`
* `HashSet::replace`
* `HashSet::get`
* `OsString::with_capacity`
* `OsString::clear`
* `OsString::capacity`
* `OsString::reserve`
* `OsString::reserve_exact`
* `OsStr::is_empty`
* `OsStr::len`
* `std::os::unix::thread`
* `RawPthread`
* `JoinHandleExt`
* `JoinHandleExt::as_pthread_t`
* `JoinHandleExt::into_pthread_t`
* `HashSet::hasher`
* `HashMap::hasher`
* `CommandExt::exec`
* `File::try_clone`
* `SocketAddr::set_ip`
* `SocketAddr::set_port`
* `SocketAddrV4::set_ip`
* `SocketAddrV4::set_port`
* `SocketAddrV6::set_ip`
* `SocketAddrV6::set_port`
* `SocketAddrV6::set_flowinfo`
* `SocketAddrV6::set_scope_id`
* `<[T]>::copy_from_slice`
* `ptr::read_volatile`
* `ptr::write_volatile`
* The `#[deprecated]` attribute
* `OpenOptions::create_new`
Deprecated
* `std::raw::Slice` - use raw parts of `slice` module instead
* `std::raw::Repr` - use raw parts of `slice` module instead
* `str::char_range_at` - use slicing plus `chars()` plus `len_utf8`
* `str::char_range_at_reverse` - use slicing plus `chars().rev()` plus `len_utf8`
* `str::char_at` - use slicing plus `chars()`
* `str::char_at_reverse` - use slicing plus `chars().rev()`
* `str::slice_shift_char` - use `chars()` plus `Chars::as_str`
* `CommandExt::session_leader` - use `before_exec` instead.
Closes#27719
cc #27751 (deprecating the `Slice` bits)
Closes#27754Closes#27780Closes#27809Closes#27811Closes#27830Closes#28050Closes#29453Closes#29791Closes#29935Closes#30014Closes#30752Closes#31262
cc #31398 (still need to deal with `before_exec`)
Closes#31405Closes#31572Closes#31755Closes#31756
This commit applies all stabilizations, renamings, and deprecations that the
library team has decided on for the upcoming 1.9 release. All tracking issues
have gone through a cycle-long "final comment period" and the specific APIs
stabilized/deprecated are:
Stable
* `std::panic`
* `std::panic::catch_unwind` (renamed from `recover`)
* `std::panic::resume_unwind` (renamed from `propagate`)
* `std::panic::AssertUnwindSafe` (renamed from `AssertRecoverSafe`)
* `std::panic::UnwindSafe` (renamed from `RecoverSafe`)
* `str::is_char_boundary`
* `<*const T>::as_ref`
* `<*mut T>::as_ref`
* `<*mut T>::as_mut`
* `AsciiExt::make_ascii_uppercase`
* `AsciiExt::make_ascii_lowercase`
* `char::decode_utf16`
* `char::DecodeUtf16`
* `char::DecodeUtf16Error`
* `char::DecodeUtf16Error::unpaired_surrogate`
* `BTreeSet::take`
* `BTreeSet::replace`
* `BTreeSet::get`
* `HashSet::take`
* `HashSet::replace`
* `HashSet::get`
* `OsString::with_capacity`
* `OsString::clear`
* `OsString::capacity`
* `OsString::reserve`
* `OsString::reserve_exact`
* `OsStr::is_empty`
* `OsStr::len`
* `std::os::unix::thread`
* `RawPthread`
* `JoinHandleExt`
* `JoinHandleExt::as_pthread_t`
* `JoinHandleExt::into_pthread_t`
* `HashSet::hasher`
* `HashMap::hasher`
* `CommandExt::exec`
* `File::try_clone`
* `SocketAddr::set_ip`
* `SocketAddr::set_port`
* `SocketAddrV4::set_ip`
* `SocketAddrV4::set_port`
* `SocketAddrV6::set_ip`
* `SocketAddrV6::set_port`
* `SocketAddrV6::set_flowinfo`
* `SocketAddrV6::set_scope_id`
* `<[T]>::copy_from_slice`
* `ptr::read_volatile`
* `ptr::write_volatile`
* The `#[deprecated]` attribute
* `OpenOptions::create_new`
Deprecated
* `std::raw::Slice` - use raw parts of `slice` module instead
* `std::raw::Repr` - use raw parts of `slice` module instead
* `str::char_range_at` - use slicing plus `chars()` plus `len_utf8`
* `str::char_range_at_reverse` - use slicing plus `chars().rev()` plus `len_utf8`
* `str::char_at` - use slicing plus `chars()`
* `str::char_at_reverse` - use slicing plus `chars().rev()`
* `str::slice_shift_char` - use `chars()` plus `Chars::as_str`
* `CommandExt::session_leader` - use `before_exec` instead.
Closes#27719
cc #27751 (deprecating the `Slice` bits)
Closes#27754Closes#27780Closes#27809Closes#27811Closes#27830Closes#28050Closes#29453Closes#29791Closes#29935Closes#30014Closes#30752Closes#31262
cc #31398 (still need to deal with `before_exec`)
Closes#31405Closes#31572Closes#31755Closes#31756
Where T is a type that can be compared for equality bytewise, we can use
memcmp. We can also use memcmp for PartialOrd, Ord for [u8] and by
extension &str.
This is an improvement for example for the comparison [u8] == [u8] that
used to emit a loop that compared the slices byte by byte.
One worry here could be that this introduces function calls to memcmp
in contexts where it should really inline the comparison or even
optimize it out, but llvm takes care of recognizing memcmp specifically.
Hardcode accepting 0 as a valid str char boundary
If we check explicitly for index == 0, that removes the need to read the
byte at index 0, so it avoids a trip to the string's memory, and it
optimizes out the slicing index' bounds check whenever it is (a constant) zero.
Index 0 must be a valid char boundary (invariant of str that it contains
valid UTF-8 data).
If we check explicitly for index == 0, that removes the need to read the
byte at index 0, so it avoids a trip to the string's memory, and it
optimizes out the slicing index' bounds check whenever it is zero.
With this change, the following examples all change from having a read of
the byte at 0 and a branch to possibly panicing, to having the bounds
checking optimized away.
```rust
pub fn split(s: &str) -> (&str, &str) {
s.split_at(0)
}
pub fn both(s: &str) -> &str {
&s[0..s.len()]
}
pub fn first(s: &str) -> &str {
&s[..0]
}
pub fn last(s: &str) -> &str {
&s[0..]
}
```
Automated conversion using the untry tool [1] and the following command:
```
$ find -name '*.rs' -type f | xargs untry
```
at the root of the Rust repo.
[1]: https://github.com/japaric/untry