473 Commits

Author SHA1 Message Date
Lzu Tao
fff822fead Migrate to numeric associated consts 2020-06-10 01:35:47 +00:00
Ralf Jung
b7f6a0b322 Rollup merge of #72773 - Rantanen:is_char_boundary-docs, r=joshtriplett
Fix is_char_boundary documentation

Given the "start _and/or end_" wording in the original, the way I understood it was that the `str::is_char_boundary` method would also return `true` for the last byte in a UTF-8 code point sequence. (Which would have meant that for a string consisting of nothing but 1 and 2 byte UTF-8 code point sequences, it would return nothing but `true`.)

In practice the method returns `true` only for the starting byte of each sequence and the end of the string: [Playground](https://play.rust-lang.org/?version=nightly&mode=debug&edition=2018&gist=e9f5fc4d6bf2f1bf57a75f3c9a180770)

I was also somewhat tempted to remove the _The start and end of the string are considered to be boundaries_, since that's implied by the first sentence, but I decided to avoid bikeshedding over it and left it as it was since it's not wrong in relation to how the method behaves.
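
As a minimal sketch of the behaviour described above, the following lists every index that `str::is_char_boundary` accepts. The helper name `char_boundaries` is hypothetical and is not part of the standard library.

```rust
/// Collects every byte index in `0..=s.len()` that `str::is_char_boundary`
/// accepts. `char_boundaries` is a hypothetical helper for illustration.
fn char_boundaries(s: &str) -> Vec<usize> {
    (0..=s.len()).filter(|&i| s.is_char_boundary(i)).collect()
}
```

For `"a\u{e9}"` (an ASCII `a` followed by a two-byte `é`), the helper should report indices 0, 1 and 3: the start of each sequence plus the end of the string, but not index 2, which is the last byte of the two-byte sequence.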
2020-05-30 23:09:02 +02:00
Mikko Rantanen
66e9984938 Fix is_char_boundary documentation 2020-05-27 16:21:30 +03:00
Lzu Tao
2df69baa55 Stabilize str_strip feature 2020-05-22 15:29:47 +00:00
Pyry Kontio
46159b3610 split_inclusive: add tracking issue number (72360) 2020-05-20 04:22:37 +09:00
Eric Huss
ca61fd5636 Update pattern docs. 2020-04-19 17:19:12 -07:00
Mazdak Farrokhzad
4aeeb81db5 Rollup merge of #70588 - Coder-256:str-split-at-docs, r=Dylan-DPC
Fix incorrect documentation for `str::{split_at, split_at_mut}`

The documentation for each method currently states:

> Panics if `mid` is not on a UTF-8 code point boundary, or if it is beyond the last code point of the string slice.

However, this is not consistent with the real behavior, or that of the corresponding methods for `[T]` slices. A comment inside each of the `str` methods states:

> is_char_boundary checks that the index is in [0, .len()]

That is what I would expect the behavior to be, and in fact this seems to be the real behavior. For example ([playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=8e03dcc209d4dd176df2297523f9fee1)):

```rust
fn main() {
    // Prints ("abc", "") and doesn't panic
    println!("{:?}", "abc".split_at(3));
}
```

In this case, I would interpret "the last code point of the string slice" to mean the byte at index 2 in UTF-8. However, it is possible to pass an index of 3, which is definitely "beyond the last code point of the string slice".

I think that this is much clearer, but feel free to bikeshed.
2020-03-31 15:59:50 +02:00
Jacob Greenfield
fcab1f947b Fix incorrect documentation for str::{split_at, split_at_mut} 2020-03-30 15:48:52 -04:00
Nikhil Benesch
ac478f2f61 Optimize strip_prefix and strip_suffix with str patterns
Constructing a Searcher in strip_prefix and strip_suffix is
unnecessarily slow when the pattern is a fixed-length string. Add
strip_prefix and strip_suffix methods to the Pattern trait, and add
optimized implementations of these methods in the str implementation.
The old implementation is retained as the default for these methods.
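
A rough sketch of the idea, assuming the fixed-string fast path amounts to a byte comparison followed by a slice. The functions `strip_prefix_fixed` and `strip_suffix_fixed` are hypothetical stand-ins and are not the `Pattern` trait methods added by this commit.

```rust
/// Hypothetical fast path for a fixed string prefix: compare bytes directly
/// instead of constructing a searcher.
fn strip_prefix_fixed<'a>(haystack: &'a str, prefix: &str) -> Option<&'a str> {
    if haystack.as_bytes().starts_with(prefix.as_bytes()) {
        // `prefix` is valid UTF-8 and matched byte for byte, so
        // `prefix.len()` falls on a char boundary of `haystack`.
        Some(&haystack[prefix.len()..])
    } else {
        None
    }
}

/// Hypothetical fast path for a fixed string suffix.
fn strip_suffix_fixed<'a>(haystack: &'a str, suffix: &str) -> Option<&'a str> {
    if haystack.as_bytes().ends_with(suffix.as_bytes()) {
        // The match starts at the first byte of `suffix`, which is a
        // sequence start (or the end of the string if `suffix` is empty).
        Some(&haystack[..haystack.len() - suffix.len()])
    } else {
        None
    }
}
```

Calling `strip_prefix_fixed("foo:bar", "foo:")` should agree with `"foo:bar".strip_prefix("foo:")` on toolchains where `str_strip` is available.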
2020-03-30 11:10:21 -04:00
Adam Perry
a7ab7b136e #[track_caller] on core::ops::{Index, IndexMut}. 2020-03-23 08:01:49 -07:00
LeSeulArtichaut
6ed4829c17 Make link to std::str active 2020-03-05 08:52:46 +01:00
ridiculousfish
9e41c4b682 Relax str::get_unchecked precondition to permit empty slicing
Prior to this commit, `str` documented that `get_unchecked` had
the precondition that "`begin` must come before `end`". This would appear
to prohibit empty slices (i.e. begin == end).

In practice, get_unchecked is called often with empty slices. Let's relax
the precondition so as to allow them.
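
A minimal sketch of the case the relaxed precondition is meant to permit, assuming the remaining requirements (in bounds, on UTF-8 boundaries) are unchanged. The helper `empty_tail` is hypothetical.

```rust
/// Returns the empty slice at the end of `s` via `get_unchecked`.
/// `empty_tail` is a hypothetical helper for illustration.
fn empty_tail(s: &str) -> &str {
    let end = s.len();
    // SAFETY: `end..end` is within bounds, `s.len()` is always a char
    // boundary, and begin == end is the empty-slice case discussed above.
    unsafe { s.get_unchecked(end..end) }
}
```

`empty_tail("abc")` should yield `""`.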
2020-02-22 13:30:54 -08:00
bors
87e494c4cd Auto merge of #67330 - golddranks:split_inclusive, r=kodraus
Implement split_inclusive for slice and str

# Overview
* Implement `split_inclusive` for `slice` and `str` and `split_inclusive_mut` for `slice`
* `split_inclusive` is a substring/subslice splitting iterator that includes the matched part in the iterated substrings as a terminator.
* EDIT: The behaviour has now changed, as per @KodrAus 's input, to match the semantics of the `split_terminator` function. I updated the examples below.
* Two examples below:
```Rust
    let data = "\nMäry häd ä little lämb\nLittle lämb\n";
    let split: Vec<&str> = data.split_inclusive('\n').collect();
    assert_eq!(split, ["\n", "Märy häd ä little lämb\n", "Little lämb\n"]);
```

```Rust
    let uppercase_separated = "SheePSharKTurtlECaT";
    let mut first_char = true;
    let split: Vec<&str> = uppercase_separated.split_inclusive(|c: char| {
        let split = !first_char && c.is_uppercase();
        first_char = split;
        split
    }).collect();
    assert_eq!(split, ["SheeP", "SharK", "TurtlE", "CaT"]);
```

# Justification for the API
* I was surprised to find that stdlib currently only has splitting iterators that leave out the matched part. In my experience, wanting to leave a substring terminator as a part of the substring is a pretty common use case.
* This API is strictly more expressive than the standard `split` API: it's easy to get the behaviour of `split` by mapping a subslicing operation that drops the terminator. On the other hand it's impossible to derive this behaviour from `split` without using hacky and brittle `unsafe` code. The normal way to achieve this functionality would be implementing the iterator yourself.
* Especially when dealing with mutable slices, the only way currently is to use `split_at_mut`. This API provides an ergonomic alternative that plays to the strengths of the iterating capabilities of Rust. (Using `split_at_mut` iteratively used to be a real pain before NLL; fortunately, the situation is a bit better now.)
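
The expressiveness point can be sketched as follows, assuming a toolchain where `split_inclusive` and `strip_suffix` are available. The helper `split_terminator_via_inclusive` is hypothetical; because `split_inclusive` follows `split_terminator` semantics, the result is compared against `split_terminator` rather than `split`.

```rust
/// Recovers terminator-free pieces from `split_inclusive` by dropping the
/// trailing terminator from each piece. Hypothetical helper for illustration.
fn split_terminator_via_inclusive<'a>(data: &'a str, terminator: char) -> Vec<&'a str> {
    data.split_inclusive(terminator)
        // The final piece may lack a terminator, so fall back to the piece itself.
        .map(|piece| piece.strip_suffix(terminator).unwrap_or(piece))
        .collect()
}
```

For `"a\nb\nc\n"` with `'\n'` as the terminator, this should produce the same pieces as `"a\nb\nc\n".split_terminator('\n')`.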

# Discussion items
* <s>Does it make sense to mimic `split_terminator` in that the final empty slice would be left off in case of the string/slice ending with a terminator? It might do, as this use case is naturally geared towards considering the matching part as a terminator instead of a separator.</s>
  * EDIT: The behaviour was changed to mimic `split_terminator`.
* Does it make sense to have `split_inclusive_mut` for `&mut str`?
2020-02-22 03:54:50 +00:00
bors
183e893aaa Auto merge of #69256 - nnethercote:misc-inlining, r=Centril
Miscellaneous inlining improvements

These commits inline some hot functions that aren't currently inlined, for some speed wins.

r? @Centril
2020-02-20 02:00:31 +00:00
Nicholas Nethercote
ab906179cc Always inline run_utf8_validation.
It only has two call sites, and the one within `from_utf8` is hot within
rustc itself.
2020-02-18 15:42:11 +11:00
Yuki Okushi
5f818f94e7 Rollup merge of #68495 - sdegutis:patch-1, r=Mark-Simulacrum
Updating str.chars docs to mention crates.io.

This might spare someone else a little time searching the stdlib for unicode/grapheme support.
2020-02-17 13:46:48 +09:00
Pyry Kontio
5c9dc57cb5 Don't return empty slice on last iteration with matched terminator. Test reverse iteration. 2020-02-09 23:49:44 +09:00
Pyry Kontio
86bf96291d Implement split_inclusive for slice and str, a splitting iterator that includes the matched part in the iterated substrings as a terminator. 2020-02-09 23:48:52 +09:00
Matthew Jasper
a81c59f9b8 Remove some unsound specializations 2020-02-01 09:11:41 +00:00
Steven Degutis
ac19dffd1e Updating str.chars docs to mention crates.io.
This might spare someone else a little time searching the stdlib for unicode/grapheme support.
2020-01-23 15:25:10 -06:00
Phoebe Bell
0f2ee495e9 Fix formatting: ./x.py fmt 2020-01-16 18:50:53 -08:00
Phoebe Bell
c103c284b9 Move comments for tidy 2020-01-16 18:38:04 -08:00
Phoebe Bell
ca2fae8edb Elaborate on SAFETY comments 2020-01-16 18:32:21 -08:00
Phoebe Bell
e0140ffeb0 Apply suggestions from code review
Co-Authored-By: Ralf Jung <post@ralfj.de>
2020-01-16 18:27:44 -08:00
Phoebe Bell
19fdc6e091 Document unsafe blocks in core::{cell, str, sync} 2020-01-16 18:26:14 -08:00
Mark Rousskov
a06baa56b9 Format the world 2019-12-22 17:42:47 -05:00
Mazdak Farrokhzad
eaeb1138c6 Rollup merge of #67480 - rossmacarthur:fix-41260-avoid-issue-0-part-2, r=Centril
Require issue = "none" over issue = "0" in unstable attributes

These changes make the use of `issue = "none"` required in unstable attributes throughout the compiler.

Notes:
- #66299 is now in beta so `issue = "none"` is accepted.
- The `tidy` tool now fails on `issue = "0"`.
- Tests that used `issue = "0"` were changed to use `issue = "none"`, except for _one_ that asserts `issue = "0"` can still be used.
- The compiler still allows `issue = "0"` because some submodules require it; this could be disallowed once these are updated.

Resolves #41260

r? @varkor
2019-12-22 02:40:04 +01:00
Ross MacArthur
f7256d28d1 Require issue = "none" over issue = "0" in unstable attributes 2019-12-21 13:16:18 +02:00
Broono Lu
16b7fd2272 Fix src/libcore/str/mod.rs doc comments 2019-12-21 18:12:46 +08:00
Mark Rousskov
82184440ec Propagate cfg bootstrap 2019-12-18 12:16:19 -05:00
Mazdak Farrokhzad
1c12dc8cdf Rollup merge of #66735 - SOF3:feature/str_strip, r=KodrAus
Add str::strip_prefix and str::strip_suffix

Introduces a counterpart for `Path::strip_prefix` on `str`.

This was also discussed in https://internals.rust-lang.org/t/pre-pr-path-strip-prefix-counterpart-in-str/11364/.
2019-12-16 05:23:33 +01:00
SOFe
6176051dd0 Set tracking issue for str_strip 2019-12-15 17:07:57 +08:00
Oliver Scherer
5e17e39881 Require stable/unstable annotations for the constness of all stable functions with a const modifier 2019-12-13 11:27:02 +01:00
Sen Jiang
52649ddfbd Fix documentation of pattern for str::matches()
Made it the same as rmatches()
2019-12-03 14:31:41 -08:00
SOFe
4718e20fcf Fixed formatting issues 2019-11-26 19:33:06 +08:00
SOFe
2e2e0dfc1a Improved comments to clarify assumptions in str::strip_prefix 2019-11-26 17:42:43 +08:00
SOFe
9badc33cda Add str::strip_prefix and str::strip_suffix 2019-11-25 19:36:47 +08:00
Oliver Scherer
02f9167f94 Have tidy ensure that we document all unsafe blocks in libcore 2019-11-06 11:04:42 +01:00
Jeff Dickey
d9ec5fa88c doc(str): show example of chars().count() under len()
The docs are great at explaining that .len() isn't like in other
languages but stop short of explaining how to get the character length.
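
A small sketch of the distinction, assuming "character length" means the number of `char` values rather than grapheme clusters. The helper `byte_and_char_len` is hypothetical.

```rust
/// Returns (length in bytes, number of `char`s) for `s`.
/// `byte_and_char_len` is a hypothetical helper for illustration.
fn byte_and_char_len(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}
```

For `"\u{192}oo"` (the string `ƒoo`, whose first character takes two bytes in UTF-8), the helper should return `(4, 3)`.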

r? @steveklabnik
2019-11-01 20:18:33 -07:00
Mikko Rantanen
040d88dda1 Remove leading :: from paths in doc examples 2019-10-20 21:13:47 +03:00
Mark Rousskov
d0a6805b0e Allow unused attributes to avoid incremental bug 2019-10-04 11:11:58 -04:00
Mark Rousskov
f359a94849 Snap cfgs to new beta 2019-09-25 08:42:46 -04:00
Oliver Scherer
7767e7fb16 Stabilize str::len, [T]::len, is_empty and str::as_bytes as const fn 2019-09-24 12:56:44 +02:00
Julian Gehring
c4d0c285fe Fix word repetition in str documentation
Fixes a few repetitions of "like like" in the `trim*` methods documentation of `str`.
2019-08-31 17:38:23 +01:00
Dodo
080fdb8184 add missing #[repr(C)] on a union 2019-08-28 17:38:24 +02:00
Ilija Tovilo
3a6a29b4ec Use associated_type_bounds where applicable - closes #61738 2019-08-08 22:39:15 +02:00
Maximilian Roos
3325ff6df4 comments from @lzutao 2019-07-29 12:26:59 -04:00
Maximilian Roos
624c5da1aa impl Debug for Chars 2019-07-29 12:17:59 -04:00
bors
e34d4ae869 Auto merge of #61339 - jridgewell:pointer-alignment, r=BurntSushi
Optimize pointer alignment in utf8 validation

This uses (and reuses) the u8 array's inherent block alignment when checking whether the current index is block aligned.

I initially thought that this would just move the expensive `align_offset` call out of the while loop and replace it with a subtraction and bitwise AND. But it appears this optimizes much better, too...
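
The arithmetic behind that replacement can be sketched as below. This is not the libcore code: all function names are hypothetical, and `offset_to_aligned` derives the offset from the raw address as a stand-in for `align_offset` so the comparison is deterministic.

```rust
use std::mem;

/// Bytes from `ptr` to the next `usize`-aligned address, derived from the
/// raw address. Hypothetical stand-in for `align_offset`.
fn offset_to_aligned(ptr: *const u8) -> usize {
    let n = mem::size_of::<usize>();
    (n - (ptr as usize % n)) % n
}

/// Per-iteration form: recompute the alignment of `&v[i]` for every index.
fn aligned_indices_naive(v: &[u8]) -> Vec<usize> {
    (0..v.len())
        .filter(|&i| offset_to_aligned(v[i..].as_ptr()) == 0)
        .collect()
}

/// Hoisted form: compute the offset once for the start of the slice, then
/// test each index with a subtraction and a remainder by a power of two
/// (which compilers typically lower to a bitwise AND).
fn aligned_indices_hoisted(v: &[u8]) -> Vec<usize> {
    let n = mem::size_of::<usize>();
    let align = offset_to_aligned(v.as_ptr());
    (0..v.len())
        .filter(|&i| align.wrapping_sub(i) % n == 0)
        .collect()
}
```

Both functions should report the same set of block-aligned indices for any byte slice, since `align - i` is congruent to `-(base + i)` modulo the block size.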

before: https://rust.godbolt.org/z/WIPvWl
after: https://rust.godbolt.org/z/-jBPoW

## Benchmarks

https://github.com/jridgewell/faster-from_utf8/tree/pointer-alignment

```
test from_utf8_2_bytes_fast      ... bench:         310 ns/iter (+/- 42) = 1290 MB/s
test from_utf8_2_bytes_regular   ... bench:         309 ns/iter (+/- 24) = 1294 MB/s

test from_utf8_3_bytes_fast      ... bench:       1,027 ns/iter (+/- 62) = 1168 MB/s
test from_utf8_3_bytes_regular   ... bench:       1,513 ns/iter (+/- 611) = 793 MB/s

test from_utf8_4_bytes_fast      ... bench:       1,788 ns/iter (+/- 26) = 1342 MB/s
test from_utf8_4_bytes_regular   ... bench:       1,907 ns/iter (+/- 181) = 1258 MB/s

test from_utf8_all_bytes_fast    ... bench:       3,463 ns/iter (+/- 97) = 1155 MB/s
test from_utf8_all_bytes_regular ... bench:       4,083 ns/iter (+/- 89) = 979 MB/s

test from_utf8_ascii_fast        ... bench:          88 ns/iter (+/- 4) = 28988 MB/s
test from_utf8_ascii_regular     ... bench:          88 ns/iter (+/- 8) = 28988 MB/s

test from_utf8_cyr_fast          ... bench:       7,707 ns/iter (+/- 531) = 665 MB/s
test from_utf8_cyr_regular       ... bench:       8,202 ns/iter (+/- 135) = 625 MB/s

test from_utf8_enwik8_fast       ... bench:   1,135,756 ns/iter (+/- 84,450) = 8804 MB/s
test from_utf8_enwik8_regular    ... bench:   1,145,468 ns/iter (+/- 79,601) = 8730 MB/s

test from_utf8_jawik10_fast      ... bench:  12,723,844 ns/iter (+/- 473,247) = 785 MB/s
test from_utf8_jawik10_regular   ... bench:  13,384,596 ns/iter (+/- 666,997) = 747 MB/s

test from_utf8_mixed_fast        ... bench:       2,321 ns/iter (+/- 123) = 2081 MB/s
test from_utf8_mixed_regular     ... bench:       2,702 ns/iter (+/- 408) = 1788 MB/s

test from_utf8_mostlyasc_fast    ... bench:         249 ns/iter (+/- 10) = 14666 MB/s
test from_utf8_mostlyasc_regular ... bench:         276 ns/iter (+/- 5) = 13231 MB/s
```
2019-07-17 12:13:36 +00:00
Justin Ridgewell
3b6e8ed502 Do not use pointer alignment on unsupported platforms 2019-07-05 16:39:56 -04:00