Commit Graph

195 Commits

Author SHA1 Message Date
bors
b435960c4c Auto merge of #95644 - WaffleLapkin:str_split_as_str_refactor_take2, r=Amanieu
`Split*::as_str` refactor

I've made this patch almost a year ago, so the rename and the behavior change are in one commit, sorry 😅

This fixes #84974, as it's required to make other changes work.

This PR
- Renames `as_str` method of string `Split*` iterators to `remainder` (it seems like the `as_str` name was confusing to users)
- Makes `remainder` return `Option<&str>`, to distinguish between "the iterator is exhausted" and "the tail is empty", this was [required on the tracking issue](https://github.com/rust-lang/rust/issues/77998#issuecomment-832696619)

r? `@m-ou-se`
2023-01-03 11:06:08 +00:00
jonathanCogan
db47071df2 Replace libstd, libcore, liballoc in line comments. 2022-12-30 14:00:42 +01:00
Stefano Zacchiroli
7f9438910c str.lines() docstring: clarify that line endings are not returned
Previously, the str.lines() docstring stated that lines are split at line
endings, but not whether those were returned or not.  This new version of the
docstring states this explicitly, avoiding the need of getting to doctests to
get an answer to this FAQ.
2022-12-17 12:20:56 +01:00
Maybe Waffle
b458a49f26 Replace Split*::as_str with remainder
This commit
- Renames `Split*::{as_str -> remainder}` as it seems less confusing
- Makes `remainder` return Option<&str> to distinguish between
  "iterator is exhausted" and "the tail is empty"
2022-12-16 13:04:22 +00:00
Maybe Waffle
ca4989eac2 SplitInternal: always set finished in get_end 2022-12-16 12:57:22 +00:00
Eduardo Sánchez Muñoz
00e7b54d46 Make some trivial functions #[inline(always)] 2022-12-07 17:11:17 +01:00
Rageking8
58110572fb fix dupe word typos 2022-12-05 16:42:36 +08:00
Matthias Krüger
e4d1fe7b15 Rollup merge of #104436 - ismailmaj:add-slice-to-stack-allocated-string-comment, r=Mark-Simulacrum
Add slice to the stack allocated string comment

Precise that the "stack allocated string" is not a string but a string slice.

``@rustbot`` label +A-docs
2022-11-29 22:43:16 +01:00
Vojtech Kral
07ccf67f59 Document split{_ascii,}_whitespace() for empty strings 2022-11-24 15:22:24 +01:00
The 8472
3ed8fccff5 fix OOB access in SIMD impl of str.contains() 2022-11-22 20:59:19 +01:00
ismailmaj
005c6dfde6 type annotate &str when stack allocating a string 2022-11-21 10:38:04 +01:00
The 8472
a2b2010891 - convert from core::arch to core::simd
- bump simd compare to 32bytes
- import small slice compare code from memmem crate
- try a few different probe bytes to avoid degenerate cases
  - but special-case 2-byte needles
2022-11-15 18:30:31 +01:00
The 8472
3d4a8482b9 x86_64 SSE2 fast-path for str.contains(&str) and short needles
Based on Wojciech Muła's "SIMD-friendly algorithms for substring searching"[0]

The two-way algorithm is Big-O efficient but it needs to preprocess the needle
to find a "criticla factorization" of it. This additional work is significant
for short needles. Additionally it mostly advances needle.len() bytes at a time.

The SIMD-based approach used here on the other hand can advance based on its
vector width, which can exceed the needle length. Except for pathological cases,
but due to being limited to small needles the worst case blowup is also small.

benchmarks taken on a Zen2:

```
16CGU, OLD:
test str::bench_contains_short_short                     ... bench:          27 ns/iter (+/- 1)
test str::bench_contains_short_long                      ... bench:         667 ns/iter (+/- 29)
test str::bench_contains_bad_naive                       ... bench:         131 ns/iter (+/- 2)
test str::bench_contains_bad_simd                        ... bench:         130 ns/iter (+/- 2)
test str::bench_contains_equal                           ... bench:         148 ns/iter (+/- 4)


16CGU, NEW:
test str::bench_contains_short_short                     ... bench:           8 ns/iter (+/- 0)
test str::bench_contains_short_long                      ... bench:         135 ns/iter (+/- 4)
test str::bench_contains_bad_naive                       ... bench:         130 ns/iter (+/- 2)
test str::bench_contains_bad_simd                        ... bench:         292 ns/iter (+/- 1)
test str::bench_contains_equal                           ... bench:           3 ns/iter (+/- 0)


1CGU, OLD:
test str::bench_contains_short_short                     ... bench:          30 ns/iter (+/- 0)
test str::bench_contains_short_long                      ... bench:         713 ns/iter (+/- 17)
test str::bench_contains_bad_naive                       ... bench:         131 ns/iter (+/- 3)
test str::bench_contains_bad_simd                        ... bench:         130 ns/iter (+/- 3)
test str::bench_contains_equal                           ... bench:         148 ns/iter (+/- 6)

1CGU, NEW:
test str::bench_contains_short_short                     ... bench:          10 ns/iter (+/- 0)
test str::bench_contains_short_long                      ... bench:         111 ns/iter (+/- 0)
test str::bench_contains_bad_naive                       ... bench:         135 ns/iter (+/- 3)
test str::bench_contains_bad_simd                        ... bench:         274 ns/iter (+/- 2)
test str::bench_contains_equal                           ... bench:           4 ns/iter (+/- 0)
```


[0] http://0x80.pl/articles/simd-strfind.html#sse-avx2
2022-11-14 23:03:16 +01:00
Sky
9a7e527e28 Fix typo in ReverseSearcher docs 2022-10-17 13:14:15 -04:00
Eduardo Sánchez Muñoz
9c7c232a50 Improve FromStr example
The `from_str` implementation from the example had an `unwrap` that would make it panic on invalid input strings. Instead of panicking, it nows returns an error to better reflect the intented behavior of the `FromStr` trait.
2022-10-02 11:32:56 +02:00
Pietro Albini
3975d55d98 remove cfg(bootstrap) 2022-09-26 10:14:45 +02:00
Guillaume Gomez
b4fdc5861d Add missing documentation for bool::from_str 2022-09-21 14:17:11 +02:00
Deadbeef
075084f772 Make const_eval_select a real intrinsic 2022-09-04 20:35:23 +08:00
Jane Losare-Lusby
bf7611d55e Move error trait into core 2022-08-22 13:28:25 -07:00
Matthias Krüger
a45f69f27d Rollup merge of #100822 - WaffleLapkin:no_offset_question_mark, r=scottmcm
Replace most uses of `pointer::offset` with `add` and `sub`

As PR title says, it replaces `pointer::offset` in compiler and standard library with `pointer::add` and `pointer::sub`. This generally makes code cleaner, easier to grasp and removes (or, well, hides) integer casts.

This is generally trivially correct, `.offset(-constant)` is just `.sub(constant)`, `.offset(usized as isize)` is just `.add(usized)`, etc. However in some cases we need to be careful with signs of things.

r? ````@scottmcm````

_split off from #100746_
2022-08-21 16:54:07 +02:00
Maybe Waffle
e4720e1cf2 Replace most uses of pointer::offset with add and sub 2022-08-21 02:21:41 +04:00
Matthias Krüger
d49906519b Rollup merge of #99544 - dylni:expose-utf8lossy, r=Mark-Simulacrum
Expose `Utf8Lossy` as `Utf8Chunks`

This PR changes the feature for `Utf8Lossy` from `str_internals` to `utf8_lossy` and improves the API. This is done to eventually expose the API as stable.

Proposal: rust-lang/libs-team#54
Tracking Issue: #99543
2022-08-20 19:32:07 +02:00
dylni
e8ee0b7b2b Expose Utf8Lossy as Utf8Chunks 2022-08-20 12:49:20 -04:00
Xavier Noria
64d1c91a31 ascii -> ASCII in code comment 2022-08-05 18:45:42 +02:00
David Herberth
c1c1abc08a Use split_once in FromStr docs 2022-07-18 08:57:43 +02:00
bors
5fb8a39266 Auto merge of #97367 - WaffleLapkin:stabilize_checked_slice_to_str_conv, r=dtolnay
Stabilize checked slice->str conversion functions

This PR stabilizes the following APIs as `const` functions in Rust 1.63:
```rust
// core::str

pub const fn from_utf8(v: &[u8]) -> Result<&str, Utf8Error>;

impl Utf8Error {
    pub const fn valid_up_to(&self) -> usize;
    pub const fn error_len(&self) -> Option<usize>;
}
```

Note that the `from_utf8_mut` function is not stabilized as unique references (`&mut _`) are [unstable in const context].

FCP: https://github.com/rust-lang/rust/issues/91006#issuecomment-1134593095

[unstable in const context]: https://github.com/rust-lang/rust/issues/57349
2022-06-19 05:51:42 +00:00
Imbolc
acda8866cc Document an edge case of str::split_once 2022-06-13 13:35:49 +03:00
Maybe Waffle
89295352ee Allow some internal instability 2022-05-26 13:00:14 +04:00
Maybe Waffle
138aa99503 Stabilize checked slice->str conversion functions 2022-05-24 22:58:28 +04:00
Guillaume Gomez
1210b50dbb Rollup merge of #97190 - SylvainDe:master, r=Dylan-DPC
Add implicit call to from_str via parse in documentation

The documentation mentions "FromStr’s from_str method is often used implicitly,
through str’s parse method. See parse’s documentation for examples.".

It may be nicer to show that in the code example as well.
2022-05-21 11:39:48 +02:00
Dan Gohman
b836cf6fb8 Say "last" instead of "rightmost" in the documentation for std::str::rfind.
In the documentation comment for `std::str::rfind`, say "last" instead
of "rightmost" to describe the match that `rfind` finds. This follows the
spirit of #30459, for which `trim_left` and `trim_right` were replaced by
`trim_start` and `trim_end` to be more clear about how they work on
text which is displayed right-to-left.
2022-05-19 15:31:17 -07:00
SylvainDe
4c1daba940 Add implicit call to from_str via parse in documentation
The documentation mentions "FromStr’s from_str method is often used implicitly,
through str’s parse method. See parse’s documentation for examples.".

It may be nicer to show that in the code example as well.
2022-05-19 22:01:43 +02:00
Matthias Krüger
6c8001b85c Rollup merge of #96008 - fmease:warn-on-useless-doc-hidden-on-assoc-impl-items, r=lcnr
Warn on unused `#[doc(hidden)]` attributes on trait impl items

[Zulip conversation](https://rust-lang.zulipchat.com/#narrow/stream/266220-rustdoc/topic/.E2.9C.94.20Validy.20checks.20for.20.60.23.5Bdoc.28hidden.29.5D.60).

Whether an associated item in a trait impl is shown or hidden in the documentation entirely depends on the corresponding item in the trait declaration. Rustdoc completely ignores `#[doc(hidden)]` attributes on impl items. No error or warning is emitted:

```rust
pub trait Tr { fn f(); }
pub struct Ty;
impl Tr for Ty { #[doc(hidden)] fn f() {} }
//               ^^^^^^^^^^^^^^ ignored by rustdoc and currently
//                              no error or warning issued
```

This may lead users to the wrong belief that the attribute has an effect. In fact, several such cases are found in the standard library (I've removed all of them in this PR).
There does not seem to exist any incentive to allow this in the future either: Impl'ing a trait for a type means the type *fully* conforms to its API. Users can add `#[doc(hidden)]` to the whole impl if they want to hide the implementation or add the attribute to the corresponding associated item in the trait declaration to hide the specific item. Hiding an implementation of an associated item does not make much sense: The associated item can still be found on the trait page.

This PR emits the warn-by-default lint `unused_attribute` for this case with a future-incompat warning.

`@rustbot` label T-compiler T-rustdoc A-lint
2022-05-09 18:45:36 +02:00
bors
8a2fe75d0e Auto merge of #95960 - jhpratt:remove-rustc_deprecated, r=compiler-errors
Remove `#[rustc_deprecated]`

This removes `#[rustc_deprecated]` and introduces diagnostics to help users to the right direction (that being `#[deprecated]`). All uses of `#[rustc_deprecated]` have been converted. CI is expected to fail initially; this requires #95958, which includes converting `stdarch`.

I plan on following up in a short while (maybe a bootstrap cycle?) removing the diagnostics, as they're only intended to be short-term.
2022-05-09 04:47:30 +00:00
León Orell Valerian Liehr
9d157ada35 Warn on unused doc(hidden) on trait impl items 2022-05-08 22:53:14 +02:00
Eduardo Sánchez Muñoz
93ae6f80e3 Make some usize-typed masks definition agnostic to the size of usize
Some masks where defined as
```rust
const NONASCII_MASK: usize = 0x80808080_80808080u64 as usize;
```
where it was assumed that `usize` is never wider than 64, which is currently true.

To make those constants valid in a hypothetical 128-bit target, these constants have been redefined in an `usize`-width-agnostic way
```rust
const NONASCII_MASK: usize = usize::from_ne_bytes([0x80; size_of::<usize>()]);
```

There are already some cases where Rust anticipates the possibility of supporting 128-bit targets, such as not implementing `From<usize>` for `u64`.
2022-04-15 17:04:59 +02:00
Jacob Pratt
4fbe73e0b7 Remove use of #[rustc_deprecated] 2022-04-14 01:33:13 -04:00
Christopher Durham
b92cd1a32c Clarify str::from_utf8_unchecked's invariants
Specifically, make it clear that it is immediately UB to pass ill-formed UTF-8 into the function. The previous wording left space to interpret that the UB only occurred when calling another function, which "assumes that `&str`s are valid UTF-8."

This does not change whether str being UTF-8 is a safety or a validity invariant. (As per previous discussion, it is a safety invariant, not a validity invariant.) It just makes it clear that valid UTF-8 is a precondition of str::from_utf8_unchecked, and that emitting an Abstract Machine fault (e.g. UB or a sanitizer error) on invalid UTF-8 is a valid thing to do.

If user code wants to create an unsafe `&str` pointing to ill-formed UTF-8, it must be done via transmutes. Also, just, don't.
2022-04-10 15:04:57 -05:00
bors
ed6c958ee4 Auto merge of #95760 - Dylan-DPC:rollup-uskzggh, r=Dylan-DPC
Rollup of 4 pull requests

Successful merges:

 - #95189 (Stop flagging unexpected inner attributes as outer ones in certain diagnostics)
 - #95752 (Regression test for #82866)
 - #95753 (Correct safety reasoning in `str::make_ascii_{lower,upper}case()`)
 - #95757 (Use gender neutral terms)

Failed merges:

r? `@ghost`
`@rustbot` modify labels: rollup
2022-04-07 09:50:11 +00:00
Dylan DPC
6639604bd6 Rollup merge of #95753 - ChayimFriedman2:patch-1, r=dtolnay
Correct safety reasoning in `str::make_ascii_{lower,upper}case()`

I don't understand why the previous comment was used (it was inserted in #66564), but it doesn't explain why these functions are safe, only why `str::as_bytes{_mut}()` are safe.

If someone thinks they make perfect sense, I'm fine with closing this PR.
2022-04-07 11:17:16 +02:00
bors
f565016edd Auto merge of #95678 - pietroalbini:pa-1.62.0-bootstrap, r=Mark-Simulacrum
Bump bootstrap compiler to 1.61.0 beta

This PR bumps the bootstrap compiler to the 1.61.0 beta. The first commit changes the stage0 compiler, the second commit applies the "mechanical" changes and the third and fourth commits apply changes explained in the relevant comments.

r? `@Mark-Simulacrum`
2022-04-07 07:34:04 +00:00
Chayim Refael Friedman
b399e7ea7c Correct safety reasoning in str::make_ascii_{lower,upper}case() 2022-04-07 07:52:07 +03:00
Deadbeef
9a2d0e53f1 Update documentation for trim* and is_whitespace to include newlines 2022-04-06 11:03:36 +10:00
Pietro Albini
181d28bb61 trivial cfg(bootstrap) changes 2022-04-05 23:18:40 +02:00
lcnr
afbecc0f68 remove now unnecessary lang items 2022-03-30 11:23:58 +02:00
Max Baumann
64ad96dd9a add diagnostic items for clippy's 2022-03-24 18:18:44 +01:00
T-O-R-U-S
72a25d05bf Use implicit capture syntax in format_args
This updates the standard library's documentation to use the new syntax. The
documentation is worthwhile to update as it should be more idiomatic
(particularly for features like this, which are nice for users to get acquainted
with). The general codebase is likely more hassle than benefit to update: it'll
hurt git blame, and generally updates can be done by folks updating the code if
(and when) that makes things more readable with the new format.

A few places in the compiler and library code are updated (mostly just due to
already having been done when this commit was first authored).
2022-03-10 10:23:40 -05:00
Deadbeef
4654a91001 Constify slice index for strings 2022-03-06 17:28:50 +11:00
ltdk
edd318c313 Add {floor,ceil}_char_boundary methods to str 2022-02-07 13:34:08 -05:00
Thom Chiovoloni
41f821461f Fix comment grammar for do_count_chars 2022-02-05 11:17:10 -08:00