Optimized vec::IntoIter::next_chunk impl
```
x86_64v1, default
test vec::bench_next_chunk ... bench: 696 ns/iter (+/- 22)
x86_64v1, pr
test vec::bench_next_chunk ... bench: 309 ns/iter (+/- 4)
znver2, default
test vec::bench_next_chunk ... bench: 17,272 ns/iter (+/- 117)
znver2, pr
test vec::bench_next_chunk ... bench: 211 ns/iter (+/- 3)
```
On znver2 the default impl seems to be slow due to different inlining decisions. It goes through `core::array::iter_next_chunk`
which has a deep call tree.
```
test vec::bench_next_chunk ... bench: 696 ns/iter (+/- 22)
x86_64v1, pr
test vec::bench_next_chunk ... bench: 309 ns/iter (+/- 4)
znver2, default
test vec::bench_next_chunk ... bench: 17,272 ns/iter (+/- 117)
znver2, pr
test vec::bench_next_chunk ... bench: 211 ns/iter (+/- 3)
```
The znver2 default impl seems to be slow due to inlining decisions. It goes through `core::array::iter_next_chunk`
which has a deeper call tree.
* Implement IsZero trait for tuples up to 8 IsZero elements;
* Implement IsZero for u8/i8, leading to implementation of it for arrays of them too;
* Add more codegen tests for this optimization.
* Lower size of array for IsZero trait because it fails to inline checks
In particular, be clear that it is sound to specify memory not
originating from a previous `Vec` allocation. That is already suggested
in other parts of the documentation about zero-alloc conversions to Box<[T]>.
Incorporate a constraint from `slice::from_raw_parts` that was missing
but needs to be fulfilled, since a `Vec` can be converted into a slice.
Optimize `Vec::insert` for the case where `index == len`.
By skipping the call to `copy` with a zero length. This makes it closer
to `push`.
I did this recently for `SmallVec`
(https://github.com/servo/rust-smallvec/pull/282) and it was a big perf win in
one case. Although I don't have a specific use case in mind, it seems
worth doing it for `Vec` as well.
Things to note:
- In the `index < len` case, the number of conditions checked is
unchanged.
- In the `index == len` case, the number of conditions checked increases
by one, but the more expensive zero-length copy is avoided.
- In the `index > len` case the code now reserves space for the extra
element before panicking. This seems like an unimportant change.
r? `@cuviper`
By skipping the call to `copy` with a zero length. This makes it closer
to `push`.
I did this recently for `SmallVec`
(https://github.com/servo/rust-smallvec/pull/282) and it was a big perf win in
one case. Although I don't have a specific use case in mind, it seems
worth doing it for `Vec` as well.
Things to note:
- In the `index < len` case, the number of conditions checked is
unchanged.
- In the `index == len` case, the number of conditions checked increases
by one, but the more expensive zero-length copy is avoided.
- In the `index > len` case the code now reserves space for the extra
element before panicking. This seems like an unimportant change.
Rust 1.62.0 introduced a couple new `unused_imports` warnings
in `no_global_oom_handling` builds, making a total of 5 warnings:
```txt
warning: unused import: `Unsize`
--> library/alloc/src/boxed/thin.rs:6:33
|
6 | use core::marker::{PhantomData, Unsize};
| ^^^^^^
|
= note: `#[warn(unused_imports)]` on by default
warning: unused import: `from_fn`
--> library/alloc/src/string.rs:51:18
|
51 | use core::iter::{from_fn, FusedIterator};
| ^^^^^^^
warning: unused import: `core::ops::Deref`
--> library/alloc/src/vec/into_iter.rs:12:5
|
12 | use core::ops::Deref;
| ^^^^^^^^^^^^^^^^
warning: associated function `shrink` is never used
--> library/alloc/src/raw_vec.rs:424:8
|
424 | fn shrink(&mut self, cap: usize) -> Result<(), TryReserveError> {
| ^^^^^^
|
= note: `#[warn(dead_code)]` on by default
warning: associated function `forget_remaining_elements` is never used
--> library/alloc/src/vec/into_iter.rs:126:19
|
126 | pub(crate) fn forget_remaining_elements(&mut self) {
| ^^^^^^^^^^^^^^^^^^^^^^^^^
```
This patch cleans them so that projects compiling `alloc` without
infallible allocations do not see the warnings. It also enables
the use of `-Dwarnings`.
The couple `dead_code` ones may be reverted when some fallible
allocation support starts using them.
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
Documentation for the following methods
with_capacity
with_capacity_in
with_capacity_and_hasher
reserve
reserve_exact
try_reserve
try_reserve_exact
was inconsistent and often not entirely correct where they existed on the following types
Vec
VecDeque
String
OsString
PathBuf
BinaryHeap
HashSet
HashMap
BufWriter
LineWriter
since the allocator is allowed to allocate more than the requested capacity in all such cases, and will frequently "allocate" much more in the case of zero-sized types (I also checked BufReader, but there the docs appear to be accurate as it appears to actually allocate the exact capacity).
Some effort was made to make the documentation more consistent between types as well.
Fix with_capacity* methods for Vec
Fix *reserve* methods for Vec
Fix docs for *reserve* methods of VecDeque
Fix docs for String::with_capacity
Fix docs for *reserve* methods of String
Fix docs for OsString::with_capacity
Fix docs for *reserve* methods on OsString
Fix docs for with_capacity* methods on HashSet
Fix docs for *reserve methods of HashSet
Fix docs for with_capacity* methods of HashMap
Fix docs for *reserve methods on HashMap
Fix expect messages about OOM in doctests
Fix docs for BinaryHeap::with_capacity
Fix docs for *reserve* methods of BinaryHeap
Fix typos
Fix docs for with_capacity on BufWriter and LineWriter
Fix consistent use of `hasher` between `HashMap` and `HashSet`
Fix warning in doc test
Add test for capacity of vec with ZST
Fix doc test error
Add #[rustc_box] and use it inside alloc
This commit adds an alternative content boxing syntax, and uses it inside alloc.
```Rust
#![feature(box_syntax)]
fn foo() {
let foo = box bar;
}
```
is equivalent to
```Rust
#![feature(rustc_attrs)]
fn foo() {
let foo = #[rustc_box] Box::new(bar);
}
```
The usage inside the very performance relevant code in
liballoc is the only remaining relevant usage of box syntax
in the compiler (outside of tests, which are comparatively easy to port).
box syntax was originally designed to be used by all Rust
developers. This introduces a replacement syntax more tailored
to only being used inside the Rust compiler, and with it,
lays the groundwork for eventually removing box syntax.
[Earlier work](https://github.com/rust-lang/rust/pull/87781#issuecomment-894714878) by `@nbdd0121` to lower `Box::new` to `box` during THIR -> MIR building ran into borrow checker problems, requiring the lowering to be adjusted in a way that led to [performance regressions](https://github.com/rust-lang/rust/pull/87781#issuecomment-894872367). The proposed change in this PR lowers `#[rustc_box] Box::new` -> `box` in the AST -> HIR lowering step, which is way earlier in the compiler, and thus should cause less issues both performance wise as well as regarding type inference/borrow checking/etc. Hopefully, future work can move the lowering further back in the compiler, as long as there are no performance regressions.
alloc: remove repeated word in comment
Linux's `checkpatch.pl` reports:
```txt
#42544: FILE: rust/alloc/vec/mod.rs:2692:
WARNING: Possible repeated word: 'to'
+ // - Elements are :Copy so it's OK to to copy them, without doing
```
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
Clarify the guarantees of Vec::as_ptr and Vec::as_mut_ptr when there's no allocation
Currently the documentation says they return a pointer to the vector's buffer, which has the implied precondition that the vector allocated some memory. However `Vec`'s documentation also specifies that it won't always allocate, so it's unclear whether the pointer returned is valid in that case. Of course you won't be able to read/write actual bytes to/from it since the capacity is 0, but there's an exception: zero sized read/writes. They are still valid as long as the pointer is not null and the memory it points to wasn't deallocated, but `Vec::as_ptr` and `Vec::as_mut_ptr` don't specify that's not the case. This PR thus specifies they are actually valid for zero sized reads since `Vec` is implemented to hold a dangling pointer in those cases, which is neither null nor was deallocated.
Linux's `checkpatch.pl` reports:
```txt
#42544: FILE: rust/alloc/vec/mod.rs:2692:
WARNING: Possible repeated word: 'to'
+ // - Elements are :Copy so it's OK to to copy them, without doing
```
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
Remove impossible panic note from `Vec::append`
Neither the number of elements in a vector can overflow a `usize`, nor
can the amount of elements in two vectors.
Clarify slice and Vec iteration order
While already being inferable from the doc examples, it wasn't fully specified. This is the only logical way to do a slice iterator, so I think this should be uncontroversial. It also improves the `Vec::into_iter` example to better show the order and that the iterator returns owned values.
Like we have `add`/`sub` which are the `usize` version of `offset`, this adds the `usize` equivalent of `offset_from`. Like how `.add(d)` replaced a whole bunch of `.offset(d as isize)`, you can see from the changes here that it's fairly common that code actually knows the order between the pointers and *wants* a `usize`, not an `isize`.
As a bonus, this can do `sub nuw`+`udiv exact`, rather than `sub`+`sdiv exact`, which can be optimized slightly better because it doesn't have to worry about negatives. That's why the slice iterators weren't using `offset_from`, though I haven't updated that code in this PR because slices are so perf-critical that I'll do it as its own change.
This is an intrinsic, like `offset_from`, so that it can eventually be allowed in CTFE. It also allows checking the extra safety condition -- see the test confirming that CTFE catches it if you pass the pointers in the wrong order.
Tweak the vec-calloc runtime check to only apply to shortish-arrays
r? `@Mark-Simulacrum`
`@nbdd0121` pointed out in https://github.com/rust-lang/rust/pull/95362#issuecomment-1114085395 that LLVM currently doesn't constant-fold the `IsZero` check for long arrays, so that seems like a reasonable justification for limiting it.
It appears that it's based on length, not byte size, (https://godbolt.org/z/4s48Y81dP), so that's what I used in the PR. Maybe it's a ["the number of inlining shall be three"](https://youtu.be/s4wnuiCwTGU?t=320) sort of situation.
Certainly there's more that could be done here -- that generated code that checks long arrays byte-by-byte is highly suboptimal, for example -- but this is an easy, low-risk tweak.
Clarify docs for `from_raw_parts` on `Vec` and `String`
Closes#95427
Original safety explanation for `from_raw_parts` was unclear on safety for consuming a C string. This clarifies when doing so is safe.
Currently it just calls `truncate(0)`. `truncate()` is (a) not marked as
`#[inline]`, and (b) more general than needed for `clear()`.
This commit changes `clear()` to do the work itself. This modest change
was first proposed in rust-lang#74172, where the reviewer rejected it because
there was insufficient evidence that `Vec::clear()`'s performance
mattered enough to justify the change. Recent changes within rustc have
made `Vec::clear()` hot within `macro_parser.rs`, so the change is now
clearly worthwhile.
Although it doesn't show wins on CI perf runs, this seems to be because they
use PGO. But not all platforms currently use PGO. Also, local builds don't use
PGO, and `truncate` sometimes shows up in an over-represented fashion in local
profiles. So local profiling will be made easier by this change.
Note that this will also benefit `String::clear()`, because it just
calls `Vec::clear()`.
Finally, the commit removes the `vec-clear.rs` codegen test. It was
added in #52908. From before then until now, `Vec::clear()` just called
`Vec::truncate()` with a zero length. The body of Vec::truncate() has
changed a lot since then. Now that `Vec::clear()` is doing actual work
itself, and not just calling `Vec::truncate()`, it's not surprising that
its generated code includes a load and an icmp. I think it's reasonable
to remove this test.
Fix double drop of allocator in IntoIter impl of Vec
Fixes#95269
The `drop` impl of `IntoIter` reconstructs a `RawVec` from `buf`, `cap` and `alloc`, when that `RawVec` is dropped it also drops the allocator. To avoid dropping the allocator twice we wrap it in `ManuallyDrop` in the `InttoIter` struct.
Note this is my first contribution to the standard library, so I might be missing some details or a better way to solve this.
Allow comparing `Vec`s with different allocators using `==`
See https://stackoverflow.com/q/71021633/7884305.
I did not changed the `PartialOrd` impl too because it was not generic already (didn't support `Vec<T> <=> Vec<U> where T: PartialOrd<U>`).
Does it needs tests?
I don't think this will hurt type inference much because the default allocator is usually not inferred (`new()` specifies it directly, and even with other allocators, you pass the allocator to `new_in()` so the compiler usually knows the type).
I think this requires FCP since the impls are already stable.