Remove alignment-changing in-place collect
This removes the alignment-changing in-place collect optimization introduced in #110353
Currently stable users can't benefit from the optimization because GlobaAlloc doesn't support alignment-changing realloc and neither do most posix allocators. So in practice it has a negative impact on performance.
Explanation from https://github.com/rust-lang/rust/issues/120091#issuecomment-1899071681:
> > You mention that in case of alignment mismatch -- when the new alignment is less than the old -- the implementation calls `mremap`.
>
> I was trying to note that this isn't really the case in practice, due to the semantics of Rust's allocator APIs. The only use of the allocator within the `in_place_collect` implementation itself is [a call to `Allocator::shrink()`](db7125f008/library/alloc/src/vec/in_place_collect.rs (L299-L303)), which per its documentation [allows decreasing the required alignment](https://doc.rust-lang.org/1.75.0/core/alloc/trait.Allocator.html). However, in stable Rust, the only available `Allocator` is [`Global`](https://doc.rust-lang.org/1.75.0/alloc/alloc/struct.Global.html), which delegates to the registered `GlobalAlloc`. Since `GlobalAlloc::realloc()` [cannot change the required alignment](https://doc.rust-lang.org/1.75.0/core/alloc/trait.GlobalAlloc.html#method.realloc), the implementation of [`<Global as Allocator>::shrink()`](db7125f008/library/alloc/src/alloc.rs (L280-L321)) must fall back to creating a brand-new allocation, `memcpy`ing the data into it, and freeing the old allocation, whenever the alignment doesn't remain exactly the same.
>
> Therefore, the underlying allocator, provided by libc or some other source, has no opportunity to internally `mremap()` the data when the alignment is changed, since it has no way of knowing that the allocation is the same.
InPlaceDstBufDrop holds onto the allocation before the shrinking happens
which means it must deallocate the destination elements but the source
allocation.
Currently stable users can't benefit from this because GlobaAlloc doesn't support
alignment-changing realloc and neither do most posix allocators.
So in practice it always results in an extra memcpy.
Split `Vec::dedup_by` into 2 cycles
First cycle runs until we found 2 same elements, second runs after if there any found in the first one. This allows to avoid any memory writes until we found an item which we want to remove.
This leads to significant performance gains if all `Vec` items are kept: -40% on my benchmark with unique integers.
Results of benchmarks before implementation (including new benchmark where nothing needs to be removed):
* vec::bench_dedup_all_100 74.00ns/iter +/- 13.00ns
* vec::bench_dedup_all_1000 572.00ns/iter +/- 272.00ns
* vec::bench_dedup_all_100000 64.42µs/iter +/- 19.47µs
* __vec::bench_dedup_none_100 67.00ns/iter +/- 17.00ns__
* __vec::bench_dedup_none_1000 662.00ns/iter +/- 86.00ns__
* __vec::bench_dedup_none_10000 9.16µs/iter +/- 2.71µs__
* __vec::bench_dedup_none_100000 91.25µs/iter +/- 1.82µs__
* vec::bench_dedup_random_100 105.00ns/iter +/- 11.00ns
* vec::bench_dedup_random_1000 781.00ns/iter +/- 10.00ns
* vec::bench_dedup_random_10000 9.00µs/iter +/- 5.62µs
* vec::bench_dedup_random_100000 449.81µs/iter +/- 74.99µs
* vec::bench_dedup_slice_truncate_100 105.00ns/iter +/- 16.00ns
* vec::bench_dedup_slice_truncate_1000 2.65µs/iter +/- 481.00ns
* vec::bench_dedup_slice_truncate_10000 18.33µs/iter +/- 5.23µs
* vec::bench_dedup_slice_truncate_100000 501.12µs/iter +/- 46.97µs
Results after implementation:
* vec::bench_dedup_all_100 75.00ns/iter +/- 9.00ns
* vec::bench_dedup_all_1000 494.00ns/iter +/- 117.00ns
* vec::bench_dedup_all_100000 58.13µs/iter +/- 8.78µs
* __vec::bench_dedup_none_100 52.00ns/iter +/- 22.00ns__
* __vec::bench_dedup_none_1000 417.00ns/iter +/- 116.00ns__
* __vec::bench_dedup_none_10000 4.11µs/iter +/- 546.00ns__
* __vec::bench_dedup_none_100000 40.47µs/iter +/- 5.36µs__
* vec::bench_dedup_random_100 77.00ns/iter +/- 15.00ns
* vec::bench_dedup_random_1000 681.00ns/iter +/- 86.00ns
* vec::bench_dedup_random_10000 11.66µs/iter +/- 2.22µs
* vec::bench_dedup_random_100000 469.35µs/iter +/- 20.53µs
* vec::bench_dedup_slice_truncate_100 100.00ns/iter +/- 5.00ns
* vec::bench_dedup_slice_truncate_1000 2.55µs/iter +/- 224.00ns
* vec::bench_dedup_slice_truncate_10000 18.95µs/iter +/- 2.59µs
* vec::bench_dedup_slice_truncate_100000 492.85µs/iter +/- 72.84µs
Resolves#77772
P.S. Note that this is same PR as #92104 I just missed review then forgot about it.
Also, I cannot reopen that pull request so I am creating a new one.
I responded to remaining questions directly by adding commentaries to my code.
Expand in-place iteration specialization to Flatten, FlatMap and ArrayChunks
This enables the following cases to collect in-place:
```rust
let v = vec![[0u8; 4]; 1024]
let v: Vec<_> = v.into_iter().flatten().collect();
let v: Vec<Option<NonZeroUsize>> = vec![NonZeroUsize::new(0); 1024];
let v: Vec<_> = v.into_iter().flatten().collect();
let v = vec![u8; 4096];
let v: Vec<_> = v.into_iter().array_chunks::<4>().collect();
```
Especially the nicheful-option-flattening should be useful in real code.
While a better approach would be to implement it for all ZSTs
which are `Copy` and have trivial `Clone`,
the last property cannot be detected for now.
Signed-off-by: Petr Portnov <me@progrm-jarvis.ru>
Implement `From<{&,&mut} [T; N]>` for `Vec<T>` where `T: Clone`
Currently, if `T` implements `Clone`, we can create a `Vec<T>` from an `&[T]` or an `&mut [T]`, can we also support creating a `Vec<T>` from an `&[T; N]` or an `&mut [T; N]`? Also, do I need to add `#[inline]` to the implementation?
ACP: rust-lang/libs-team#220. [Accepted]
Closes#100880.
Make useless_ptr_null_checks smarter about some std functions
This teaches the `useless_ptr_null_checks` lint that some std functions can't ever return null pointers, because they need to point to valid data, get references as input, etc.
This is achieved by introducing an `#[rustc_never_returns_null_ptr]` attribute and adding it to these std functions (gated behind bootstrap `cfg_attr`).
Later on, the attribute could maybe be used to tell LLVM that the returned pointer is never null. I don't expect much impact of that though, as the functions are pretty shallow and usually the input data is already never null.
Follow-up of PR #113657Fixes#114442
Add note that Vec::as_mut_ptr() does not materialize a reference to the internal buffer
See discussion on https://github.com/thomcc/rust-typed-arena/issues/62 and [t-opsem](https://rust-lang.zulipchat.com/#narrow/stream/136281-t-opsem/topic/is.20this.20typed_arena.20code.20sound.20under.20stacked.2Ftree.20borrows.3F)
This method already does the correct thing here, but it is worth guaranteeing that it does so it can be used more freely in unsafe code without having to worry about potential Stacked/Tree Borrows violations. This moves one more unsafe usage pattern from the "very likely sound but technically not fully defined" box into "definitely sound", and currently our surface area of the latter is woefully small.
I'm not sure how best to word this, opening this PR as a way to start discussion.
Problem
Language in the Vec->Indexing documentation sounds stilted due to
incorrect word ordering: "... type allows to access values by index."
Solution
Reorder words in the Vec->Indexing documentation to flow better:
"... type allows access to values by index." The phrase "allows access to"
also matches other existing documentation.