`[T; N]::zip` is "eager" but most zips are mapped.
This causes poor optimization in generated code.
This is a fundamental design issue and "zip" is
"prime real estate" in terms of function names,
so let's free it up again.
Rollup of 6 pull requests
Successful merges:
- #111936 (Include test suite metadata in the build metrics)
- #111952 (Remove DesugaringKind::Replace.)
- #111966 (Add #[inline] to array TryFrom impls)
- #111983 (Perform MIR type ops locally in new solver)
- #111997 (Fix re-export of doc hidden macro not showing up)
- #112014 (rustdoc: get unnormalized link destination for suggestions)
r? `@ghost`
`@rustbot` modify labels: rollup
Improve the `array::map` codegen
The `map` method on arrays [is documented as sometimes performing poorly](https://doc.rust-lang.org/std/primitive.array.html#note-on-performance-and-stack-usage), and after [a question on URLO](https://users.rust-lang.org/t/try-trait-residual-o-trait-and-try-collect-into-array/88510?u=scottmcm) prompted me to take another look at the core [`try_collect_into_array`](7c46fb2111/library/core/src/array/mod.rs (L865-L912)) function, I had some ideas that ended up working better than I'd expected.
There's three main ideas in here, split over three commits:
1. Don't use `array::IntoIter` when we can avoid it, since that seems to not get SRoA'd, meaning that every step writes things like loop counters into the stack unnecessarily
2. Don't return arrays in `Result`s unnecessarily, as that doesn't seem to optimize away even with `unwrap_unchecked` (perhaps because it needs to get moved into a new LLVM type to account for the discriminant)
3. Don't distract LLVM with all the `Option` dances when we know for sure we have enough items (like in `map` and `zip`). This one's a larger commit as to do it I ended up adding a new `pub(crate)` trait, but hopefully those changes are still straight-forward.
(No libs-api changes; everything should be completely implementation-detail-internal.)
It's still not completely fixed -- I think it needs pcwalton's `memcpy` optimizations still (#103830) to get further -- but this seems to go much better than before. And the remaining `memcpy`s are just `transmute`-equivalent (`[T; N] -> ManuallyDrop<[T; N]>` and `[MaybeUninit<T>; N] -> [T; N]`), so hopefully those will be easier to remove with LLVM16 than the previous subobject copies 🤞
r? `@thomcc`
As a simple example, this test
```rust
pub fn long_integer_map(x: [u32; 64]) -> [u32; 64] {
x.map(|x| 13 * x + 7)
}
```
On nightly <https://rust.godbolt.org/z/xK7548TGj> takes `sub rsp, 808`
```llvm
start:
%array.i.i.i.i = alloca [64 x i32], align 4
%_3.sroa.5.i.i.i = alloca [65 x i32], align 4
%_5.i = alloca %"core::iter::adapters::map::Map<core::array::iter::IntoIter<u32, 64>, [closure@/app/example.rs:2:11: 2:14]>", align 8
```
(and yes, that's a 6**5**-element array `alloca` despite 6**4**-element input and output)
But with this PR it's only `sub rsp, 520`
```llvm
start:
%array.i.i.i.i.i.i = alloca [64 x i32], align 4
%array1.i.i.i = alloca %"core::mem::manually_drop::ManuallyDrop<[u32; 64]>", align 4
```
Similarly, the loop it emits on nightly is scalar-only and horrifying
```nasm
.LBB0_1:
mov esi, 64
mov edi, 0
cmp rdx, 64
je .LBB0_3
lea rsi, [rdx + 1]
mov qword ptr [rsp + 784], rsi
mov r8d, dword ptr [rsp + 4*rdx + 528]
mov edi, 1
lea edx, [r8 + 2*r8]
lea r8d, [r8 + 4*rdx]
add r8d, 7
.LBB0_3:
test edi, edi
je .LBB0_11
mov dword ptr [rsp + 4*rcx + 272], r8d
cmp rsi, 64
jne .LBB0_6
xor r8d, r8d
mov edx, 64
test r8d, r8d
jne .LBB0_8
jmp .LBB0_11
.LBB0_6:
lea rdx, [rsi + 1]
mov qword ptr [rsp + 784], rdx
mov edi, dword ptr [rsp + 4*rsi + 528]
mov r8d, 1
lea esi, [rdi + 2*rdi]
lea edi, [rdi + 4*rsi]
add edi, 7
test r8d, r8d
je .LBB0_11
.LBB0_8:
mov dword ptr [rsp + 4*rcx + 276], edi
add rcx, 2
cmp rcx, 64
jne .LBB0_1
```
whereas with this PR it's unrolled and vectorized
```nasm
vpmulld ymm1, ymm0, ymmword ptr [rsp + 64]
vpaddd ymm1, ymm1, ymm2
vmovdqu ymmword ptr [rsp + 328], ymm1
vpmulld ymm1, ymm0, ymmword ptr [rsp + 96]
vpaddd ymm1, ymm1, ymm2
vmovdqu ymmword ptr [rsp + 360], ymm1
```
(though sadly still stack-to-stack)
The type is unsafe and now exposed to the whole crate.
Document it properly and add an unsafe method so the
caller can make it visible that something unsafe is happening.
Clarify `array::from_fn` documentation
I've seen quite a few of people on social media confused of where the length of array is coming from in the newly stabilized `array::from_fn` example.
This PR tries to clarify the documentation on this.
On my first Rust project, I spent more time than I care to admit
figuring out how to efficiently get an array from a slice. Update the
array documentation to explain this a bit more clearly.
(As a side note, it's a bit unfortunate that get-array-from-slice is
only available via trait since that means it can't be used from const
functions yet.)
Fixes 85115
This only updates unstable functions.
`array::try_map` didn't actually exist before, despite the tracking issue 79711 still being open from the old PR 79713.
Because after PR 86041, the optimizer no longer load-merges at the LLVM IR level, which might be part of the perf loss. (I'll run perf and see if this makes a difference.)
Also I added a codegen test so this hopefully won't regress in future -- it passes on stable and with my change here, but not on the 2021-11-09 nightly.