enable align_to tests in Miri
With https://github.com/rust-lang/miri/issues/1074 resolved, we can enable these tests in Miri.
I also tweaked the test sized to get reasonable execution times with decent test coverage.
clarify documentation of remove_dir errors
remove_dir will error if the path doesn't exist or isn't a directory.
It's useful to clarify that this is "remove dir or fail" not "remove dir
if it exists".
I don't think this belongs in the title. "Removes an existing, empty
directory" is strangely worded-- there's no such thing as a non-existing
directory. Better to just say explicitly it will return an error.
Use min_specialization in libcore
Getting `TrustedRandomAccess` to work is the main interesting thing here.
- `get_unchecked` is now an unstable, hidden method on `Iterator`
- The contract for `TrustedRandomAccess` is made clearer in documentation
- Fixed a bug where `Debug` would create aliasing references when using the specialized zip impl
- Added tests for the side effects of `next_back` and `nth`.
closes#68536
Move to intra-doc links for task.rs and vec.rs
Partial fix for #75080
links for [`get`], [`get_mut`] skipped due to #75643
link for [`copy_from_slice`] skipped due to #63351
`stride == 1` case can be computed more efficiently through `-p (mod
a)`. That, then translates to a nice and short sequence of LLVM
instructions:
%address = ptrtoint i8* %p to i64
%negptr = sub i64 0, %address
%offset = and i64 %negptr, %a_minus_one
And produces pretty much ideal code-gen when this function is used in
isolation.
Typical use of this function will, however, involve use of
the result to offset a pointer, i.e.
%aligned = getelementptr inbounds i8, i8* %p, i64 %offset
This still looks very good, but LLVM does not really translate that to
what would be considered ideal machine code (on any target). For example
that's the codegen we obtain for an unknown alignment:
; x86_64
dec rsi
mov rax, rdi
neg rax
and rax, rsi
add rax, rdi
In particular negating a pointer is not something that’s going to be
optimised for in the design of CISC architectures like x86_64. They
are much better at offsetting pointers. And so we’d love to utilize this
ability and produce code that's more like this:
; x86_64
lea rax, [rsi + rdi - 1]
neg rsi
and rax, rsi
To achieve this we need to give LLVM an opportunity to apply its
various peep-hole optimisations that it does during DAG selection. In
particular, the `and` instruction appears to be a major inhibitor here.
We cannot, sadly, get rid of this load-bearing operation, but we can
reorder operations such that LLVM has more to work with around this
instruction.
One such ordering is proposed in #75579 and results in LLVM IR that
looks broadly like this:
; using add enables `lea` and similar CISCisms
%offset_ptr = add i64 %address, %a_minus_one
%mask = sub i64 0, %a
%masked = and i64 %offset_ptr, %mask
; can be folded with `gepi` that may follow
%offset = sub i64 %masked, %address
…and generates the intended x86_64 machine code. One might also wonder
how the increased amount of code would impact a RISC target. Turns out
not much:
; aarch64 previous ; aarch64 new
sub x8, x1, #1 add x8, x1, x0
neg x9, x0 sub x8, x8, #1
and x8, x9, x8 neg x9, x1
add x0, x0, x8 and x0, x8, x9
(and similarly for ppc, sparc, mips, riscv, etc)
The only target that seems to do worse is… wasm32.
Onto actual measurements – the best way to evaluate snippets like these
is to use llvm-mca. Much like Aarch64 assembly would allow to suspect,
there isn’t any performance difference to be found. Both snippets
execute in same number of cycles for the CPUs I tried. On x86_64,
we get throughput improvement of >50%, however!
Remove `#[cfg(miri)]` from OnceCell tests
They were carried over from once_cell crate, but they are not entirely
correct (as miri now supports more things), and we don't run miri
tests for std, so let's just remove them.
Maybe one day we'll run miri in std, but then we can just re-install
these attributes.
Move to intra doc links for std::io
Helps with #75080.
@rustbot modify labels: T-doc, A-intra-doc-links, T-rustdoc
r? @jyn514
I had no problems with those files so I added some small links here and there.
Improve codegen for `align_offset`
In this PR the `align_offset` implementation is changed/improved to produce better code in certain scenarios such as when pointer type is has a stride of 1 or when building for low optimisation levels.
While these changes do not achieve the "ideal" codegen referenced in #75579, it gets significantly closer to it. I’m not actually sure if the codegen can actually be much better with this function returning the offset, rather than the aligned pointer.
See the descriptions for separate commits for further information.