Add `#[must_use]` attribute to `Coroutine` trait
[Coroutines tracking issue](https://github.com/rust-lang/rust/issues/43122)
Like closures (`FnOnce`, `AsyncFn`, etc.), coroutines are lazy and do nothing unless called (resumed). Closure traits like `FnOnce` have `#[must_use = "closures are lazy and do nothing unless called"]` to catch likely bugs for users of APIs that produce them. This PR adds such a `#[must_use]` attribute to `trait Coroutine`.
Optimize integer `pow` by removing the exit branch
The branch at the end of the `pow` implementations is redundant with multiplication code already present in the loop. By rotating the exit check, this branch can be largely removed, improving code size and reducing instruction cache misses.
Testing on my machine (`x86_64`, 11th Gen Intel Core i5-1135G7 @ 2.40GHz), the `num::int_pow` benchmarks improve by some 40% for the unchecked operations and show some slight improvement for the checked operations as well.
miri: make vtable addresses not globally unique
Miri currently gives vtables a unique global address. That's not actually matching reality though. So this PR enables Miri to generate different addresses for the same type-trait pair.
To avoid generating an unbounded number of `AllocId` (and consuming unbounded amounts of memory), we use the "salt" technique that we also already use for giving constants non-unique addresses: the cache is keyed on a "salt" value n top of the actually relevant key, and Miri picks a random salt (currently in the range `0..16`) each time it needs to choose an `AllocId` for one of these globals -- that means we'll get up to 16 different addresses for each vtable. The salt scheme is integrated into the global allocation deduplication logic in `tcx`, and also used for functions and string literals. (So this also fixes the problem that casting the same function to a fn ptr over and over will consume unbounded memory.)
r? `@saethlin`
Fixes https://github.com/rust-lang/miri/issues/3737
nontemporal_store: make sure that the intrinsic is truly just a hint
The `!nontemporal` flag for stores in LLVM *sounds* like it is just a hint, but actually, it is not -- at least on x86, non-temporal stores need very special treatment by the programmer or else the Rust memory model breaks down. LLVM still treats these stores as-if they were normal stores for optimizations, which is [highly dubious](https://github.com/llvm/llvm-project/issues/64521). Let's avoid all that dubiousness by making our own non-temporal stores be truly just a hint, which is possible on some targets (e.g. ARM). On all other targets, non-temporal stores become regular stores.
~~Blocked on https://github.com/rust-lang/stdarch/pull/1541 propagating to the rustc repo, to make sure the `_mm_stream` intrinsics are unaffected by this change.~~
Fixes https://github.com/rust-lang/rust/issues/114582
Cc `@Amanieu` `@workingjubilee`
fix: #128855 Ensure `Guard`'s `drop` method is removed at `opt-level=s` for `…
fix: #128855
…Copy` types
Added `#[inline]` to the `drop` method in the `Guard` implementation to ensure that the method is removed by the compiler at optimization level `opt-level=s` for `Copy` types. This change aims to align the method's behavior with optimization expectations and ensure it does not affect performance.
r? `@scottmcm`
Apply "polymorphization at home" to RawVec
The idea here is to move all the logic in RawVec into functions with explicit size and alignment parameters. This should eliminate all the fussing about how tweaking RawVec code produces large swings in compile times.
This uncovered https://github.com/rust-lang/rust-clippy/issues/12979, so I've modified the relevant test in a way that tries to preserve the spirit of the test without tripping the ICE.
core: optimise Debug impl for ascii::Char
Rather than writing character at a time, optimise Debug implementation
for core::ascii::Char such that it writes the entire representation
with a single write_str call.
With that, add tests for Display and Debug.
Issue: https://github.com/rust-lang/rust/issues/110998
Improve `Ord` violation help
Recent experience in #128083 showed that the panic message when an Ord violation is detected by the new sort implementations can be confusing. So this PR aims to improve it, together with minor bug fixes in the doc comments for sort*, sort_unstable* and select_nth_unstable*.
Is it possible to get these changes into the 1.81 release? It doesn't change behavior and would greatly help when users encounter this panic for the first time, which they may after upgrading to 1.81.
Tagging `@orlp`
Rather than writing character at a time, optimise Debug implementation
for core::ascii::Char such that it writes the entire representation as
with a single write_str call.
With that, add tests for Display and Debug implementations.
Added `#[inline]` to the `drop` method in the `Guard` implementation to ensure that the method is removed by the compiler at optimization level `opt-level=s` for `Copy` types. This change aims to align the method's behavior with optimization expectations and ensure it does not affect performance.
Mark `{f32,f64}::{next_up,next_down,midpoint}` inline
Most float functions are marked `#[inline]` so any float symbols used by these functions only need to be provided if the function itself is used. RFL recently noticed that `next_up`, `next_down`, and `midpoint` for `f32` and `f64` are not inline, which causes linker errors when building with certain configurations <https://lore.kernel.org/all/20240806150619.192882-1-ojeda@kernel.org/>.
Add the missing attributes so the symbols should no longer be required.
Most float functions are marked `#[inline]` so any float symbols used by
these functions only need to be provided if the function itself is used.
RFL recently noticed that `next_up`, `next_down`, and `midpoint` for
`f32` and `f64` are not inline, which causes linker errors when building
with certain configurations [1].
Add the missing attributes so the symbols should no longer be required.
Cc: Gary Guo <gary@garyguo.net>
Reported-by: Alice Ryhl <aliceryhl@google.com>
Link: https://lore.kernel.org/all/20240806150619.192882-1-ojeda@kernel.org/ [1]
Add `f16` and `f128` math functions
This adds intrinsics and math functions for `f16` and `f128` floating point types. Support is quite limited and some things are broken so tests don't run on many platforms, but this provides a starting point.
Instead of saying to "consider adding a `#[repr(C)]` or
`#[repr(transparent)]` attribute to this struct", we now tell users to
"Use `*const ffi::c_char` instead, and pass the value from
`CStr::as_ptr()`" when the type involved is a `CStr` or a `CString`.
Co-authored-by: Jieyou Xu <jieyouxu@outlook.com>
Correct the const stabilization of `<[T]>::last_chunk`
`<[T]>::first_chunk` became const stable in 1.77, but `<[T]>::last_chunk` was left out. This was fixed in 3488679768, which reached stable in 1.80, making `<[T]>::last_chunk` const stable as of that version, but it is documented as being const stable as 1.77. While this is what should have happened, the documentation should reflect what actually did happen.
Remove unnecessary constants from flt2dec dragon
The "dragon" `flt2dec` algorithm uses multi-precision multiplication by (sometimes large) powers of 10. It has precomputed some values to help with these calculations.
BUT:
* There is no need to store powers of 10 and 2 * powers of 10: it is trivial to compute the second from the first.
* We can save a chunk of memory by storing powers of 5 instead of powers of 10 for the large powers (and just shifting as appropriate).
* This also slightly speeds up the routines (by ~1-3%) since the intermediate products are smaller and the shift is cheap.
In this PR, we remove the unnecessary constants and do the necessary adjustments.
Relevant benchmarks before (on my Threadripper 3970X, x86_64-unknown-linux-gnu):
```
num::flt2dec::bench_big_shortest 137.92/iter +/- 2.24
num::flt2dec::strategy:🐉:bench_big_exact_12 2135.28/iter +/- 38.90
num::flt2dec::strategy:🐉:bench_big_exact_3 904.95/iter +/- 10.58
num::flt2dec::strategy:🐉:bench_big_exact_inf 47230.33/iter +/- 320.84
num::flt2dec::strategy:🐉:bench_big_shortest 3915.05/iter +/- 51.37
```
and after:
```
num::flt2dec::bench_big_shortest 137.40/iter +/- 2.03
num::flt2dec::strategy:🐉:bench_big_exact_12 2101.10/iter +/- 25.63
num::flt2dec::strategy:🐉:bench_big_exact_3 873.86/iter +/- 4.20
num::flt2dec::strategy:🐉:bench_big_exact_inf 47468.19/iter +/- 374.45
num::flt2dec::strategy:🐉:bench_big_shortest 3877.01/iter +/- 45.74
```
`<[T]>::first_chunk` became const stable in 1.77, but `<[T]>::last_chunk` was
left out. This was fixed in 3488679768, which reached stable in 1.80,
making `<[T]>::last_chunk` const stable as of that version, but it is
documented as being const stable as 1.77. While this is what should have
happened, the documentation should reflect what actually did happen.
Implement `UncheckedIterator` directly for `RepeatN`
This just pulls the code out of `next` into `next_unchecked`, rather than making the `Some` and `unwrap_unchecked`ing it.
And while I was touching it, I added a codegen test that `array::repeat` for something that's just `Clone`, not `Copy`, still ends up optimizing to the same thing as `[x; n]`: <https://rust.godbolt.org/z/YY3a5ajMW>.
The "dragon" `flt2dec` algorithm uses multi-precision multiplication by
(sometimes large) powers of 10. It has precomputed some values to help
with these calculations.
BUT:
* There is no need to store powers of 10 and 2 * powers of 10: it is
trivial to compute the second from the first.
* We can save a chunk of memory by storing powers of 5 instead of powers
of 10 for the large powers (and just shifting by 2 as appropriate).
* This also slightly speeds up the routines (by ~1-3%) since the
intermediate products are smaller and the shift is cheap.
In this PR, we remove the unnecessary constants and do the necessary
adjustments.
Relevant benchmarks before (on my Threadripper 3970X, x86_64-unknown-linux-gnu):
```
num::flt2dec::bench_big_shortest 137.92/iter +/- 2.24
num::flt2dec::strategy:🐉:bench_big_exact_12 2135.28/iter +/- 38.90
num::flt2dec::strategy:🐉:bench_big_exact_3 904.95/iter +/- 10.58
num::flt2dec::strategy:🐉:bench_big_exact_inf 47230.33/iter +/- 320.84
num::flt2dec::strategy:🐉:bench_big_shortest 3915.05/iter +/- 51.37
```
and after:
```
num::flt2dec::bench_big_shortest 137.40/iter +/- 2.03
num::flt2dec::strategy:🐉:bench_big_exact_12 2101.10/iter +/- 25.63
num::flt2dec::strategy:🐉:bench_big_exact_3 873.86/iter +/- 4.20
num::flt2dec::strategy:🐉:bench_big_exact_inf 47468.19/iter +/- 374.45
num::flt2dec::strategy:🐉:bench_big_shortest 3877.01/iter +/- 45.74
```