Previously it only did integer-ABI things, but this way it does data pointers too. That gives more information in general to the backend, and allows slightly simplifying one of the helpers in slice iterators.
Stabilize target_feature_11
# Stabilization report
This is an updated version of https://github.com/rust-lang/rust/pull/116114, which is itself a redo of https://github.com/rust-lang/rust/pull/99767. Most of this commit and report were copied from those PRs. Thanks ```@LeSeulArtichaut``` and ```@calebzulawski```!
## Summary
Allows for safe functions to be marked with `#[target_feature]` attributes.
Functions marked with `#[target_feature]` are generally considered unsafe functions: they are unsafe to call, cannot *generally* be assigned to safe function pointers, and don't implement the `Fn*` traits.
However, calling them from other `#[target_feature]` functions with a superset of features is safe.
```rust
// Demonstration function
#[target_feature(enable = "avx2")]
fn avx2() {}
fn foo() {
    // Calling `avx2` here is unsafe, as we must ensure
    // that AVX2 is available first.
    unsafe {
        avx2();
    }
}
#[target_feature(enable = "avx2")]
fn bar() {
    // Calling `avx2` here is safe.
    avx2();
}
```
Moreover, once https://github.com/rust-lang/rust/pull/135504 is merged, they can be converted to safe function pointers in a context in which calling them is safe:
```rust
// Demonstration function
#[target_feature(enable = "avx2")]
fn avx2() {}
fn foo() -> fn() {
    // Converting `avx2` to `fn()` is a compilation error here.
    avx2
}
#[target_feature(enable = "avx2")]
fn bar() -> fn() {
    // `avx2` coerces to `fn()` here.
    avx2
}
```
See the section "Closures" below for justification of this behaviour.
## Test cases
Tests for this feature can be found in [`tests/ui/target-feature/`](f6cb952dc1/tests/ui/target-feature).
## Edge cases
### Closures
* [target-feature 1.1: should closures inherit target-feature annotations? #73631](https://github.com/rust-lang/rust/issues/73631)
Closures defined inside functions marked with `#[target_feature]` inherit the target features of their parent function. They can still be assigned to safe function pointers and implement the appropriate `Fn*` traits.
```rust
#[target_feature(enable = "avx2")]
fn qux() {
    let my_closure = || avx2(); // this call to `avx2` is safe
    let f: fn() = my_closure;
}
```
This means that in order to call a function marked with `#[target_feature]`, you must guarantee that the target feature is available while the function executes, along with any closures defined inside it and any safe function pointers obtained from target-feature functions inside it.
This is usually ensured because target features are assumed to never disappear, and:
- on any unsafe call to a `#[target_feature]` function, presence of the target feature is guaranteed by the programmer through the safety requirements of the unsafe call.
- on any safe call, this is guaranteed recursively by the caller.
If you work in an environment where target features can be disabled, it is your responsibility to ensure that no code inside a target-feature function (including inside a closure) runs after the feature is disabled (until it is enabled again).
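In practice, that guarantee is usually established with runtime feature detection. A minimal sketch on x86-64 (function names are illustrative):
```rust
#[cfg(target_arch = "x86_64")]
fn run() {
    #[target_feature(enable = "avx2")]
    fn process_avx2() { /* AVX2-accelerated work */ }

    // Target features are assumed never to disappear, so one runtime
    // check is enough to justify the unsafe call.
    if is_x86_feature_detected!("avx2") {
        // SAFETY: we just checked that AVX2 is available.
        unsafe { process_avx2() };
    }
}
```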
**Note:** this has an effect on existing code, as closures currently do not inherit features from the enclosing function, so this strengthens a safety requirement. It was originally proposed in #73631 to solve this by adding a new type of UB: “taking a target feature away from your process after having run code that uses that target feature is UB”.
This was motivated by userspace code already assuming in a few places that CPU features never disappear from a program during execution (see e.g. 2e29bdf908/crates/std_detect/src/detect/arch/x86.rs); however, concerns were raised in the context of the Linux kernel; thus, we propose to relax that requirement to "causing the set of usable features to be reduced is unsafe; when doing so, the programmer is required to ensure that no closures or safe fn pointers that use the removed features are still in scope".
* [Fix #[inline(always)] on closures with target feature 1.1 #111836](https://github.com/rust-lang/rust/pull/111836)
Closures accept `#[inline(always)]`, even within functions marked with `#[target_feature]`. Since these attributes conflict, `#[inline(always)]` wins out to maintain compatibility.
### ABI concerns
* [The extern "C" ABI of SIMD vector types depends on target features #116558](https://github.com/rust-lang/rust/issues/116558)
The ABI of some types can change when compiling a function with different target features. This could have introduced unsoundness with target_feature_11, but recent fixes (#133102, #132173) either make those situations invalid or make the ABI no longer dependent on features. Thus, those issues should no longer occur.
### Special functions
The `#[target_feature]` attribute is forbidden on a variety of special functions, such as `main`, current and future lang items (e.g. `#[start]`, `#[panic_handler]`), safe default trait implementations, and safe trait methods.
This was not disallowed at the time of the first stabilization PR for target_feature_11, which resulted in the following issues/PRs:
* [`#[target_feature]` is allowed on `main` #108645](https://github.com/rust-lang/rust/issues/108645)
* [`#[target_feature]` is allowed on default implementations #108646](https://github.com/rust-lang/rust/issues/108646)
* [#[target_feature] is allowed on #[panic_handler] with target_feature 1.1 #109411](https://github.com/rust-lang/rust/issues/109411)
* [Prevent using `#[target_feature]` on lang item functions #115910](https://github.com/rust-lang/rust/pull/115910)
## Documentation
* Reference: [Document the `target_feature_11` feature (reference#1181)](https://github.com/rust-lang/reference/pull/1181)
---
cc tracking issue https://github.com/rust-lang/rust/issues/69098
cc ```@workingjubilee```
cc ```@RalfJung```
r? ```@rust-lang/lang```
coverage: Defer part of counter-creation until codegen
Follow-up to #135481 and #135873.
One of the pleasant properties of the new counter-assignment algorithm is that we can stop partway through the process, store the intermediate state in MIR, and then resume the rest of the algorithm during codegen. This lets it take into account which parts of the control-flow graph were eliminated by MIR opts, resulting in fewer physical counters and simpler counter expressions.
Those improvements end up completely obsoleting much larger chunks of code that were previously responsible for cleaning up the coverage metadata after MIR opts, while also doing a more thorough cleanup job.
(That change also unlocks some further simplifications that I've kept out of this PR to limit its scope.)
Don't reset cast kind without also updating the operand in `simplify_cast` in GVN
Consider this heavily elided segment of the pre-GVN example code that was committed as a test:
```rust
let _4: *const ();
let _5: *const [()];
let mut _6: *const ();
let _7: *mut ();
let mut _8: *const [()];
let mut _9: std::boxed::Box<()>;
let mut _10: *const ();
/* ... */
// Deref a box
_10 = copy ((_9.0: std::ptr::Unique<()>).0: std::ptr::NonNull<()>) as *const () (Transmute);
_4 = copy _10;
_6 = copy _4;
// Inlined body of `slice::from_raw_parts`, to turn a unit pointer into a slice-of-unit pointer
_5 = *const [()] from (copy _6, copy _11);
_8 = copy _5;
// Cast the raw slice-of-unit pointer back to a unit pointer
_7 = copy _8 as *mut () (PtrToPtr);
```
A malformed optimization was changing `_7` (which cast the slice-of-unit pointer to a unit pointer) to:
```
_7 = copy _5 as *mut () (Transmute);
```
...where `_8` was simply replaced with `_5` by copy propagation; that part is not important. The `CastKind` changing to `Transmute` is the important part here.
In #133324, two new functionalities were implemented:
* Peeking through unsized -> sized PtrToPtr casts whose operand is `AggregateKind::RawPtr`, to turn it into PtrToPtr casts of the base of the aggregate. In this case, this allows us to see that the value of `_7` is just a ptr-to-ptr cast of `_6`.
* Folding a PtrToPtr cast of an operand which is a Transmute cast into just a single Transmute, which (theoretically) allows us to treat `_7` as a transmute into `*mut ()` of the base of the cast of `_10`, which is the place projection of `((_9.0: std::ptr::Unique<()>).0: std::ptr::NonNull<()>)`.
However, when applying those two subsequent optimizations, we must *not* update the CastKind of the final cast *unless* we also update the operand of the cast, since the operand may no longer make sense with the updated CastKind.
In this case, this is problematic because the type of `_8` is `*const [()]`, but the operand in the assignment statement of `_7` does *not* get turned into something like `((_9.0: std::ptr::Unique<()>).0: std::ptr::NonNull<()>)` -- **in other words, `try_to_operand` fails** -- because GVN only turns value nodes into locals or consts, not projections of locals. So we fail to update the operand, but we still update the `CastKind` to `Transmute`, which means we are now transmuting between types of different sizes (a wide pointer and a thin pointer).
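At the source level, the shape of code that exercises this path looks roughly like this (a hypothetical reduction, not the committed test):
```rust
fn roundtrip(b: Box<()>) -> *mut () {
    let thin: *const () = Box::into_raw(b);
    // Widen to a slice-of-unit pointer (the `AggregateKind::RawPtr` above)...
    let wide: *const [()] = std::ptr::slice_from_raw_parts(thin, 1);
    // ...then cast back to a thin pointer (the `PtrToPtr` cast of `_8`).
    wide as *mut ()
}
```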
r? `@scottmcm` or `@cjgillot`
Fixes #136361
Fixes #135997
Pretty print pattern type values with transmute if they don't satisfy their pattern
Instead of printing `0_u32 is 1..`, we now print the default fallback rendering that we also use for invalid bools, chars, ...: `{transmute(0x00000000): (u32) is 1..=}`.
These cases can occur in MIR dumps when const prop propagates a constant across a safety check that would prevent the actually-UB value from existing at runtime. That's fine though, as it's dead code, and we always need to allow UB in dead code.
follow-up to https://github.com/rust-lang/rust/pull/136176
cc ``@compiler-errors`` ``@scottmcm``
r? ``@RalfJung`` because of the interpreter changes
Implement unstable `new_range` feature
Switches `a..b`, `a..`, and `a..=b` to resolve to the new range types.
For rust-lang/rfcs#3550
Tracking issue #123741
This also adds the re-export that was missed in the original implementation of `new_range_api`.
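As a rough sketch of the user-visible change (assuming the feature names used here; details may shift as the RFC evolves):
```rust
#![feature(new_range, new_range_api)]

fn main() {
    // With `new_range` enabled, `0..10` resolves to the new range type,
    // which is `Copy`, rather than to `std::ops::Range`.
    let r: core::range::Range<i32> = 0..10;
    let r2 = r; // no move: the new type is `Copy`
    assert_eq!(r.start, r2.start);
}
```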
Similar to how the alignment is already checked, this adds a check
for null pointer dereferences in debug mode. It is implemented similarly
to the alignment check as a MirPass.
This is related to a 2025H1 project goal for better UB checks in debug
mode: https://github.com/rust-lang/rust-project-goals/pull/177.
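For illustration, the kind of code the new check catches (a minimal sketch):
```rust
fn main() {
    let ptr: *const i32 = std::ptr::null();
    // With debug assertions enabled, this dereference now trips the
    // inserted null-pointer check and aborts with a clear message,
    // instead of silently invoking undefined behavior.
    let _x = unsafe { *ptr };
}
```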
Render pattern types nicely in mir dumps
This avoids falling through to the fallback rendering that just does a hex dump.
r? ``@scottmcm``
best reviewed commit by commit
Add `#[optimize(none)]`
cc #54882
This extends the `optimize` attribute to add `none`, which corresponds to the LLVM `OptimizeNone` attribute.
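Sketch of usage, gated behind the existing `optimize_attribute` feature:
```rust
#![feature(optimize_attribute)]

#[optimize(none)]
fn keep_as_written(x: u32) -> u32 {
    // LLVM leaves this body unoptimized (`OptimizeNone`), which is handy
    // when debugging or benchmarking a specific function in release builds.
    x + 1
}

fn main() {
    assert_eq!(keep_as_written(1), 2);
}
```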
Not sure if an MCP is required for this, happy to file one if so.
Don't drop types with no drop glue when building drops for tailcalls
This is required because otherwise drops of `&mut` refs count as a usage of a
'two-phase temporary', causing an ICE.
Fixes #128097
The underlying issue is that the current code generates drops for `&mut` which are later counted as a second use of a two-phase temporary:
`bat t.rs -p`
```rust
#![expect(incomplete_features)]
#![feature(explicit_tail_calls)]
fn f(x: &mut ()) {
let _y = String::new();
become f(x);
}
fn main() {}
```
`rustc t.rs -Zdump_mir=f`
```text
error: internal compiler error: compiler/rustc_borrowck/src/borrow_set.rs:298:17: found two uses for 2-phase borrow temporary _4: bb2[1] and bb3[0]
--> t.rs:6:5
|
6 | become f(x);
| ^^^^^^^^^^^
thread 'rustc' panicked at compiler/rustc_borrowck/src/borrow_set.rs:298:17:
Box<dyn Any>
stack backtrace:
[REDACTED]
error: aborting due to 1 previous error
```
`bat ./mir_dump/t.f.-------.renumber.0.mir -p -lrust`
```rust
// MIR for `f` 0 renumber
fn f(_1: &mut ()) -> () {
debug x => _1;
let mut _0: ();
let mut _2: !;
let _3: std::string::String;
let mut _4: &mut ();
scope 1 {
debug _y => _3;
}
bb0: {
StorageLive(_3);
_3 = String::new() -> [return: bb1, unwind: bb4];
}
bb1: {
FakeRead(ForLet(None), _3);
StorageLive(_4);
_4 = &mut (*_1);
drop(_3) -> [return: bb2, unwind: bb3];
}
bb2: {
StorageDead(_3);
tailcall f(Spanned { node: move _4, span: t.rs:6:14: 6:15 (#0) });
}
bb3 (cleanup): {
drop(_4) -> [return: bb4, unwind terminate(cleanup)];
}
bb4 (cleanup): {
resume;
}
}
```
Note how `_4` is moved into the tail call in `bb2` and dropped in `bb3`.
This PR adds a check that the locals we drop need dropping.
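The distinction the check relies on can be seen from the standard library (a user-level illustration, not the compiler change itself):
```rust
fn main() {
    // `&mut ()` has no drop glue, so the tail-call lowering has no
    // reason to schedule a drop for `_4` at all.
    assert!(!std::mem::needs_drop::<&mut ()>());
    // `String` does, so `_3` legitimately gets its `drop(_3)`.
    assert!(std::mem::needs_drop::<String>());
}
```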
r? `@oli-obk` (feel free to reassign, I'm not sure who would be a good reviewer, but thought you might have an idea)
cc `@beepster4096`, since you wrote the original drop implementation.
Reexport likely/unlikely in std::hint
Since `likely`/`unlikely` should be working now, we could reexport them in `std::hint`. I'm not sure if this is already approved or if it requires approval
Tracking issue: #26179
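A sketch of the reexported API (assuming the `likely_unlikely` feature gate):
```rust
#![feature(likely_unlikely)]

use std::hint::unlikely;

fn checked_div(a: u32, b: u32) -> Option<u32> {
    // Hint to the optimizer that the divisor is almost never zero.
    if unlikely(b == 0) {
        None
    } else {
        Some(a / b)
    }
}

fn main() {
    assert_eq!(checked_div(10, 2), Some(5));
}
```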
coverage: Completely overhaul counter assignment, using node-flow graphs
The existing code for choosing where to put physical counter-increments gets the job done, but is very ad-hoc and hard to modify without introducing tricky regressions.
This PR replaces all of that with a more principled approach, based on the algorithm described in "Optimal measurement points for program frequency counts" (Knuth & Stevenson, 1973).
---
We start by ensuring that our graph has “balanced flow”, i.e. each node's flow (execution count) is equal to the sum of all its in-edge flows, and equal to the sum of all its out-edge flows. That isn't naturally true of control-flow graphs, so we introduce a wrapper type `BalancedFlowGraph` to fix that by introducing synthetic nodes and edges as needed.
Once our graph has balanced flow, the next step is to create another view of that graph in which each node's successors have all been merged into one “supernode”. Consequently, each node's out-edges can be coalesced into a single out-edge to one of those supernodes. Because of the balanced-flow property, the flow of that coalesced edge is equal to the flow of the original node.
Having expressed all of our node flows as edge flows, we can then analyze node flows using techniques for analyzing edge flows. We incrementally build a spanning tree over the merged supernodes, such that each new edge in the spanning tree represents a node whose flow can be computed from that of other nodes.
When this is done, we end up with a list of “counter terms” for each node, describing which nodes need physical counters, and how the remaining nodes can have their flow calculated by adding and subtracting those physical counters.
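As a hand-worked illustration of the outcome (not compiler code), consider a simple branch:
```rust
fn example(cond: bool) -> u32 {
    // Balanced flow gives: flow(entry) = flow(then) + flow(else)
    //                  and flow(exit)  = flow(then) + flow(else).
    // So physical counters c0 (on `then`) and c1 (on `else`) suffice;
    // `entry` and `exit` are derived as the counter expression c0 + c1.
    if cond { 1 } else { 2 }
}
```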
---
The re-blessed coverage tests show that this results in modest or major improvements for our test programs. Some tests need fewer physical counters, some tests need fewer expression nodes for the same number of physical counters, and some tests show striking reductions in both.
Fix cycle error only occurring with -Zdump-mir
fixes#134205
During mir dumping, we evaluate static items to render their allocations. If a static item refers to itself, its own MIR will have a reference to itself, so during mir dumping we end up evaluating the static again, causing us to try to build MIR again (mir dumping happens during MIR building).
Thus I disabled evaluation of statics during MIR dumps in case the MIR body isn't yet far enough along to be guaranteed cycle-free.
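A static of the shape that triggered the cycle (a minimal reduction; the actual test may differ):
```rust
struct Node {
    next: &'static Node,
}

// `SELF`'s MIR contains a reference to `SELF` itself, so rendering its
// allocation during `-Zdump-mir` used to re-enter MIR building.
static SELF: Node = Node { next: &SELF };

fn main() {}
```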
Make MIR cleanup for functions with impossible predicates into a real MIR pass
It's a bit jarring to see the body of a function with an impossible-to-satisfy where clause suddenly go to a single `unreachable` terminator when looking at the MIR dump output in order, and I discovered it's because we manually replace the body outside of a MIR pass.
Let's make it into a fully fledged MIR pass so it's more clear what it's doing and when it's being applied.
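For reference, the sort of function this pass handles (a sketch using the unstable `trivial_bounds` feature to write an impossible bound):
```rust
#![feature(trivial_bounds)]
#![allow(trivial_bounds)]

// `String: Copy` can never hold, so this function can never be called
// and its MIR body is replaced with a single `unreachable` terminator.
fn never_called()
where
    String: Copy,
{
    let s = String::new();
    drop(s);
}

fn main() {}
```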
Add an InstSimplify for repetitive array expressions
I noticed in https://github.com/rust-lang/rust/pull/135068#issuecomment-2569955426 that GVN's implementation of this same transform was quite profitable on the deep-vector benchmark. But of course GVN doesn't run in unoptimized builds, so this is my attempt to write a version of this transform that benefits the deep-vector case and is fast enough to run in InstSimplify.
The benchmark suite indicates that this is effective.
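The rewrite at the source level (sketch):
```rust
fn splat(x: u64) -> [u64; 4] {
    // Before: MIR builds an `Aggregate` with four copies of `x`.
    // After this InstSimplify, it becomes the repeat rvalue `[x; 4]`,
    // which is much cheaper for large arrays like the deep-vector case.
    [x, x, x, x]
}
```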
Adds `#[rustc_force_inline]`, which is similar to always inlining but
reports an error if the inlining was not possible, and which always
attempts to inline annotated items, regardless of optimisation levels.
It can only be applied to free functions to guarantee that the MIR
inliner will be able to resolve calls.
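Sketch of the attribute in use (internal, gated behind `rustc_attrs`; illustrative only):
```rust
#![feature(rustc_attrs)]

#[rustc_force_inline]
fn always_inlined(x: u32) -> u32 {
    x + 1
}

fn main() {
    // The MIR inliner inlines this call at every optimization level and
    // reports an error if inlining turns out to be impossible.
    let _ = always_inlined(41);
}
```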