Stabilize target_feature_11
# Stabilization report
This is an updated version of https://github.com/rust-lang/rust/pull/116114, which is itself a redo of https://github.com/rust-lang/rust/pull/99767. Most of this commit and report were copied from those PRs. Thanks ```@LeSeulArtichaut``` and ```@calebzulawski!```
## Summary
Allows for safe functions to be marked with `#[target_feature]` attributes.
Functions marked with `#[target_feature]` are generally considered as unsafe functions: they are unsafe to call, cannot *generally* be assigned to safe function pointers, and don't implement the `Fn*` traits.
However, calling them from other `#[target_feature]` functions with a superset of features is safe.
```rust
// Demonstration function
#[target_feature(enable = "avx2")]
fn avx2() {}
fn foo() {
// Calling `avx2` here is unsafe, as we must ensure
// that AVX is available first.
unsafe {
avx2();
}
}
#[target_feature(enable = "avx2")]
fn bar() {
// Calling `avx2` here is safe.
avx2();
}
```
Moreover, once https://github.com/rust-lang/rust/pull/135504 is merged, they can be converted to safe function pointers in a context in which calling them is safe:
```rust
// Demonstration function
#[target_feature(enable = "avx2")]
fn avx2() {}
fn foo() -> fn() {
// Converting `avx2` to fn() is a compilation error here.
avx2
}
#[target_feature(enable = "avx2")]
fn bar() -> fn() {
// `avx2` coerces to fn() here
avx2
}
```
See the section "Closures" below for justification of this behaviour.
## Test cases
Tests for this feature can be found in [`tests/ui/target_feature/`](f6cb952dc1/tests/ui/target-feature).
## Edge cases
### Closures
* [target-feature 1.1: should closures inherit target-feature annotations? #73631](https://github.com/rust-lang/rust/issues/73631)
Closures defined inside functions marked with #[target_feature] inherit the target features of their parent function. They can still be assigned to safe function pointers and implement the appropriate `Fn*` traits.
```rust
#[target_feature(enable = "avx2")]
fn qux() {
let my_closure = || avx2(); // this call to `avx2` is safe
let f: fn() = my_closure;
}
```
This means that in order to call a function with #[target_feature], you must guarantee that the target-feature is available while the function, any closures defined inside it, as well as any safe function pointers obtained from target-feature functions inside it, execute.
This is usually ensured because target features are assumed to never disappear, and:
- on any unsafe call to a `#[target_feature]` function, presence of the target feature is guaranteed by the programmer through the safety requirements of the unsafe call.
- on any safe call, this is guaranteed recursively by the caller.
If you work in an environment where target features can be disabled, it is your responsibility to ensure that no code inside a target feature function (including inside a closure) runs after this (until the feature is enabled again).
**Note:** this has an effect on existing code, as nowadays closures do not inherit features from the enclosing function, and thus this strengthens a safety requirement. It was originally proposed in #73631 to solve this by adding a new type of UB: “taking a target feature away from your process after having run code that uses that target feature is UB” .
This was motivated by userspace code already assuming in a few places that CPU features never disappear from a program during execution (see i.e. 2e29bdf908/crates/std_detect/src/detect/arch/x86.rs); however, concerns were raised in the context of the Linux kernel; thus, we propose to relax that requirement to "causing the set of usable features to be reduced is unsafe; when doing so, the programmer is required to ensure that no closures or safe fn pointers that use removed features are still in scope".
* [Fix #[inline(always)] on closures with target feature 1.1 #111836](https://github.com/rust-lang/rust/pull/111836)
Closures accept `#[inline(always)]`, even within functions marked with `#[target_feature]`. Since these attributes conflict, `#[inline(always)]` wins out to maintain compatibility.
### ABI concerns
* [The extern "C" ABI of SIMD vector types depends on target features #116558](https://github.com/rust-lang/rust/issues/116558)
The ABI of some types can change when compiling a function with different target features. This could have introduced unsoundness with target_feature_11, but recent fixes (#133102, #132173) either make those situations invalid or make the ABI no longer dependent on features. Thus, those issues should no longer occur.
### Special functions
The `#[target_feature]` attribute is forbidden from a variety of special functions, such as main, current and future lang items (e.g. `#[start]`, `#[panic_handler]`), safe default trait implementations and safe trait methods.
This was not disallowed at the time of the first stabilization PR for target_features_11, and resulted in the following issues/PRs:
* [`#[target_feature]` is allowed on `main` #108645](https://github.com/rust-lang/rust/issues/108645)
* [`#[target_feature]` is allowed on default implementations #108646](https://github.com/rust-lang/rust/issues/108646)
* [#[target_feature] is allowed on #[panic_handler] with target_feature 1.1 #109411](https://github.com/rust-lang/rust/issues/109411)
* [Prevent using `#[target_feature]` on lang item functions #115910](https://github.com/rust-lang/rust/pull/115910)
## Documentation
* Reference: [Document the `target_feature_11` feature reference#1181](https://github.com/rust-lang/reference/pull/1181)
---
cc tracking issue https://github.com/rust-lang/rust/issues/69098
cc ```@workingjubilee```
cc ```@RalfJung```
r? ```@rust-lang/lang```
tests: `-Copt-level=3` instead of `-O` in codegen tests
An effective blocker for redefining the meaning of `-O` is to stop reusing this somewhat ambiguous alias in our own codegen test suite. The choice between `-Copt-level=2` and `-Copt-level=3` is arbitrary for most of our tests. In most cases it makes no difference, so I set most of them to `-Copt-level=3`, as it will lead to slightly more "normalized" codegen.
try-job: test-various
try-job: arm-android
try-job: armhf-gnu
try-job: i686-gnu-1
try-job: i686-gnu-2
try-job: i686-mingw
try-job: i686-msvc-1
try-job: i686-msvc-2
try-job: aarch64-apple
try-job: aarch64-gnu
Previously, we unconditionally set the bitwidth to 128-bits, the largest
an discrimnator would possibly be. Then, LLVM would cut down the constant by
chopping off leading zeroes before emitting the DWARF. LLVM only
supported 64-bit descriminators, so this would also have occasionally
resulted in truncated data (or an assert) if more than 64-bits were
used.
LLVM added support for 128-bit enumerators in llvm/llvm-project#125578
That patchset also trusts the constant to describe how wide the variant tag is.
As a result, we went from emitting tags that looked like:
DW_AT_discr_value (0xfe)
(`form1`)
to emitting tags that looked like:
DW_AT_discr_value (<0x10> fe ff ff ff 00 00 00 00 00 00 00 00 00 00 00 00 )
This makes the `DW_AT_discr_value` encode at the bitwidth of the tag,
which:
1. Is probably closer to our intentions in terms of describing the data.
2. Doesn't invoke the 128-bit support which may not be supported by all
debuggers / downstream tools.
3. Will result in smaller debug information.
This should guarantee it tests what we want it to test and no more.
It should probably also run on 64-bit platforms that are not x86-64,
which will often have the vector registers the opt implies.
Pointers for variables all need to be in the same address space for
correct compilation. Therefore ensure that even if an `alloca` is
created in a different address space, it is casted to the default
address space before its value is used.
This is necessary for the amdgpu target and others where the default
address space for `alloca`s is not 0.
For example the following code compiles incorrectly when not casting the
address space to the default one:
```rust
fn f(p: *const i8 /* addrspace(0) */) -> *const i8 /* addrspace(0) */ {
let local = 0i8; /* addrspace(5) */
let res = if cond { p } else { &raw const local };
res
}
```
results in
```llvm
%local = alloca addrspace(5) i8
%res = alloca addrspace(5) ptr
if:
; Store 64-bit flat pointer
store ptr %p, ptr addrspace(5) %res
else:
; Store 32-bit scratch pointer
store ptr addrspace(5) %local, ptr addrspace(5) %res
ret:
; Load and return 64-bit flat pointer
%res.load = load ptr, ptr addrspace(5) %res
ret ptr %res.load
```
For amdgpu, `addrspace(0)` are 64-bit pointers, `addrspace(5)` are
32-bit pointers.
The above code may store a 32-bit pointer and read it back as a 64-bit
pointer, which is obviously wrong and cannot work. Instead, we need to
`addrspacecast %local to ptr addrspace(0)`, then we store and load the
correct type.
adding autodiff tests
I'd like to get started with upstreaming some tests, even though I'm still waiting for an answer on how to best integrate the enzyme pass. Can we therefore temporarily support the -Z llvm-plugins here without too much effort? And in that case, how would that work? I saw you can do remapping, e.g. `rust-src-base`, but I don't think that will give me the path to libEnzyme.so. Do you have another suggestion?
Other than that this test simply checks that the derivative of `x*x` is `2.0 * x`, which in this case is computed as
`%0 = fadd fast double %x.0.val, %x.0.val`
(I'll add a few more tests and move it to an autodiff folder if we can use the -Z flag)
r? ``@jieyouxu``
Locally at least `-Zllvm-plugins=${PWD}/build/x86_64-unknown-linux-gnu/enzyme/build/Enzyme/libEnzyme-19.so` seems to work if I copy the command I get from x.py test and run it manually. However, running x.py test itself fails.
Tracking:
- https://github.com/rust-lang/rust/issues/124509
Zulip discussion: https://rust-lang.zulipchat.com/#narrow/channel/326414-t-infra.2Fbootstrap/topic/Enzyme.20build.20changes
Generate correct terminate block under Wasm EH
This fixes failing LLVM assertions during insnsel.
Improves #135665.
r? bjorn3
^ you reviewed the PR bringing Wasm EH in, I assume this is within your area of expertise?
Debuginfo for function ZSTs should have alignment of 8 bits, not 1 bit
In #116096, function ZSTs were made to have debuginfo that gives them an alignment of “1”. But because alignment in LLVM debuginfo is denoted in *bits*, not bytes, this resulted in an alignment specification of 1 bit instead of 1 byte.
I don't know whether this has any practical consequences, but I noticed that a test started failing when I accidentally fixed the mistake while working on #136632, so I extracted the fix (and the test adjustment) to this PR.
Enable more tests on Windows
As part of the discussion of https://github.com/rust-lang/compiler-team/issues/822 on Zulip, it was mentioned that problems with the i686-pc-windows-gnu target may have resulted in tests being disabled on Windows.
So in this PR, I've ripped out all our `//@ ignore-windows` directives, then re-added all the ones that are definitely required based on the outcome of try-builds, and in some cases I've improved the justification or tightened the directives to `//@ ignore-msvc` or ignoring specific targets.
Fix a couple Emscripten tests
This fixes a couple Emscripten tests where the correct fix is more or less obvious. A couple UI tests are still broken with this PR:
- `tests/ui/abi/numbers-arithmetic/return-float.rs` (#136197)
- `tests/ui/no_std/no-std-unwind-binary.rs` (haven't debugged yet)
- `tests/ui/test-attrs/test-passed.rs` (haven't debugged this either)
`````@rustbot````` label +T-compiler +O-emscripten
optimize slice::ptr_rotate for small rotates
r? `@scottmcm`
This swaps the positions and numberings of algorithms 1 and 2 in `slice::ptr_rotate`, and pulls the entire outer loop into algorithm 3 since it was redundant for the first two. Effectively, `ptr_rotate` now always does the `memcpy`+`memmove`+`memcpy` sequence if the shifts fit into the stack buffer.
With this change, an `IndexMap`-style `move_index` function is optimized correctly.
Assembly comparisons:
- `move_index`, before: https://godbolt.org/z/Kr616KnYM
- `move_index`, after: https://godbolt.org/z/1aoov6j8h
- the code from `#89714`, before: https://godbolt.org/z/Y4zaPxEG6
- the code from `#89714`, after: https://godbolt.org/z/1dPx83axc
related to #89714
some relevant discussion in https://internals.rust-lang.org/t/idea-shift-move-to-efficiently-move-elements-in-a-vec/22184
Behavior tests pass locally. I can't get any consistent microbenchmark results on my machine, but the assembly diffs look promising.
ABI-required target features: warn when they are missing in base CPU
Part of https://github.com/rust-lang/rust/pull/135408:
instead of adding ABI-required features to the target we build for LLVM, check that they are already there. Crucially we check this after applying `-Ctarget-cpu` and `-Ctarget-feature`, by reading `sess.unstable_target_features`. This means we can tweak the ABI target feature check without changing the behavior for any existing user; they will get warnings but the target features behave as before.
The test changes here show that we are un-doing the "add all required target features" part. Without the full #135408, there is no way to take a way an ABI-required target feature with `-Ctarget-cpu`, so we cannot yet test that part.
Cc ``@workingjubilee``
Windows x86: Change i128 to return via the vector ABI
Clang and GCC both return `i128` in xmm0 on windows-msvc and windows-gnu. Currently, Rust returns the type on the stack. Add a calling convention adjustment so we also return scalar `i128`s using the vector ABI, which makes our `i128` compatible with C.
In the future, Clang may change to return `i128` on the stack for its `-msvc` targets (more at [1]). If this happens, the change here will need to be adjusted to only affect MinGW.
Link: https://github.com/rust-lang/rust/issues/134288 (does not fix) [1]
try-job: x86_64-msvc
try-job: x86_64-msvc-ext1
try-job: x86_64-mingw-1
try-job: x86_64-mingw-2
Clang and GCC both return `i128` in xmm0 on windows-msvc and
windows-gnu. Currently, Rust returns the type on the stack. Add a
calling convention adjustment so we also return scalar `i128`s using the
vector ABI, which makes our `i128` compatible with C.
In the future, Clang may change to return `i128` on the stack for its
`-msvc` targets (more at [1]). If this happens, the change here will
need to be adjusted to only affect MinGW.
Link: https://github.com/rust-lang/rust/issues/134288
Currently we both pass and return `i128` indirectly on Windows for MSVC
and MinGW, but this will be adjusted. Introduce a test verifying the
current state.
This improves the codegen for vector `select`, `gather`, `scatter` and
boolean reduction intrinsics and fixesrust-lang/portable-simd#316.
The current behavior of most mask operations during llvm codegen is to
truncate the mask vector to <N x i1>, telling llvm to use the least
significat bit. The exception is the `simd_bitmask` intrinsics, which
already used the most signifiant bit.
Since sse/avx instructions are defined to use the most significant bit,
truncating means that llvm has to insert a left shift to move the bit
into the most significant position, before the mask can actually be
used.
Similarly on aarch64, mask operations like blend work bit by bit,
repeating the least significant bit across the whole lane involves
shifting it into the sign position and then comparing against zero.
By shifting before truncating to <N x i1>, we tell llvm that we only
consider the most significant bit, removing the need for additional
shift instructions in the assembly.
The `Box::new(T::default())` implementation of `Box::default` only
had two stack copies in debug mode, compared to the current version,
which has four. By avoiding creating any `MaybeUninit<T>`'s and just writing
`T` directly to the `Box` pointer, the stack usage in debug mode remains
the same as the old version.
use `PassMode::Direct` for vector types on `s390x`
closes https://github.com/rust-lang/rust/issues/135744
tracking issue: https://github.com/rust-lang/rust/issues/130869
Previously, all vector types were type erased to `Ni8`, now we pass non-wrapped vector types directly. That skips emitting a bunch of casting logic in rustc, that LLVM then has to clean up. The initial LLVM IR is also a bit more readable.
This calling convention is tested extensively in `tests/assembly/s390x-vector-abi.rs`, showing that this change has no impact on the ABI in practice.
r? ````@taiki-e````
Update our range `assume`s to the format that LLVM prefers
I found out in https://github.com/llvm/llvm-project/issues/123278#issuecomment-2597440158 that the way I started emitting the `assume`s in #109993 was suboptimal, and as seen in that LLVM issue the way we're doing it -- with two `assume`s sometimes -- can at times lead to CVP/SCCP not realize what's happening because one of them turns into a `ne` instead of conveying a range.
So this updates how it's emitted from
```
assume( x >= LOW );
assume( x <= HIGH );
```
or
```
// (for ranges that wrap the range)
assume( (x <= LOW) | (x >= HIGH) );
```
to
```
assume( (x - LOW) <= (HIGH - LOW) );
```
so that we don't need multiple `icmp`s nor multiple `assume`s for a single value, and both wrappping and non-wrapping ranges emit the same shape.
(And we don't bother emitting the subtraction if `LOW` is zero, since that's trivial for us to check too.)
remove support for the (unstable) #[start] attribute
As explained by `@Noratrieb:`
`#[start]` should be deleted. It's nothing but an accidentally leaked implementation detail that's a not very useful mix between "portable" entrypoint logic and bad abstraction.
I think the way the stable user-facing entrypoint should work (and works today on stable) is pretty simple:
- `std`-using cross-platform programs should use `fn main()`. the compiler, together with `std`, will then ensure that code ends up at `main` (by having a platform-specific entrypoint that gets directed through `lang_start` in `std` to `main` - but that's just an implementation detail)
- `no_std` platform-specific programs should use `#![no_main]` and define their own platform-specific entrypoint symbol with `#[no_mangle]`, like `main`, `_start`, `WinMain` or `my_embedded_platform_wants_to_start_here`. most of them only support a single platform anyways, and need cfg for the different platform's ways of passing arguments or other things *anyways*
`#[start]` is in a super weird position of being neither of those two. It tries to pretend that it's cross-platform, but its signature is a total lie. Those arguments are just stubbed out to zero on ~~Windows~~ wasm, for example. It also only handles the platform-specific entrypoints for a few platforms that are supported by `std`, like Windows or Unix-likes. `my_embedded_platform_wants_to_start_here` can't use it, and neither could a libc-less Linux program.
So we have an attribute that only works in some cases anyways, that has a signature that's a total lie (and a signature that, as I might want to add, has changed recently, and that I definitely would not be comfortable giving *any* stability guarantees on), and where there's a pretty easy way to get things working without it in the first place.
Note that this feature has **not** been RFCed in the first place.
*This comment was posted [in May](https://github.com/rust-lang/rust/issues/29633#issuecomment-2088596042) and so far nobody spoke up in that issue with a usecase that would require keeping the attribute.*
Closes https://github.com/rust-lang/rust/issues/29633
try-job: x86_64-gnu-nopt
try-job: x86_64-msvc-1
try-job: x86_64-msvc-2
try-job: test-various