coverage: Detect unused local file IDs to avoid an LLVM assertion
Each function's coverage metadata contains a *local file table* that maps local file IDs (used by the function's mapping regions) to global file IDs (shared by all functions in the same CGU).
LLVM requires all local file IDs to have at least one mapping region, and has an assertion that will fail if it detects a local file ID with no regions. To make sure that assertion doesn't fire, we need to detect and skip functions whose metadata would trigger it.
(This can't actually happen yet, because currently all of a function's spans must belong to the same file and expansion. But this will be an important edge case when adding expansion region support.)
This case can't actually happen yet (other than via a testing flag), because
currently all of a function's spans must belong to the same file and expansion.
But this will be an important edge case when adding expansion region support.
Previously `-Zprint-mono-items` would override the mono item collection
strategy. When debugging one doesn't want to change the behaviour, so
this was counter productive. Additionally, the produced behaviour was
artificial and might never arise without using the option in the first
place (`-Zprint-mono-items=eager` without `-Clink-dead-code`). Finally,
the option was incorrectly marked as `UNTRACKED`.
Resolve those issues, by turning `-Zprint-mono-items` into a boolean
flag that prints results of mono item collection without changing the
behaviour of mono item collection.
For codegen-units test incorporate `-Zprint-mono-items` flag directly
into compiletest tool.
Test changes are mechanical. `-Zprint-mono-items=lazy` was removed
without additional changes, and `-Zprint-mono-items=eager` was turned
into `-Clink-dead-code`. Linking dead code disables internalization, so
tests have been updated accordingly.
Make `-Zfixed-x18` into a target modifier
As part of #136966, the `-Zfixed-x18` flag should be turned into a target modifier. This is a blocker to stabilization of the flag. The flag was originally added in #124655 and the MCP for its addition is [MCP#748](https://github.com/rust-lang/compiler-team/issues/748).
On some aarch64 targets, the x18 register is used as a temporary caller-saved register by default. When the `-Zfixed-x18` flag is passed, this is turned off so that the compiler doesn't use the x18 register. This allows end-users to use the x18 register for other purposes. For example, by accessing it with inline asm you can use the register as a very efficient thread-local variable. Another common use-case is to store the stack pointer needed by the shadow-call-stack sanitizer. There are also some aarch64 targets where not using x18 is the default – in those cases the flag is a no-op.
Note that this flag does not *on its own* cause an ABI mismatch. What actually causes an ABI mismatch is when you have different compilation units that *disagree* on what it should be used for. But having a CU that uses it and another CU that doesn't normally isn't enough to trigger an ABI problem. However, we still consider the flag to be a target modifier in all cases, since it is assumed that you are passing the flag because you intend to assign some other meaning to the register. Rejecting all flag mismatches even if not all are unsound is consistent with [RFC#3716](https://rust-lang.github.io/rfcs/3716-target-modifiers.html). See the headings "not all mismatches are unsound" and "cases that are not caught" for additional discussion of this.
On aarch64 targets where `-Zfixed-x18` is not a no-op, it is an error to pass `-Zsanitizer=shadow-call-stack` without also passing `-Zfixed-x18`.
Autodiff flags
Interestingly, it seems that some other projects have conflicts with exactly the same LLVM optimization passes as autodiff.
At least `LLVMRustOptimize` has exactly the flags that we need to disable problematic opt passes.
This PR enables us to compile code where users differentiate two identical functions in the same module. This has been especially common in test cases, but it's not impossible to encounter in the wild.
It also enables two new flags for testing/debugging. I consider writing an MCP to upgrade PrintPasses to be a standalone -Z flag, since it is *not* the same as `-Z print-llvm-passes`, which IMHO gives less useful output. A discussion can be found here: [#t-compiler/llvm > Print llvm passes. @ 💬](https://rust-lang.zulipchat.com/#narrow/channel/187780-t-compiler.2Fllvm/topic/Print.20llvm.20passes.2E/near/511533038)
Finally, it improves `PrintModBefore` and `PrintModAfter`. They used to work reliable, but now we just schedule enzyme as part of an existing ModulePassManager (MPM). Since Enzyme is last in the MPM scheduling, PrintModBefore became very inaccurate. It used to print the input module, which we gave to the Enzyme and was great to create llvm-ir reproducer. However, lately the MPM would run the whole `default<O3>` pipeline, which heavily modifies the llvm module, before we pass it to Enzyme. That made it impossible to use the flag to create llvm-ir reproducers for Enzyme bugs. We now schedule a PrintModule pass just before Enzyme, solving this problem.
Based on the PrintPass output, it also _seems_ like changing `registerEnzymeAndPassPipeline(PB, true);` to `registerEnzymeAndPassPipeline(PB, false);` has no effect. In theory, the bool should tell Enzyme to schedule some helpful passes in the PassBuilder. However, since it doesn't do anything and I'm not 100% sure anymore on whether we really need it, I'll just disable it for now and postpone investigations.
r? ``@oli-obk``
closes#139471
Tracking:
- https://github.com/rust-lang/rust/issues/124509
Do not remove trivial `SwitchInt` in analysis MIR
This PR ensures that we don't prematurely remove trivial `SwitchInt` terminators which affects both the borrow-checking and runtime semantics (i.e. UB) of the code. Previously the `SimplifyCfg` optimization was removing `SwitchInt` terminators when they was "trivial", i.e. when all arms branched to the same basic block, even if that `SwitchInt` terminator had the side-effect of reading an operand which (for example) may not be initialized or may point to an invalid place in memory.
This behavior is unlike all other optimizations, which are only applied after "analysis" (i.e. borrow-checking) is finished, and which Miri disables to make sure the compiler doesn't silently remove UB.
Fixing this code "breaks" (i.e. unmasks) code that used to borrow-check but no longer does, like:
```rust
fn foo() {
let x;
let (0 | _) = x;
}
```
This match expression should perform a read because `_` does not shadow the `0` literal pattern, and the compiler should have to read the match scrutinee to compare it to 0. I've checked that this behavior does not actually manifest in practice via a crater run which came back clean: https://github.com/rust-lang/rust/pull/139042#issuecomment-2767436367
As a side-note, it may be tempting to suggest that this is actually a good thing or that we should preserve this behavior. If we wanted to make this work (i.e. trivially optimize out reads from matches that are redundant like `0 | _`), then we should be enabling this behavior *after* fixing this. However, I think it's kinda unprincipled, and for example other variations of the code don't even work today, e.g.:
```rust
fn foo() {
let x;
let (0.. | _) = x;
}
```
Add unstable parsing of `--extern foo::bar=libbar.rlib` command line options
This is a tiny step towards implementing the rustc side of support for implementing packages as optional namespaces (#122349). We add support for parsing command line options like `--extern foo::bar=libbar.rlib` when the `-Z namespaced-crates` option is present.
We don't do anything further with them. The next step is to plumb this down to the name resolver.
This PR also generally refactors the extern argument parsing code and adds some unit tests to make it clear what forms should be accepted with and without the flag.
cc ```@epage``` ```@ehuss```
Autodiff batching
Enzyme supports batching, which is especially known from the ML side when training neural networks.
There we would normally have a training loop, where in each iteration we would pass in some data (e.g. an image), and a target vector. Based on how close we are with our prediction we compute our loss, and then use backpropagation to compute the gradients and update our weights.
That's quite inefficient, so what you normally do is passing in a batch of 8/16/.. images and targets, and compute the gradients for those all at once, allowing better optimizations.
Enzyme supports batching in two ways, the first one (which I implemented here) just accepts a Batch size,
and then each Dual/Duplicated argument has not one, but N shadow arguments. So instead of
```rs
for i in 0..100 {
df(x[i], y[i], 1234);
}
```
You can now do
```rs
for i in 0..100.step_by(4) {
df(x[i+0],x[i+1],x[i+2],x[i+3], y[i+0], y[i+1], y[i+2], y[i+3], 1234);
}
```
which will give the same results, but allows better compiler optimizations. See the testcase for details.
There is a second variant, where we can mark certain arguments and instead of having to pass in N shadow arguments, Enzyme assumes that the argument is N times longer. I.e. instead of accepting 4 slices with 12 floats each, we would accept one slice with 48 floats. I'll implement this over the next days.
I will also add more tests for both modes.
For any one preferring some more interactive explanation, here's a video of Tim's llvm dev talk, where he presents his work. https://www.youtube.com/watch?v=edvaLAL5RqU
I'll also add some other docs to the dev guide and user docs in another PR.
r? ghost
Tracking:
- https://github.com/rust-lang/rust/issues/124509
- https://github.com/rust-lang/rust/issues/135283
add `TypingMode::Borrowck`
Shares the first commit with #138499, doesn't really matter which PR to land first 😊😁
Introduces `TypingMode::Borrowck` which unlike `TypingMode::Analysis`, uses the hidden type computed by HIR typeck as the initial value of opaques instead of an unconstrained infer var. This is a part of https://github.com/rust-lang/types-team/issues/129.
Using this new `TypingMode` is unfortunately a breaking change for now, see tests/ui/impl-trait/non-defining-uses/as-projection-term.rs. Using an inference variable as the initial value results in non-defining uses in the defining scope. We therefore only enable it if with `-Znext-solver=globally` or `-Ztyping-mode-borrowck`
To do that the PR contains the following changes:
- `TypeckResults::concrete_opaque_type` are already mapped to the definition of the opaque type
- writeback now checks that the non-lifetime parameters of the opaque are universal
- for this, `fn check_opaque_type_parameter_valid` is moved from `rustc_borrowck` to `rustc_trait_selection`
- we add a new `query type_of_opaque_hir_typeck` which, using the same visitors as MIR typeck, attempts to merge the hidden types from HIR typeck from all defining scopes
- done by adding a `DefiningScopeKind` flag to toggle between using borrowck and HIR typeck
- the visitors stop checking that the MIR type matches the HIR type. This is trivial as the HIR type are now used as the initial hidden types of the opaque. This check is useful as a safeguard when not using `TypingMode::Borrowck`, but adding it to the new structure is annoying and it's not soundness critical, so I intend to not add it back.
- add a `TypingMode::Borrowck` which behaves just like `TypingMode::Analysis` except when normalizing opaque types
- it uses `type_of_opaque_hir_typeck(opaque)` as the initial value after replacing its regions with new inference vars
- it uses structural lookup in the new solver
fixes#112201, fixes#132335, fixes#137751
r? `@compiler-errors` `@oli-obk`
Target modifiers fix for bool flags without value
Fixed support of boolean flags without values: `-Zbool-flag` is now consistent with `-Zbool-flag=true` in another crate.
When flag is explicitly set to default value, target modifier will not be set in crate metainfo (`-Zflag=false` when `false` is a default value for the flag).
Improved error notification when target modifier flag is absent in a crate ("-Zflag unset").
Example:
```
note: `-Zreg-struct-return=true` in this crate is incompatible with unset `-Zreg-struct-return` in dependency `default_reg_struct_return`
```
add FCW to warn about wasm ABI transition
See https://github.com/rust-lang/rust/issues/122532 for context: the "C" ABI on wasm32-unk-unk will change. The goal of this lint is to warn about any function definition and calls whose behavior will be affected by the change. My understanding is the following:
- scalar arguments are fine
- including 128 bit types, they get passed as two `i64` arguments in both ABIs
- `repr(C)` structs (recursively) wrapping a single scalar argument are fine (unless they have extra padding due to over-alignment attributes)
- all return values are fine
`@bjorn3` `@alexcrichton` `@Manishearth` is that correct?
I am making this a "show up in future compat reports" lint to maximize the chances people become aware of this. OTOH this likely means warnings for most users of Diplomat so maybe we shouldn't do this?
IIUC, wasm-bindgen should be unaffected by this lint as they only pass scalar types as arguments.
Tracking issue: https://github.com/rust-lang/rust/issues/138762
Transition plan blog post: https://github.com/rust-lang/blog.rust-lang.org/pull/1531
try-job: dist-various-2
Represent diagnostic side effects as dep nodes
This changes diagnostic to be tracked as a special dep node (`SideEffect`) instead of having a list of side effects associated with each dep node. `SideEffect` is always red and when forced, it emits the diagnostic and marks itself green. Each emitted diagnostic generates a new `SideEffect` with an unique dep node index.
Some implications of this:
- Diagnostic may now be emitted more than once as they can be emitted once when the `SideEffect` gets marked green and again if the task it depends on needs to be re-executed due to another node being red. It relies on deduplicating of diagnostics to avoid that.
- Anon tasks which emits diagnostics will no longer *incorrectly* be merged with other anon tasks.
- Reusing a CGU will now emit diagnostics from the task generating it.
Target modifiers (special marked options) are recorded in metainfo
Target modifiers (special marked options) are recorded in metainfo and compared to be equal in different linked crates.
PR for this RFC: https://github.com/rust-lang/rfcs/pull/3716
Option may be marked as `TARGET_MODIFIER`, example: `regparm: Option<u32> = (None, parse_opt_number, [TRACKED TARGET_MODIFIER]`.
If an TARGET_MODIFIER-marked option has non-default value, it will be recorded in crate metainfo as a `Vec<TargetModifier>`:
```
pub struct TargetModifier {
pub opt: OptionsTargetModifiers,
pub value_name: String,
}
```
OptionsTargetModifiers is a macro-generated enum.
Option value code (for comparison) is generated using `Debug` trait.
Error example:
```
error: mixing `-Zregparm` will cause an ABI mismatch in crate `incompatible_regparm`
--> $DIR/incompatible_regparm.rs:10:1
|
LL | #![crate_type = "lib"]
| ^
|
= help: the `-Zregparm` flag modifies the ABI so Rust crates compiled with different values of this flag cannot be used together safely
= note: `-Zregparm=1` in this crate is incompatible with `-Zregparm=2` in dependency `wrong_regparm`
= help: set `-Zregparm=2` in this crate or `-Zregparm=1` in `wrong_regparm`
= help: if you are sure this will not cause problems, use `-Cunsafe-allow-abi-mismatch=regparm` to silence this error
error: aborting due to 1 previous error
```
`-Cunsafe-allow-abi-mismatch=regparm,reg-struct-return` to disable list of flags.
Autodiff Upstreaming - rustc_codegen_ssa, rustc_middle
This PR should not be merged until the rustc_codegen_llvm part is merged.
I will also alter it a little based on what get's shaved off from the cg_llvm PR,
and address some of the feedback I received in the other PR (including cleanups).
I am putting it already up to
1) Discuss with `@jieyouxu` if there is more work needed to add tests to this and
2) Pray that there is someone reviewing who can tell me why some of my autodiff invocations get lost.
Re 1: My test require fat-lto. I also modify the compilation pipeline. So if there are any other llvm-ir tests in the same compilation unit then I will likely break them. Luckily there are two groups who currently have the same fat-lto requirement for their GPU code which I have for my autodiff code and both groups have some plans to enable support for thin-lto. Once either that work pans out, I'll copy it over for this feature. I will also work on not changing the optimization pipeline for functions not differentiated, but that will require some thoughts and engineering, so I think it would be good to be able to run the autodiff tests isolated from the rest for now. Can you guide me here please?
For context, here are some of my tests in the samples folder: https://github.com/EnzymeAD/rustbook
Re 2: This is a pretty serious issue, since it effectively prevents publishing libraries making use of autodiff: https://github.com/EnzymeAD/rust/issues/173. For some reason my dummy code persists till the end, so the code which calls autodiff, deletes the dummy, and inserts the code to compute the derivative never gets executed. To me it looks like the rustc_autodiff attribute just get's dropped, but I don't know WHY? Any help would be super appreciated, as rustc queries look a bit voodoo to me.
Tracking:
- https://github.com/rust-lang/rust/issues/124509
r? `@jieyouxu`
Fix deduplication mismatches in vtables leading to upcasting unsoundness
We currently have two cases where subtleties in supertraits can trigger disagreements in the vtable layout, e.g. leading to a different vtable layout being accessed at a callsite compared to what was prepared during unsizing. Namely:
### #135315
In this example, we were not normalizing supertraits when preparing vtables. In the example,
```
trait Supertrait<T> {
fn _print_numbers(&self, mem: &[usize; 100]) {
println!("{mem:?}");
}
}
impl<T> Supertrait<T> for () {}
trait Identity {
type Selff;
}
impl<Selff> Identity for Selff {
type Selff = Selff;
}
trait Middle<T>: Supertrait<()> + Supertrait<T> {
fn say_hello(&self, _: &usize) {
println!("Hello!");
}
}
impl<T> Middle<T> for () {}
trait Trait: Middle<<() as Identity>::Selff> {}
impl Trait for () {}
fn main() {
(&() as &dyn Trait as &dyn Middle<()>).say_hello(&0);
}
```
When we prepare `dyn Trait`, we see a supertrait of `Middle<<() as Identity>::Selff>`, which itself has two supertraits `Supertrait<()>` and `Supertrait<<() as Identity>::Selff>`. These two supertraits are identical, but they are not duplicated because we were using structural equality and *not* considering normalization. This leads to a vtable layout with two trait pointers.
When we upcast to `dyn Middle<()>`, those two supertraits are now the same, leading to a vtable layout with only one trait pointer. This leads to an offset error, and we call the wrong method.
### #135316
This one is a bit more interesting, and is the bulk of the changes in this PR. It's a bit similar, except it uses binder equality instead of normalization to make the compiler get confused about two vtable layouts. In the example,
```
trait Supertrait<T> {
fn _print_numbers(&self, mem: &[usize; 100]) {
println!("{mem:?}");
}
}
impl<T> Supertrait<T> for () {}
trait Trait<T, U>: Supertrait<T> + Supertrait<U> {
fn say_hello(&self, _: &usize) {
println!("Hello!");
}
}
impl<T, U> Trait<T, U> for () {}
fn main() {
(&() as &'static dyn for<'a> Trait<&'static (), &'a ()>
as &'static dyn Trait<&'static (), &'static ()>)
.say_hello(&0);
}
```
When we prepare the vtable for `dyn for<'a> Trait<&'static (), &'a ()>`, we currently consider the PolyTraitRef of the vtable as the key for a supertrait. This leads two two supertraits -- `Supertrait<&'static ()>` and `for<'a> Supertrait<&'a ()>`.
However, we can upcast[^up] without offsetting the vtable from `dyn for<'a> Trait<&'static (), &'a ()>` to `dyn Trait<&'static (), &'static ()>`. This is just instantiating the principal trait ref for a specific `'a = 'static`. However, when considering those supertraits, we now have only one distinct supertrait -- `Supertrait<&'static ()>` (which is deduplicated since there are two supertraits with the same substitutions). This leads to similar offsetting issues, leading to the wrong method being called.
[^up]: I say upcast but this is a cast that is allowed on stable, since it's not changing the vtable at all, just instantiating the binder of the principal trait ref for some lifetime.
The solution here is to recognize that a vtable isn't really meaningfully higher ranked, and to just treat a vtable as corresponding to a `TraitRef` so we can do this deduplication more faithfully. That is to say, the vtable for `dyn for<'a> Tr<'a>` and `dyn Tr<'x>` are always identical, since they both would correspond to a set of free regions on an impl... Do note that `Tr<for<'a> fn(&'a ())>` and `Tr<fn(&'static ())>` are still distinct.
----
There's a bit more that can be cleaned up. In codegen, we can stop using `PolyExistentialTraitRef` basically everywhere. We can also fix SMIR to stop storing `PolyExistentialTraitRef` in its vtable allocations.
As for testing, it's difficult to actually turn this into something that can be tested with `rustc_dump_vtable`, since having multiple supertraits that are identical is a recipe for ambiguity errors. Maybe someone else is more creative with getting that attr to work, since the tests I added being run-pass tests is a bit unsatisfying. Miri also doesn't help here, since it doesn't really generate vtables that are offset by an index in the same way as codegen.
r? `@lcnr` for the vibe check? Or reassign, idk. Maybe let's talk about whether this makes sense.
<sup>(I guess an alternative would also be to not do any deduplication of vtable supertraits (or only a really conservative subset) rather than trying to normalize and deduplicate more faithfully here. Not sure if that works and is sufficient tho.)</sup>
cc `@steffahn` -- ty for the minimizations
cc `@WaffleLapkin` -- since you're overseeing the feature stabilization :3
Fixes#135315Fixes#135316
add `-Zmin-function-alignment`
tracking issue: https://github.com/rust-lang/rust/issues/82232
This PR adds the `-Zmin-function-alignment=<align>` flag, that specifies a minimum alignment for all* functions.
### Motivation
This feature is requested by RfL [here](https://github.com/rust-lang/rust/issues/128830):
> i.e. the equivalents of `-fmin-function-alignment` ([GCC](https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#index-fmin-function-alignment_003dn), Clang does not support it) / `-falign-functions` ([GCC](https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#index-falign-functions), [Clang](https://clang.llvm.org/docs/ClangCommandLineReference.html#cmdoption-clang1-falign-functions)).
>
> For the Linux kernel, the behavior wanted is that of GCC's `-fmin-function-alignment` and Clang's `-falign-functions`, i.e. align all functions, including cold functions.
>
> There is [`feature(fn_align)`](https://github.com/rust-lang/rust/issues/82232), but we need to do it globally.
### Behavior
The `fn_align` feature does not have an RFC. It was decided at the time that it would not be necessary, but maybe we feel differently about that now? In any case, here are the semantics of this flag:
- `-Zmin-function-alignment=<align>` specifies the minimum alignment of all* functions
- the `#[repr(align(<align>))]` attribute can be used to override the function alignment on a per-function basis: when `-Zmin-function-alignment` is specified, the attribute's value is only used when it is higher than the value passed to `-Zmin-function-alignment`.
- the target may decide to use a higher value (e.g. on x86_64 the minimum that LLVM generates is 16)
- The highest supported alignment in rust is `2^29`: I checked a bunch of targets, and they all emit the `.p2align 29` directive for targets that align functions at all (some GPU stuff does not have function alignment).
*: Only with `build-std` would the minimum alignment also be applied to `std` functions.
---
cc `@ojeda`
r? `@workingjubilee` you were active on the tracking issue
mark deprecated option as deprecated in rustc_session to remove copypasta and small refactor
This marks deprecated options as deprecated via flag in options table in rustc_session, which removes copypasted deprecation text from rustc_driver_impl.
This also adds warning for deprecated `-C ar` option, which didn't emitted any warnings before.
Makes `inline_threshold` `[UNTRACKED]`, as it do nothing.
Adds few tests.
See individual commits.
inline_threshold mark deprecated
no-stack-check
print deprecation message for -Car too
inline_threshold deprecated and do nothing: make in untracked
make OptionDesc struct from tuple