Replace the \01__gnu_mcount_nc to LLVM intrinsic for additional ARM targets
This is an extension to #113814 which seems to have missed two targets which also need this patch for instrumentation with `-Z instrument-mcount` to work correctly.
For anyone who might stumble over this issue again in the future: As a workaround one can dump the current target configuration using
```
rustc +nightly -Z unstable-options --target armv7-unknown-linux-gnueabihf --print target-spec-json
```
(assuming `armv7-unknown-linux-gnueabihf` is the target to build for) add the line
```
"llvm-mcount-intrinsic": "llvm.arm.gnu.eabi.mcount",
```
and compile with
```
RUSTFLAGS="-Z instrument-mcount -C passes=ee-instrument<post-inline>" cargo +nightly build -Z build-std --target <path to directory with modified target config>/armv7-unknown-linux-gnueabihf.json
```
It might be necessary to set the compiler for cross compiling using something like
```
export TARGET_CC=arm-linux-gnueabihf-gcc
```
PassWrapper: adapt for llvm/llvm-project@d3d856ad84
LLVM 21 moves to making it more explicit what this function call is doing, but nothing has changed behaviorally, so for now we just adjust to using the new name of the function.
`@rustbot` label llvm-main
Enable `reliable_f16_math` on x86
This has been disabled due to an LLVM misoptimization with `powi.f16` [1]. This was fixed upstream and the fix is included in LLVM20, so tests no longer need to be disabled.
`f16` still remains disabled on MinGW due to the ABI issue.
[1]: https://github.com/llvm/llvm-project/issues/98665
try-job: x86_64-gnu
try-job: x86_64-gnu-llvm-19-1
try-job: x86_64-gnu-llvm-20-1
While profiling Zed's dev build I've noticed that while most of the time `upstream_monomorphizations` takes a lot of time in monomorpization_collector, in some cases (e.g. build of `editor` itself)
the rest of monomorphization_collector_graph_walk dominates it. Most of the time is spent in collect_items_rec.
This PR aims to reduce the number of locks taking place; instead of locking output MonoItems once per children of current node, we do so once per *parent*. We also get to reuse locks for mentioned and used items.
While this commit does not reduce Wall time of Zed's build, it does shave off `cargo build -j1` from 43s to 41.5s.
This has been disabled due to an LLVM misoptimization with `powi.f16`
[1]. This was fixed upstream and the fix is included in LLVM20, so tests
no longer need to be disabled.
`f16` still remains disabled on MinGW due to the ABI issue.
[1]: https://github.com/llvm/llvm-project/issues/98665
Extend the alignment check to borrows
The current alignment check does not include checks for creating misaligned references from raw pointers, which is now added in this patch.
When inserting the check we need to be careful with references to field projections (e.g. `&(*ptr).a`), in which case the resulting reference must be aligned according to the field type and not the type of the pointer.
r? `@saethlin`
cc `@RalfJung,` after our discussion in #134424
Most notably, the `FIXME` for suboptimal printing of `use` groups in
`tests/ui/macros/stringify.rs` is fixed. And all other test output
changes result in pretty printed output being closer to the original
formatting in the source code.
Specifically: `TokenCursor`, `TokenTreeCursor`,
`LazyAttrTokenStreamImpl`, `FlatToken`, `make_attr_token_stream`,
`ParserRange`, `NodeRange`. `ParserReplacement`, and `NodeReplacement`.
These are all related to token streams, rather than actual parsing.
This will facilitate the simplifications in the next commit.
only return nested goals for `Certainty::Yes`
Ambiguous `NormalizesTo` goals can otherwise repeatedly add the same nested goals to the parent.
r? ```@compiler-errors```
Implement the internal feature `cfg_target_has_reliable_f16_f128`
Support for `f16` and `f128` is varied across targets, backends, and backend versions. Eventually we would like to reach a point where all backends support these approximately equally, but until then we have to work around some of these nuances of support being observable.
Introduce the `cfg_target_has_reliable_f16_f128` internal feature, which provides the following new configuration gates:
* `cfg(target_has_reliable_f16)`
* `cfg(target_has_reliable_f16_math)`
* `cfg(target_has_reliable_f128)`
* `cfg(target_has_reliable_f128_math)`
`reliable_f16` and `reliable_f128` indicate that basic arithmetic for the type works correctly. The `_math` versions indicate that anything relying on `libm` works correctly, since sometimes this hits a separate class of codegen bugs.
These options match configuration set by the build script at [1]. The logic for LLVM support is duplicated as-is from the same script. There are a few possible updates that will come as a follow up.
The config introduced here is not planned to ever become stable, it is only intended to replace the build scripts for `std` tests and `compiler-builtins` that don't have any way to configure based on the codegen backend.
MCP: https://github.com/rust-lang/compiler-team/issues/866
Closes: https://github.com/rust-lang/compiler-team/issues/866
[1]: 555e1d0386/library/std/build.rs (L84-L186)
---
The second commit makes use of this config to replace `cfg_{f16,f128}{,_math}` in `library/`. I omitted providing a `cfg(bootstrap)` configuration to keep things simpler since the next beta branch is in two weeks.
try-job: aarch64-gnu
try-job: i686-msvc-1
try-job: test-various
try-job: x86_64-gnu
try-job: x86_64-msvc-ext2
Move inline asm check to typeck, properly handle aliases
Pull `InlineAsmCtxt` down to `rustc_hir_typeck`, and instead of using things like `Ty::is_copy`, use the `InferCtxt`-aware methods. To fix https://github.com/rust-lang/trait-system-refactor-initiative/issues/189, we also add a `try_structurally_resolve_*` call to `expr_ty`.
r? lcnr
Do not compute type_of for impl item if impl where clauses are unsatisfied
Consider the following code:
```rust
trait Foo {
fn call(self) -> impl Send;
}
trait Nested {}
impl<T> Foo for T
where
T: Nested,
{
fn call(self) -> impl Sized {
NotSatisfied.call()
}
}
struct NotSatisfied;
impl Foo for NotSatisfied {
fn call(self) -> impl Sized {
todo!()
}
}
```
In `impl Foo for NotSatisfied`, we need to prove that the RPITIT is well formed. This requires proving the item bound `<NotSatisfied as Foo>::RPITIT: Send`. Normalizing `<NotSatisfied as Foo>::RPITIT: Send` assembles two impl candidates, via the `NotSatisfied` impl and the blanket `T` impl. We end up computing the `type_of` for the blanket impl even if `NotSatisfied: Nested` where clause does not hold.
This type_of query ends up needing to prove that its own `impl Sized` RPIT satisfies `Send`, which ends up needing to compute the hidden type of the RPIT, which is equal to the return type of `NotSatisfied.call()`. That ends up in a query cycle, since we subsequently try normalizing that return type via the blanket impl again!
In the old solver, we don't end up computing the `type_of` an impl candidate if its where clauses don't hold, since this select call would fail before confirming the projection candidate:
d7ea436a02/compiler/rustc_trait_selection/src/traits/project.rs (L882)
This PR makes the new solver more consistent with the old solver by adding a call to `try_evaluate_added_goals` after regstering the impl predicates, which causes us to bail before computing the `type_of` for impls if the impl definitely doesn't apply.
r? lcnr
Fixes https://github.com/rust-lang/trait-system-refactor-initiative/issues/185
allow deref patterns to move out of boxes
This adds a case to lower deref patterns on boxes using a built-in deref instead of a `Deref::deref` or `DerefMut::deref_mut` call: if `deref!(inner): Box<T>` is matching on place `place`, the inner pattern `inner` now matches on `*place` rather than a temporary. No longer needing to call a method also means it won't borrow the scrutinee in match arms. This allows for bindings in `inner` to move out of `*place`.
For comparison with box patterns, this uses the same MIR lowering but different THIR. Consequently, deref patterns on boxes are treated the same as any other deref patterns in match exhaustiveness analysis. Box patterns can't quite be implemented in terms of deref patterns until exhaustiveness checking for deref patterns is implemented (I'll open a PR for exhaustiveness soon!).
Tracking issue: #87121
r? ``@Nadrieril``
In Zed shared-generics loading takes up a significant chunk of time in incremental build, as rustc deserializes rmeta of all dependencies of a crate.
I've recently realized that shared-generics includes all instantiations of some_generic_function in the following snippet:
```rs
pub fn some_generic_function(_: impl Fn()) {}
pub fn non_generic_function() {
some_generic_function(|| {});
some_generic_function(|| {});
some_generic_function(|| {});
some_generic_function(|| {});
some_generic_function(|| {});
some_generic_function(|| {});
some_generic_function(|| {});
}
```
even though none of these instantiations can actually be created from outside of `non_generic_function`.
This PR makes shared-generics account for visibilities of generic arguments; an item is only considered for exporting if it is reachable from the outside or if all of it's arguments are visible outside of the local crate.
This PR reduces incremental build time for Zed (touch edito.rs scenario) from 12.4s to 10.4s.
Co-authored-by: bjorn3 <17426603+bjorn3@users.noreply.github.com>
Rollup of 7 pull requests
Successful merges:
- #140056 (Fix a wrong error message in 2024 edition)
- #140220 (Fix detection of main function if there are expressions around it)
- #140249 (Remove `weak` alias terminology)
- #140316 (Introduce `BoxMarker` to improve pretty-printing correctness)
- #140347 (ci: clean more disk space in codebuild)
- #140349 (ci: use aws codebuild for the `dist-x86_64-linux` job)
- #140379 (rustc-dev-guide subtree update)
r? `@ghost`
`@rustbot` modify labels: rollup
LLVM 21 moves to making it more explicit what this function call is
doing, but nothing has changed behaviorally, so for now we just adjust
to using the new name of the function.
@rustbot label llvm-main
Async drop codegen
Async drop implementation using templated coroutine for async drop glue generation.
Scopes changes to generate `async_drop_in_place()` awaits, when async droppable objects are out-of-scope in async context.
Implementation details:
https://github.com/azhogin/posts/blob/main/async-drop-impl.md
New fields in Drop terminator (drop & async_fut). Processing in codegen/miri must validate that those fields are empty (in full version async Drop terminator will be expanded at StateTransform pass or reverted to sync version). Changes in terminator visiting to consider possible new successor (drop field).
ResumedAfterDrop messages for panic when coroutine is resumed after it is started to be async drop'ed.
Lang item for generated coroutine for async function async_drop_in_place. `async fn async_drop_in_place<T>()::{{closure0}}`.
Scopes processing for generate async drop preparations. Async drop is a hidden Yield, so potentially async drops require the same dropline preparation as for Yield terminators.
Processing in StateTransform: async drops are expanded into yield-point. Generation of async drop of coroutine itself added.
Shims for AsyncDropGlueCtorShim, AsyncDropGlue and FutureDropPoll.
```rust
#[lang = "async_drop"]
pub trait AsyncDrop {
#[allow(async_fn_in_trait)]
async fn drop(self: Pin<&mut Self>);
}
impl Drop for Foo {
fn drop(&mut self) {
println!("Foo::drop({})", self.my_resource_handle);
}
}
impl AsyncDrop for Foo {
async fn drop(self: Pin<&mut Self>) {
println!("Foo::async drop({})", self.my_resource_handle);
}
}
```
First async drop glue implementation re-worked to use the same drop elaboration code as for sync drop.
`async_drop_in_place` changed to be `async fn`. So both `async_drop_in_place` ctor and produced coroutine have their lang items (`AsyncDropInPlace`/`AsyncDropInPlacePoll`) and shim instances (`AsyncDropGlueCtorShim`/`AsyncDropGlue`).
```
pub async unsafe fn async_drop_in_place<T: ?Sized>(_to_drop: *mut T) {
}
```
AsyncDropGlue shim generation uses `elaborate_drops::elaborate_drop` to produce drop ladder (in the similar way as for sync drop glue) and then `coroutine::StateTransform` to convert function into coroutine poll.
AsyncDropGlue coroutine's layout can't be calculated for generic T, it requires known final dropee type to be generated (in StateTransform). So, `templated coroutine` was introduced here (`templated_coroutine_layout(...)` etc).
Such approach overrides the first implementation using mixing language-level futures in https://github.com/rust-lang/rust/pull/121801.