Adjusted LLVM codegen for code compiled with `-Zinstrument-coverage` to
address multiple, somewhat related issues.
Fixed a significant flaw in prior coverage solution: Every counter
generated a new counter variable, but there should have only been one
counter variable per function. This appears to have bloated .profraw
files significantly. (For a small program, it increased the size by
about 40%. I have not tested large programs, but there is anecdotal
evidence that profraw files were way too large. This is a good fix,
regardless, but hopefully it also addresses related issues.
Fixes: #82144
Invalid LLVM coverage data produced when compiled with -C opt-level=1
Existing tests now work up to at least `opt-level=3`. This required a
detailed analysis of the LLVM IR, comparisons with Clang C++ LLVM IR
when compiled with coverage, and a lot of trial and error with codegen
adjustments.
The biggest hurdle was figuring out how to continue to support coverage
results for unused functions and generics. Rust's coverage results have
three advantages over Clang's coverage results:
1. Rust's coverage map does not include any overlapping code regions,
making coverage counting unambiguous.
2. Rust generates coverage results (showing zero counts) for all unused
functions, including generics. (Clang does not generate coverage for
uninstantiated template functions.)
3. Rust's unused functions produce minimal stubbed functions in LLVM IR,
sufficient for including in the coverage results; while Clang must
generate the complete LLVM IR for each unused function, even though
it will never be called.
This PR removes the previous hack of attempting to inject coverage into
some other existing function instance, and generates dedicated instances
for each unused function. This change, and a few other adjustments
(similar to what is required for `-C link-dead-code`, but with lower
impact), makes it possible to support LLVM optimizations.
Fixes: #79651
Coverage report: "Unexecuted instantiation:..." for a generic function
from multiple crates
Fixed by removing the aforementioned hack. Some "Unexecuted
instantiation" notices are unavoidable, as explained in the
`used_crate.rs` test, but `-Zinstrument-coverage` has new options to
back off support for either unused generics, or all unused functions,
which avoids the notice, at the cost of less coverage of unused
functions.
Fixes: #82875
Invalid LLVM coverage data produced with crate brotli_decompressor
Fixed by disabling the LLVM function attribute that forces inlining, if
`-Z instrument-coverage` is enabled. This attribute is applied to
Rust functions with `#[inline(always)], and in some cases, the forced
inlining breaks coverage instrumentation and reports.
Set codegen thread names
Set names on threads spawned during codegen. Various debugging and profiling tools can take advantage of this to show a more useful identifier for threads.
For example, gdb will show thread names in `info threads`:
```
(gdb) info threads
Id Target Id Frame
1 Thread 0x7fffefa7ec40 (LWP 2905) "rustc" __pthread_clockjoin_ex (threadid=140737214134016, thread_return=0x0, clockid=<optimized out>, abstime=<optimized out>, block=<optimized out>)
at pthread_join_common.c:145
2 Thread 0x7fffefa7b700 (LWP 2957) "rustc" 0x00007ffff125eaa8 in llvm::X86_MC::initLLVMToSEHAndCVRegMapping(llvm::MCRegisterInfo*) ()
from /home/wesley/.rustup/toolchains/stage1/lib/librustc_driver-f866439e29074957.so
3 Thread 0x7fffeef0f700 (LWP 3116) "rustc" futex_wait_cancelable (private=0, expected=0, futex_word=0x7fffe8602ac8) at ../sysdeps/nptl/futex-internal.h:183
* 4 Thread 0x7fffeed0e700 (LWP 3123) "rustc" rustc_codegen_ssa:🔙:write::spawn_work (cgcx=..., work=...) at /home/wesley/code/rust/rust/compiler/rustc_codegen_ssa/src/back/write.rs:1573
6 Thread 0x7fffe113b700 (LWP 3150) "opt foof.7rcbfp" 0x00007ffff2940e62 in llvm::CallGraph::populateCallGraphNode(llvm::CallGraphNode*) ()
from /home/wesley/.rustup/toolchains/stage1/lib/librustc_driver-f866439e29074957.so
8 Thread 0x7fffe0d39700 (LWP 3158) "opt foof.7rcbfp" 0x00007fffefe8998e in malloc_consolidate (av=av@entry=0x7ffe2c000020) at malloc.c:4492
9 Thread 0x7fffe0f3a700 (LWP 3162) "opt foof.7rcbfp" 0x00007fffefef27c4 in __libc_open64 (file=0x7fffe0f38608 "foof.foof.7rcbfp3g-cgu.6.rcgu.o", oflag=524865) at ../sysdeps/unix/sysv/linux/open64.c:48
(gdb)
```
and Windows Performance Analyzer will also show this information when profiling:

rustc_codegen_ssa: tune codegen according to available concurrency
This change tunes ahead-of-time codegening according to the amount of
concurrency available, rather than according to the number of CPUs on
the system. This can lower memory usage by reducing the number of
compiled LLVM modules in memory at once, particularly across several
rustc instances.
Previously, each rustc instance would assume that it should codegen
ahead of time to meet the demand of number-of-CPUs workers. But often, a
rustc instance doesn't have nearly that much concurrency available to
it, because the concurrency availability is split, via the jobserver,
across all active rustc instances spawned by the driving cargo process,
and is further limited by the `-j` flag argument. Therefore, each rustc
might have had several times the number of LLVM modules in memory than
it really needed to meet demand. If the modules were large, the effect
on memory usage would be noticeable.
With this change, the required amount of ahead-of-time codegen scales up
with the actual number of workers running within a rustc instance. Note
that the number of workers running can be less than the actual
concurrency available to a rustc instance. However, if more concurrency
is actually available, workers are spun up quickly as job tokens are
acquired, and the ahead-of-time codegen scales up quickly as well.
Set path of the compile unit to the source directory
As part of the effort to implement split dwarf debug info, we ended up
setting the compile unit location to the output directory rather than
the source directory. Furthermore, it seems like we failed to remap the
prefixes for this as well!
The desired behaviour is to instead set the `DW_AT_GNU_dwo_name` to a
path relative to compiler's working directory. This still allows
debuggers to find the split dwarf files, while not changing the
behaviour of the code that is compiling with regular debug info, and not
changing the compiler's behaviour with regards to reproducibility.
Fixes#82074
cc `@alexcrichton` `@davidtwco`
remove redundant option/result wrapping of return values
If a function always returns `Ok(something)`, we can return `something` directly and remove the corresponding error handling in the callers.
clippy::unnecessary_wraps
Add new `rustc` target for Arm64 machines that can target the iphonesimulator
This PR lands a new target (`aarch64-apple-ios-sim`) that targets arm64 iphone simulator, previously unreachable from Apple Silicon machines.
resolves#81632
r? `@shepmaster`
This change tunes ahead-of-time codegening according to the amount of
concurrency available, rather than according to the number of CPUs on
the system. This can lower memory usage by reducing the number of
compiled LLVM modules in memory at once, particularly across several
rustc instances.
Previously, each rustc instance would assume that it should codegen
ahead of time to meet the demand of number-of-CPUs workers. But often, a
rustc instance doesn't have nearly that much concurrency available to
it, because the concurrency availability is split, via the jobserver,
across all active rustc instances spawned by the driving cargo process,
and is further limited by the `-j` flag argument. Therefore, each rustc
might have had several times the number of LLVM modules in memory than
it really needed to meet demand. If the modules were large, the effect
on memory usage would be noticeable.
With this change, the required amount of ahead-of-time codegen scales up
with the actual number of workers running within a rustc instance. Note
that the number of workers running can be less than the actual
concurrency available to a rustc instance. However, if more concurrency
is actually available, workers are spun up quickly as job tokens are
acquired, and the ahead-of-time codegen scales up quickly as well.
In the backend we may want to remove certain temporary files, but in
certain other situations these files might not be produced in the first
place. We don't exactly care about that, and the intent is really that
these files are gone after a certain point in the backend.
Here we unify the backend file removing calls to use `ensure_removed`
which will attempt to delete a file, but will not fail if it does not
exist (anymore).
The tradeoff to this approach is, of course, that we may miss instances
were we are attempting to remove files at wrong paths due to some bug –
compilation would silently succeed but the temporary files would remain
there somewhere.
As part of the effort to implement split dwarf debug info, we ended up
setting the compile unit location to the output directory rather than
the source directory. Furthermore, it seems like we failed to remap the
prefixes for this as well!
The desired behaviour is to instead set the `DW_AT_GNU_dwo_name` to a
path relative to compiler's working directory. This still allows
debuggers to find the split dwarf files, while not changing the
behaviour of the code that is compiling with regular debug info, and not
changing the compiler's behaviour with regards to reproducibility.
Fixes#82074
Use -target when linking binaries for Mac Catalyst
When running `rustc` with `-target x86_64-apple-ios-macabi`, the linker
eventually gets run with `-arch x86_64`, because the linker back end splits the
LLVM target triple and uses the first token as the target architecture. However,
this does not work for the Mac Catalyst ABI, which is a separate target from
Darwin.
Specifying the full target triple with `-target` allows Mac Catalyst binaries to
link and run.
closes#80202
This commit adds a new stable codegen option to rustc,
`-Csplit-debuginfo`. The old `-Zrun-dsymutil` flag is deleted and now
subsumed by this stable flag. Additionally `-Zsplit-dwarf` is also
subsumed by this flag but still requires `-Zunstable-options` to
actually activate. The `-Csplit-debuginfo` flag takes one of
three values:
* `off` - This indicates that split-debuginfo from the final artifact is
not desired. This is not supported on Windows and is the default on
Unix platforms except macOS. On macOS this means that `dsymutil` is
not executed.
* `packed` - This means that debuginfo is desired in one location
separate from the main executable. This is the default on Windows
(`*.pdb`) and macOS (`*.dSYM`). On other Unix platforms this subsumes
`-Zsplit-dwarf=single` and produces a `*.dwp` file.
* `unpacked` - This means that debuginfo will be roughly equivalent to
object files, meaning that it's throughout the build directory
rather than in one location (often the fastest for local development).
This is not the default on any platform and is not supported on Windows.
Each target can indicate its own default preference for how debuginfo is
handled. Almost all platforms default to `off` except for Windows and
macOS which default to `packed` for historical reasons.
Some equivalencies for previous unstable flags with the new flags are:
* `-Zrun-dsymutil=yes` -> `-Csplit-debuginfo=packed`
* `-Zrun-dsymutil=no` -> `-Csplit-debuginfo=unpacked`
* `-Zsplit-dwarf=single` -> `-Csplit-debuginfo=packed`
* `-Zsplit-dwarf=split` -> `-Csplit-debuginfo=unpacked`
Note that `-Csplit-debuginfo` still requires `-Zunstable-options` for
non-macOS platforms since split-dwarf support was *just* implemented in
rustc.
There's some more rationale listed on #79361, but the main gist of the
motivation for this commit is that `dsymutil` can take quite a long time
to execute in debug builds and provides little benefit. This means that
incremental compile times appear that much worse on macOS because the
compiler is constantly running `dsymutil` over every single binary it
produces during `cargo build` (even build scripts!). Ideally rustc would
switch to not running `dsymutil` by default, but that's a problem left
to get tackled another day.
Closes#79361
Fix sysroot option not being honored across rustc
Change link_sanitizer_runtime() to check if the sanitizer library exists in the specified/session sysroot, and if it doesn't exist, use the default sysroot. (See #79253.)
Change link_sanitizer_runtime() to check if the sanitizer library exists
in the specified/session sysroot, and if it doesn't exist, use the
default sysroot.
This commit adds a Split DWARF compare mode to compiletest so that
debuginfo tests are also tested using Split DWARF in split mode (and
manually in single mode).
Signed-off-by: David Wood <david@davidtw.co>
This commit implements Split DWARF support, wiring up the flag (added in
earlier commits) to the modified FFI wrapper (also from earlier
commits).
Signed-off-by: David Wood <david@davidtw.co>
This commit removes the `TargetMachineFactory` struct and adds a
`TargetMachineFactoryFn` type alias which is used everywhere that the
previous, long type was used.
Signed-off-by: David Wood <david@davidtw.co>
This commit changes some comments to documentation comments so that
they can be read on the generated rustdoc.
Signed-off-by: David Wood <david@davidtw.co>
Warn if `dsymutil` returns an error code
This checks the error code returned by `dsymutil` and warns if it failed. It
also provides the stdout and stderr logs from `dsymutil`, similar to the native
linker step.
I tried to think of ways to test this change, but so far I haven't found a good way, as you'd likely need to inject some nonsensical args into `dsymutil` to induce failure, which feels too artificial to me. Also, https://github.com/rust-lang/rust/issues/79361 suggests Rust is on the verge of disabling `dsymutil` by default, so perhaps it's okay for this change to be untested. In any case, I'm happy to add a test if someone sees a good approach.
Fixes https://github.com/rust-lang/rust/issues/78770
This checks the error code returned by `dsymutil` and warns if it failed. It
also provides the stdout and stderr logs from `dsymutil`, similar to the native
linker step.
Fixes https://github.com/rust-lang/rust/issues/78770
with an eye on merging `TargetOptions` into `Target`.
`TargetOptions` as a separate structure is mostly an implementation detail of `Target` construction, all its fields logically belong to `Target` and available from `Target` through `Deref` impls.
The wrapper type led to tons of target.target
across the compiler. Its ptr_width field isn't
required any more, as target_pointer_width
is already present in parsed form.
Preparation for a subsequent change that replaces
rustc_target::config::Config with its wrapped Target.
On its own, this commit breaks the build. I don't like making
build-breaking commits, but in this instance I believe that it
makes review easier, as the "real" changes of this PR can be
seen much more easily.
Result of running:
find compiler/ -type f -exec sed -i -e 's/target\.target\([)\.,; ]\)/target\1/g' {} \;
find compiler/ -type f -exec sed -i -e 's/target\.target$/target/g' {} \;
find compiler/ -type f -exec sed -i -e 's/target.ptr_width/target.pointer_width/g' {} \;
./x.py fmt