The reorganization PR has caused this to fail once before because every
file shows up as changed. Increase the timeout so this doesn't happen.
We now cancel the job if too many extensive tests are run unless `ci:
allow-many-extensive` is in the PR description, so this helps prevent
the limit being hit by accident.
Error out when too many extensive tests would be run unless `ci:
allow-many-extensive` is in the PR description. This allows us to set a
much higher CI timeout with less risk that a 4+ hour job gets started by
accident.
Sometimes we do refactoring that moves things around and triggers an
extensive test, even though the implementation didn't change. There
isn't any need to run full extensive CI in these cases, so add a way to
skip it from the PR message.
Jobs should just cancel automatically, it isn't ideal that extensive
jobs can continue running for multiple hours after code has been
updated. Use a solution from [1] to do this.
[1]: https://stackoverflow.com/a/72408109/5380651
Splitting into different source files by float size doesn't have any
benefit when the only content is a small function that forwards to the
generic implementation. Combine the source files for all width versions
of:
* ceil
* copysign
* fabs
* fdim
* floor
* fmaximum
* fmaximum_num
* fminimum
* fminimum_num
* ldexp
* scalbn
* sqrt
* truc
fmod is excluded to avoid conflicts with an open PR.
As part of this change move unit tests out of the generic module,
instead testing the type-specific functions (e.g. `ceilf16` rather than
`ceil::<f16>()`). This ensures that unit tests are validating whatever
we expose, such as arch-specific implementations via
`select_implementation!`, which would otherwise be skipped. (They are
still covered by integration tests).
Introduce a constant representing NaN with a negative sign bit for use
with testing. There isn't really any guarantee that `F::NAN` is positive
but in practice it always is, which is good enough for testing purposes.
`bl!` is being used to add a leading underscore on Apple targets.
`asm_sym` has been around since 2022 and handles platform-specific
symbol names automatically, so make use of this instead.
I have verified that `armv7s-apple-ios` still builds correctly.
Discussed at [1], there was an off-by-one mistake when converting from
the loop routine to using `leading_zeros` for normalization.
Currently, using `EXP_BITS` has the effect that `ix` after the branch
has its MSB _one bit to the left_ of the implicit bit's position,
whereas a shift by `EXP_BITS + 1` ensures that the MSB is exactly at the
implicit bit's position, matching what is done for normals (where the
implicit bit is set to be explicit). This doesn't seem to have any
effect in our implementation since the failing test cases from [1]
appear to still have correct results.
Since the result of using `EXP_BITS + 1` is more consistent with what is
done for normals, apply this here.
[1]: https://github.com/rust-lang/libm/pull/469#discussion_r2012473920
Parsing errors are now bubbled up part of the way, but that needs some
more work.
Rounding should be correct, and the `Status` returned by `parse_any`
should have the correct bits set. These are used for the current (unchanged)
behavior of the surface level functions like `hf64`: panic on invalid inputs, or
values that aren't exactly representable.
Replace `core::arch` versions of the following with handwritten
assembly, which avoids recursion issues (cg_gcc using `rint` as a
fallback) as well as problems with `aarch64be`.
* `rint`
* `rintf`
Additionally, add assembly versions of the following:
* `fma`
* `fmaf`
* `sqrt`
* `sqrtf`
If the `fp16` target feature is available, which implies `neon`, also
include the following:
* `rintf16`
* `sqrtf16`
`sqrt` is added to match the implementation for `x86`. `fma` is included
since it is used by many other routines.
There are a handful of other operations that have assembly
implementations. They are omitted here because we should have basic
float math routines available in `core` in the near future, which will
allow us to defer to LLVM for assembly lowering rather than implementing
these ourselves.
Some backends may replace calls to `core::arch` with multiple calls to
`sqrt` [1], which becomes recursive. Help mitigate this by replacing the
call with assembly.
Results in the same assembly as the current implementation when built
with optimizations.
[1]: https://github.com/rust-lang/compiler-builtins/issues/649
`libm` no longer uses this directly in `cfg`, it is only for setting
other configuration in the `libm` `build.rs`. Clean up this
configuration in `compiler-builtins` since it is unused.
* Delete some memcpy tests that were a bit excessive
* Always use the same offset of 65
* Add a memmove test with aligned source and destination
* Improve printing output and add more comments
* Use a constant for 1 MiB so it shows up in the benchmark logs
This crate [1] makes it reasonably easy to get instruction count
performance metrics that are stable enough to run in CI, and has worked
out well since integrating it with `libm`. Add new benchmarks for `mem`
functions using `iai-callgrind`, modeling them off of the existing
benchmarks.
[1]: https://github.com/iai-callgrind/iai-callgrind
The current setup has the `Cargo.toml` for `compiler-builtins` at the
repository root, which means all support crates and other files are
located within the package root. This works for now but is not the
cleanest setup since files that should or shouldn't be included in the
package need to be configured in `Cargo.toml`. If we eventually merge
`libm` development into this repository, it would be nice to make this
separation more straightforward.
Begin cleaning things up by moving the crate source to a new
`compiler-builtins` directory and adding a virtual manifest. For now the
`libm` submodule is also moved, but in the future it can likely move
back to the top level (ideally `compiler-builtins/src` would contain a
symlink to `libm/src/math`, but unfortunately it seems like Cargo does
not like something about the submodule + symlink combination).
Currently there is an interesting situation with the way features get
enabled; `testcrate` enables `mangled-names`, but the `intrinsics.rs`
example requires this feature be disabled (otherwise the test fails with
missing symbols, as expected). This is also the reason that `testcrate`
is not a default workspace member, meaning `cargo test` doesn't actually
run `testcrate`'s tests; making it a default member would mean that
`compiler-builtins/mangled-names` gets enabled when
`examples/intrinsics.rs` gets built, due to the way features get
unified.
Simplify the situation by making moving the example to its own crate as
`builtins-test-intrinsics`. This also means `testcrate` can become a
default member so it is included in `cargo check` or `cargo test` when
run at the workspace root.
`testcrate` and `builtins-test-intrinsics` still can't be built at the
same time since there isn't a straightforward way to have Cargo build
`compiler-builtins` twice with different features. This is a side effect
of us using non-additive features, but there isn't really a better
option since enabling both mangled and unmangled names would render
`builtins-test-intrinsics` useless.
It seems like "sign" was used as a shortened version of "significand",
but that is easy to confuse with "sign". Update these to use "sig" like
most other places.
The fix to this issue was synced in [1] so we should no longer need to
keep aarch64 pinned.
This reverts commit b2bcfc838e2a4b72fa62b333e3eb91f250aa4539.
[1]: https://github.com/rust-lang/rust/pull/137661
Since we have a handful of different float-related configuration in
testcrate, track a list of which are implied by others rather than
repeating the config.