Commit Graph

404 Commits

Author SHA1 Message Date
Joshua Nelson
21c821c6c9 Port outline-atomics to rust
This has a very long history, summarized in
https://github.com/rust-lang/rust/issues/109064. This port is a very
minimal subset of `aarch64/lse.S` from LLVM's compiler-rt. In
particular, it is missing the following:

1. Any form of runtime dispatch between LL/SC and LSE.

Determining which version of the intrinsics to use
requires one of the following:

  i) `getauxval` from glibc. It's unclear whether `compiler_builtins` is
allowed to depend on libc at all, and musl doesn't even support
getauxval. Don't enshrine the requirement "de-facto" by making it
required for outline-atomics.

  ii) kernel support. Linux and FreeBSD have limited support, but it
requires an extremely recent kernel version and doesn't work at all under QEMU (https://github.com/rust-lang/rust/issues/109064#issuecomment-1494939904).

Instead, we hard-code LL/SC intrinsics. Users who want LSE support
should use the LLVM compiler-rt (if you're building from source in
rust-lang/rust, make sure you have `src/llvm-project` checked out
locally. the goal is to soon add a new `optimized-compiler-builtins`
option so this is easier to discover).

2. The global `___aarch64_have_lse_atomics` CTOR, required to do runtime
   dispatch. Thom Chiviolani has this to say about global CTORs:

> static ctors are problems because we are pretty eager about dead code elim
> in general if you have a module that isnt directly reference we will probably not have its static ctors
> also, while llvm has a super robust way to have a static ctor (theres s special "appending global" to use for c++), we dont use that and just have people make a #[used] static in a special section
> 1. the robust way kinda requires rust knowing that the argument is a static ctor (maybe a #[rustc_static_ctor] attribute). it also would be... finnicky, since on windows we actually care beyond being a static ctor, that we run as part in a specific group of ctors, which means a very specific section (one for TLS and the other for, uh, i dont remember)
> 2. we still actually have to codegen the cgu that isn't referenced. but maybe we could remember that it has that attribute and use that

So while this is possible in theory, it's decidedly non-trivial, and
needs invasive changes to rust itself. In any case, it doesn't matter
until we decide the story around libc.

3. The 16-byte (i128) version of compare_and_swap. This wouldn't be
   *too* hard to add, but it would be hard to test. The way I tested the
existing code was not just with unit tests but also by loading it as a
path dependency and running `x test core` - the latter caught several
bugs the unit tests didn't catch (because I originally wrote the tests
wrong). So I am slightly nervous about adding a 16-byte version that is
much more poorly tested than the other intrinsics.
2023-06-26 05:56:08 +00:00
kirk
d3951c082e allow stable features lint, fix link formatting warning, add ignore block to intrinsics macro documentation 2023-06-17 14:11:31 +00:00
Patryk Wychowaniec
73f57893aa fix: Add #[avr_skip] for floats
Same story as always, i.e. ABI mismatch:

- https://github.com/rust-lang/compiler-builtins/pull/462
- https://github.com/rust-lang/compiler-builtins/pull/466
- https://github.com/rust-lang/compiler-builtins/pull/513

I've made sure the changes work by rendering a Mandelbrot fractal:

```rust
#[arduino_hal::entry]
fn main() -> ! {
    let dp = arduino_hal::Peripherals::take().unwrap();
    let pins = arduino_hal::pins!(dp);
    let mut serial = arduino_hal::default_serial!(dp, pins, 57600);

    mandelbrot(&mut serial, 60, 40, -2.05, -1.12, 0.47, 1.12, 100);

    loop {
        //
    }
}

fn mandelbrot<T>(
    output: &mut T,
    viewport_width: i64,
    viewport_height: i64,
    x1: f32,
    y1: f32,
    x2: f32,
    y2: f32,
    max_iterations: i64,
) where
    T: uWrite,
{
    for viewport_y in 0..viewport_height {
        let y0 = y1 + (y2 - y1) * ((viewport_y as f32) / (viewport_height as f32));

        for viewport_x in 0..viewport_width {
            let x0 = x1 + (x2 - x1) * ((viewport_x as f32) / (viewport_width as f32));

            let mut x = 0.0;
            let mut y = 0.0;
            let mut iterations = max_iterations;

            while x * x + y * y <= 4.0 && iterations > 0 {
                let xtemp = x * x - y * y + x0;
                y = 2.0 * x * y + y0;
                x = xtemp;
                iterations -= 1;
            }

            let ch = "#%=-:,. "
                .chars()
                .nth((8.0 * ((iterations as f32) / (max_iterations as f32))) as _)
                .unwrap();

            _ = ufmt::uwrite!(output, "{}", ch);
        }

        _ = ufmt::uwriteln!(output, "");
    }
}
```

... where without avr_skips, the code printed an image full of only `#`.

Note that because libgcc doesn't provide implementations for f64, using
those (e.g. swapping f32 to f64 in the code above) will cause linking to
fail:

```
undefined reference to `__divdf3'
undefined reference to `__muldf3'
undefined reference to `__gedf2'
undefined reference to `__fixunsdfsi'
undefined reference to `__gtdf2'
```

Ideally compiler-builtins could jump right in and provide those, but f64
also require a special calling convention which hasn't been yet exposed
through LLVM.

Note that because using 64-bit floats on an 8-bit target is a pretty
niche thing to do, and because f64 floats don't work correctly anyway at
the moment (due to this ABI mismatch), we're not actually breaking
anything by skipping those functions, since any code that currently uses
f64 on AVR works by accident.

Closes https://github.com/rust-lang/rust/issues/108489.
2023-06-12 14:14:36 +02:00
danakj
fb77604e79 Add the weak-intrinsics feature
When enabled, the weak-intrinsics feature will cause all intrinsics
functions to be marked with weak linkage (i.e. `#[linkage = "weak"])
so that they can be replaced at link time by a stronger symbol.

This can be set to use C++ intrinsics from the compiler-rt library,
as it will avoid Rust's implementation replacing the compiler-rt
implementation as long as the compiler-rt symbols are linked as
strong symbols. Typically this requires the compiler-rt library to
be explicitly specified in the link command.

Addresses https://github.com/rust-lang/compiler-builtins/issues/525.

Without weak-intrinsics, from nm:
```
00000000 W __aeabi_memclr8  // Is explicitly weak
00000000 T __udivsi3  // Is not.
```

With weak-intrinsics, from nm:
```
00000000 W __aeabi_memclr8  // Is explicitly weak
00000000 W __udivsi3  // Is weak due to weak-intrinsics
```
2023-05-19 17:35:00 -04:00
William D. Jones
5246405d61 Ensure shift instrinsic arguments match width of compiler-rt's (int vs si_int). 2023-03-29 17:55:46 -04:00
Taiki Endo
2f43b93603 Fix panic due to overflow in riscv.rs and int/shift.rs 2023-03-23 22:00:22 +09:00
Amanieu d'Antras
da5001e4c6 Merge pull request #516 from TDecki/master 2023-03-12 22:17:23 +01:00
Tobias Decking
59766673e7 more fixing 2023-03-06 19:28:49 +01:00
Tobias Decking
73fa0deaa6 formatting 2023-03-06 19:24:02 +01:00
Tobias Decking
215f63ad19 Final version. 2023-03-06 19:20:30 +01:00
Andrew Kane
59fb34e63a Added lgamma_r and lgammaf_r 2023-03-05 12:17:21 -08:00
Tobias Decking
04fe482160 Formatting 2023-02-22 22:19:10 +01:00
Tobias Decking
c7404df069 Provide a non-sse version for x86_64. 2023-02-22 22:16:29 +01:00
Scott Mabin
67ebc4ae14 Extend the intrinsics exported for Xtensa no_std 2023-02-22 20:56:31 +00:00
Tobias Decking
97402fc580 Change implementation to SSE 2023-02-22 21:54:33 +01:00
Tobias Decking
e9e34801a6 Remove superfluous comment. 2023-02-22 00:10:46 +01:00
Tobias Decking
b53c1ae17a Improve assembly quality + AT&T syntax. 2023-02-22 00:07:41 +01:00
Tobias Decking
44d34e87fa Update path for argument. 2023-02-21 23:36:47 +01:00
Tobias Decking
990596fa28 Correct path. 2023-02-21 23:32:39 +01:00
Tobias Decking
a11bd1cfdb Specialize strlen for x86_64. 2023-02-21 23:13:02 +01:00
Josh Stone
8d4e906206 Drop the llvm14-builtins-abi hack 2023-02-01 14:52:18 -08:00
Patryk Wychowaniec
d20eea4d47 fix: Add #[avr_skip] for bit shifts
This commit follows the same logic as:

- https://github.com/rust-lang/compiler-builtins/pull/462
- https://github.com/rust-lang/compiler-builtins/pull/466

I've tested the changes by preparing a simple program:

```rust
fn calc() -> ... {
    let x = hint::black_box(4u...); // 4u8, 4u16, 4u32, 4u64, 4u128 + signed
    let y = hint::black_box(1u32);

    // x >> y
    // x << y
}

fn main() -> ! {
    let dp = arduino_hal::Peripherals::take().unwrap();
    let pins = arduino_hal::pins!(dp);
    let mut serial = arduino_hal::default_serial!(dp, pins, 57600);

    for b in calc().to_le_bytes() {
        _ = ufmt::uwrite!(&mut serial, "{} ", b);
    }

    _ = ufmt::uwriteln!(&mut serial, "");

    loop {
        //
    }
}
```

... switching types & operators in `calc()`, and observing the results;
what I ended up with was:

```
 u32 << u32 - ok
 u64 << u32 - ok
u128 << u32 - error (undefined reference to `__ashlti3')
 i32 >> u32 - ok
 i64 >> u32 - ok
i128 >> u32 - error (undefined reference to `__ashrti3')
 u32 >> u32 - ok
 u64 >> u32 - ok
u128 >> u32 - error (undefined reference to `__lshrti3')

(where "ok" = compiles and returns correct results)
```

As with multiplication and division, so do in here 128-bit operations
not work, because avr-gcc's standard library doesn't provide them (at
the same time, requiring that specific calling convention, making it
pretty difficult for compiler-builtins to jump in).

I think 128-bit operations non-working on an 8-bit controller is an
acceptable trade-off - 😇 - and so the entire fix in here is
just about skipping those functions.
2022-12-25 11:46:30 +01:00
Martin Kröning
620f50589e Expose minimal floating point symbols for x86_64-unknown-none 2022-12-07 16:08:01 +01:00
Jérome Eertmans
80828bfb0b fix(docs): typo in docstrings
Hello, I think you misspelled `width` to `with`.
2022-11-28 10:53:42 +01:00
Jules Bertholet
af5664be84 Update libm, add rint and rintf 2022-11-08 21:04:02 -05:00
Felix S. Klock II
2faf57c08d might as well add the link to the LLVM assembly code as well. 2022-10-25 12:32:41 -04:00
Felix S. Klock II
8266a1343b Document origins of the multiplication method being used here. 2022-10-25 11:25:14 -04:00
Ralf Jung
6267545315 invoke the unreachable intrinsic, not the stable wrapper 2022-10-10 19:34:48 +02:00
Amanieu d'Antras
2e0590c997 Fix clippy lints 2022-10-10 17:40:16 +01:00
Lokathor
f6cd5cf806 Update macros.rs 2022-09-27 13:22:45 -06:00
D1plo1d
1d5b952100 math: Enabled floating point intrinsics for RISCV32 microcontrollers 2022-09-17 11:47:21 -04:00
David Hoppenbrouwers
a695cf95cf Remove c32() from x86_64 memcmp
Fixes https://github.com/rust-lang/compiler-builtins/issues/487
2022-08-10 11:29:38 +02:00
Nicholas Bishop
abb6893a85 Enable unadjusted_on_win64 for UEFI in some cases
The conversion functions from i128/u128 to f32/f64 have the
`unadjusted_on_win64` attribute, but it is disabled starting with
LLVM14. This seems to be the correct thing to do for Win64, but for some
reason x86_64-unknown-uefi is different, despite generally using the
same ABI as Win64.
2022-08-03 19:16:03 -04:00
Amanieu d'Antras
b71753da80 Merge pull request #484 from Alexhuszagh/armv5te 2022-07-30 02:42:08 +02:00
Amanieu d'Antras
a81a868a59 Merge pull request #482 from ankane/gamma 2022-07-30 02:41:21 +02:00
Alex Huszagh
599dcc2c46 Add compiler-rt fallbacks for sync builtins on armv5te-musl. 2022-07-29 16:58:05 -05:00
Andrew Kane
89df6f6bc0 Added tgamma and tgammaf 2022-07-28 16:21:37 -07:00
David Hoppenbrouwers
66f22e0931 Remove branches around rep movsb/stosb
While it is measurably faster for older CPUs, removing them keeps the code
smaller and is likely more beneficial for newer CPUs.
2022-07-28 18:45:28 +02:00
David Hoppenbrouwers
04c223f0df Skip rep movsb in copy_backward if possible
There is currently no measureable performance difference in benchmarks
but it likely will make a difference in real workloads.
2022-07-28 18:32:57 +02:00
David Hoppenbrouwers
30e0c1f4c2 Use att_syntax for now 2022-07-28 18:32:56 +02:00
David Hoppenbrouwers
897a133869 Remove rep_param_rev 2022-07-28 18:32:56 +02:00
David Hoppenbrouwers
45e2996c96 Fix suboptimal codegen in memset 2022-07-28 18:32:56 +02:00
David Hoppenbrouwers
a977b01090 Align destination in mem* instructions.
While misaligned reads are generally fast, misaligned writes aren't and
can have severe penalties.
2022-07-28 18:32:51 +02:00
Amanieu d'Antras
0cc9a7e4a6 Merge pull request #478 from Lokathor/weak-linkage-for-division 2022-07-28 18:14:59 +02:00
Nicholas Bishop
586e2b38ef Enable win64_128bit_abi_hack for x86_64-unknown-uefi
The `x86_64-unknown-uefi` target is Windows-like [1], and requires the
same altered ABI for some 128-bit integer intrinsics.

See also https://github.com/rust-lang/rust/issues/86494.

[1]: https://github.com/rust-lang/rust/blob/master/compiler/rustc_target/src/spec/x86_64_unknown_uefi.rs
2022-07-28 11:55:59 -04:00
Lokathor
8568a33255 restrict linkage to platforms using ELF binaries
on windows and apple (which don't use ELF) we can't apply weak linkage
2022-07-28 09:42:18 -06:00
Lokathor
1070134a56 Merge pull request #1 from rust-lang/master
updates from main
2022-07-28 09:36:02 -06:00
Ayush Singh
f2ac36348c Use all of src/math for UEFI
This is needed for libtest
2022-07-28 20:51:44 +05:30
Lokathor
011f92c877 add weak linkage to the ARM AEABI division functions 2022-07-22 17:14:18 -06:00
Amanieu d'Antras
d14c5a43b7 Merge pull request #471 from Demindiro/x86_64-fix-recursive-memcmp 2022-06-12 02:19:30 +02:00