Update and revamp wasm32 SIMD intrinsics (#874)
Lots of time and lots of things have happened since the simd128 support was first added to this crate. Things are starting to settle down now, so this commit syncs the Rust intrinsic definitions with the current specification (https://github.com/WebAssembly/simd). Unfortunately not everything can be enabled just yet, but everything is in the pipeline for getting enabled soon.

This commit also applies a major revamp to how intrinsics are tested. The intention is that the setup should be much more lightweight and easier to work with after this commit.

At a high level, the changes here are:

* Testing with node.js and `#[wasm_bindgen]` has been removed. Instead, intrinsics are tested with Wasmtime, which has a nearly complete implementation of the SIMD spec (and soon fully complete!).
* Testing is switched to `wasm32-wasi` to make idiomatic Rust features easier to work with (e.g. `panic!`).
* Testing of this crate's simd128 feature for wasm is re-enabled. This will run on CI and both compile and execute intrinsics. This should bring wasm intrinsics to the same level of testing as x86 intrinsics, for example.
* New wasm intrinsics have been added:
  * `iNNxMM_loadAxA_{s,u}`
  * `vNNxMM_load_splat`
  * `v8x16_swizzle`
  * `v128_andnot`
  * `iNNxMM_abs`
  * `iNNxMM_narrow_*_{u,s}`
  * `iNNxMM_bitmask` - commented out until LLVM is updated to LLVM 11
  * `iNNxMM_widen_*_{u,s}` - commented out until bytecodealliance/wasmtime#1994 lands
  * `iNNxMM_{max,min}_{u,s}`
  * `iNNxMM_avgr_u`
* Some wasm intrinsics have been removed:
  * `i64x2_trunc_*`
  * `f64x2_convert_*`
  * `i8x16_mul`
* The `v8x16.shuffle` instruction is exposed. This is done through a `macro` (not `macro_rules!`, but `macro`). This is intended to be somewhat experimental and unstable until we decide otherwise. This instruction has 16 immediate-mode expressions and as a result is unsuited to the existing `constify_*` logic of this crate. I'm hoping that we can game out over time what a macro might look like and/or look for better solutions. For now, though, what's implemented is the first of its kind in this crate (an architecture-specific macro), so some extra scrutiny looking at it would be appreciated.
* Lots of `assert_instr` annotations have been fixed for wasm.
* All wasm simd128 tests are uncommented and passing now.

This is still missing tests for new intrinsics and it's also missing tests for various corner cases. I hope to get to those later as the upstream spec itself gets closer to stabilization.

In the meantime, however, I went ahead and updated the `hex.rs` example with a wasm implementation using intrinsics. With it I got some very impressive speedups using Wasmtime:

    test benches::large_default  ... bench:     213,961 ns/iter (+/- 5,108) = 4900 MB/s
    test benches::large_fallback ... bench:   3,108,434 ns/iter (+/- 75,730) = 337 MB/s
    test benches::small_default  ... bench:          52 ns/iter (+/- 0) = 2250 MB/s
    test benches::small_fallback ... bench:         358 ns/iter (+/- 0) = 326 MB/s

In other words, under Wasmtime, SIMD hex encoding is about 15x faster on 1MB chunks and about 7x faster on small (<128 byte) chunks.

All of these intrinsics are still unstable and will presumably continue to be so until the simd proposal in wasm itself progresses to a later stage. Additionally, we'll still want to sync with clang on intrinsic names (or decide not to) at some point in the future.

* wasm: Unconditionally expose SIMD functions

This commit unconditionally exposes SIMD functions from the `wasm32` module. This is done in such a way that the standard library does not need to be recompiled to access SIMD intrinsics and use them. This, hopefully, is the long-term story for SIMD in WebAssembly in Rust. It's unlikely that all WebAssembly runtimes will end up implementing SIMD, so the standard library is unlikely to use SIMD any time soon, but we want to make sure it's easily available to folks!
This commit enables all this by ensuring that SIMD is available to the standard library, regardless of compilation flags. This comes with the same caveats as x86 support: it doesn't make sense to call these functions unless you're enabling simd support one way or another locally. Additionally, as with x86, if you don't call these functions then the instructions won't show up in your binary.

While I was here I went ahead and expanded the WebAssembly-specific documentation for the wasm32 module as well, ensuring that the current state of SIMD/Atomics is documented.
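Because WebAssembly has no runtime feature detection, one way to respect the caveat above is to choose between a SIMD path and a scalar path at compile time with `cfg`. The sketch below assumes the SIMD build is compiled with `-Ctarget-feature=+simd128`, and it uses the names from today's stable `core::arch::wasm32` (`v128_load`, `v128_store`, `u8x16_avgr`), which may differ from the names introduced in this commit (the rounding average is listed above as `iNNxMM_avgr_u`). `average_bytes` and `average_scalar` are hypothetical helpers, not part of any API.

```rust
/// Scalar rounding average, (x + y + 1) >> 1, computed in u16 so the sum
/// cannot overflow. This is the same per-lane result as the `avgr_u` instruction.
fn average_scalar(a: &[u8], b: &[u8], out: &mut [u8]) {
    for ((o, &x), &y) in out.iter_mut().zip(a).zip(b) {
        *o = ((x as u16 + y as u16 + 1) >> 1) as u8;
    }
}

/// SIMD path: only compiled when the crate is built for wasm32 with simd128
/// enabled, since the choice cannot be made at runtime.
#[cfg(all(target_arch = "wasm32", target_feature = "simd128"))]
pub fn average_bytes(a: &[u8], b: &[u8], out: &mut [u8]) {
    use core::arch::wasm32::{u8x16_avgr, v128, v128_load, v128_store};
    assert!(a.len() == out.len() && b.len() == out.len());
    let simd_len = out.len() / 16 * 16;
    for off in (0..simd_len).step_by(16) {
        // SAFETY: off + 16 <= len for all three slices; v128_load/v128_store
        // do not require 16-byte alignment.
        unsafe {
            let va = v128_load(a.as_ptr().add(off) as *const v128);
            let vb = v128_load(b.as_ptr().add(off) as *const v128);
            v128_store(out.as_mut_ptr().add(off) as *mut v128, u8x16_avgr(va, vb));
        }
    }
    // Handle the final partial chunk (fewer than 16 bytes) with scalar code.
    average_scalar(&a[simd_len..], &b[simd_len..], &mut out[simd_len..]);
}

/// Fallback for every other target, or wasm32 without simd128.
#[cfg(not(all(target_arch = "wasm32", target_feature = "simd128")))]
pub fn average_bytes(a: &[u8], b: &[u8], out: &mut [u8]) {
    assert!(a.len() == out.len() && b.len() == out.len());
    average_scalar(a, b, out);
}
```

On any other target, or on wasm32 without simd128, only the scalar version is compiled, so this function contributes no SIMD instructions to the binary.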
@@ -1,9 +1,12 @@
#![doc(include = "core_arch_docs.md")]
#![allow(improper_ctypes_definitions)]
#![allow(dead_code)]
#![allow(unused_features)]
#![allow(incomplete_features)]
#![feature(
const_fn,
const_fn_union,
const_generics,
custom_inner_attributes,
link_llvm_intrinsics,
platform_intrinsics,
@@ -32,9 +35,12 @@
adx_target_feature,
rtm_target_feature,
f16c_target_feature,
external_doc
external_doc,
allow_internal_unstable,
decl_macro
)]
#![cfg_attr(test, feature(test, abi_vectorcall, untagged_unions))]
#![cfg_attr(all(test, target_arch = "wasm32"), feature(wasm_simd))]
#![deny(clippy::missing_inline_in_public_items)]
#![allow(
clippy::inline_always,
@@ -66,13 +72,10 @@ extern crate std_detect;
#[cfg(test)]
extern crate stdarch_test;

#[cfg(all(test, target_arch = "wasm32"))]
extern crate wasm_bindgen_test;

#[path = "mod.rs"]
mod core_arch;

pub use self::core_arch::arch::*;
pub use self::core_arch::arch;

#[allow(unused_imports)]
use core::{ffi, hint, intrinsics, marker, mem, ops, ptr, sync};

@@ -57,14 +57,110 @@ pub mod arch {

/// Platform-specific intrinsics for the `wasm32` platform.
///

/// # Availability
/// This module provides intrinsics specific to the WebAssembly
/// architecture. Here you'll find intrinsics necessary for leveraging
/// WebAssembly proposals such as [atomics] and [simd]. These proposals are
/// evolving over time and as such the support here is unstable and requires
/// the nightly channel. As WebAssembly proposals stabilize these functions
/// will also become stable.
///
/// Note that intrinsics gated by `target_feature = "atomics"` or `target_feature = "simd128"`
/// are only available **when the standard library itself is compiled with the respective
/// target feature**. This version of the standard library is not obtainable via `rustup`,
/// but rather will require the standard library to be compiled from source.
/// See the [module documentation](../index.html) for more details.
/// [atomics]: https://github.com/webassembly/threads
/// [simd]: https://github.com/webassembly/simd
///
/// See the [module documentation](../index.html) for general information
/// about the `arch` module and platform intrinsics.
///
/// ## Atomics
///
/// The [threads proposal][atomics] for WebAssembly adds a number of
/// instructions for dealing with multithreaded programs. Atomic
/// instructions can all be generated through `std::sync::atomic` types, but
/// some instructions, such as `memory.atomic.notify`, have no equivalent in
/// Rust, so this module provides intrinsics for them.
///
/// At this time, however, these intrinsics are only available **when the
/// standard library itself is compiled with atomics**. Compiling with
/// atomics is not enabled by default and requires passing
/// `-Ctarget-feature=+atomics` to rustc. The standard library shipped via
/// `rustup` is not compiled with atomics. To get access to these intrinsics
/// you'll need to compile the standard library from source with the
/// requisite compiler flags.
///
/// ## SIMD
///
/// The [simd proposal][simd] for WebAssembly adds a new `v128` type for a
/// 128-bit SIMD register. It also adds a large array of instructions to
/// operate on the `v128` type to perform data processing. The SIMD proposal
/// has been in progress for quite some time and many instructions have come
/// and gone. This module attempts to keep up with the proposal, but if you
/// notice anything awry please feel free to [open an
/// issue](https://github.com/rust-lang/stdarch/issues/new).
///
/// It's important to be aware that SIMD development in WebAssembly is still
/// in its early days. There are lots of pieces to demo and prototype with,
/// but discussions and support are still in progress. There are a number of
/// pitfalls and gotchas in various places, which this documentation will
/// attempt to cover, but there may be others lurking!
///
/// Using SIMD is intended to be similar to how you would use it on
/// `x86_64`, for example. You'd write a function such as:
///
/// ```rust,ignore
/// #[cfg(target_arch = "wasm32")]
/// #[target_feature(enable = "simd128")]
/// unsafe fn uses_simd() {
///     use std::arch::wasm32::*;
///     // ...
/// }
/// ```
///
/// Unlike `x86_64`, however, WebAssembly does not currently have dynamic
/// detection at runtime as to whether SIMD is supported (this is one of the
/// motivators for the [conditional sections proposal][condsections], but
/// that is still pretty early days). This means that your binary will
/// either have SIMD and can only run on engines which support SIMD, or it
/// will not have SIMD at all. For compatibility the standard library itself
/// does not use any SIMD internally. Determining how best to ship your
/// WebAssembly binary with SIMD is largely left up to you as it can be
/// pretty nuanced depending on your situation.
///
/// [condsections]: https://github.com/webassembly/conditional-sections
///
/// To enable SIMD support at compile time you need to do one of two things:
///
/// * First, you can annotate functions with `#[target_feature(enable =
/// "simd128")]`. This causes just that one function to have SIMD support
/// available to it, and intrinsics will get inlined as usual in this
/// situation.
///
/// * Second, you can compile your program with `-Ctarget-feature=+simd128`.
/// This compilation flag blanket enables SIMD support for your entire
/// compilation. Note that this does not include the standard library
/// unless you recompile the standard library.
///
/// If you enable SIMD via either of these routes then you'll have a
/// WebAssembly binary that uses SIMD instructions, and you'll need to ship
/// that accordingly. Also note that if you call SIMD intrinsics but don't
/// enable SIMD via either of these mechanisms, you'll still have SIMD
/// generated in your program. This means that to generate a binary without
/// SIMD you'll need to avoid both options above as well as calls to any
/// intrinsics in this module.
///
/// > **Note**: Due to
/// > [rust-lang/rust#74320](https://github.com/rust-lang/rust/issues/74320)
/// > it's recommended to compile your entire program with SIMD support
/// > (using `RUSTFLAGS`) or otherwise functions may not be inlined
/// > correctly.
///
/// > **Note**: LLVM's SIMD support is actually split into two features:
/// > `simd128` and `unimplemented-simd128`. Rust code can enable `simd128`
/// > with `#[target_feature]` (and test for it with `#[cfg(target_feature =
/// > "simd128")]`), but it cannot enable `unimplemented-simd128`. The only
/// > way to enable this feature is to compile with
/// > `-Ctarget-feature=+simd128,+unimplemented-simd128`. This second
/// > feature enables more recent instructions implemented in LLVM which
/// > haven't always had enough time to make their way to runtimes.
#[cfg(any(target_arch = "wasm32", dox))]
#[doc(cfg(target_arch = "wasm32"))]
#[stable(feature = "simd_wasm32", since = "1.33.0")]

@@ -10,8 +10,6 @@

#[cfg(test)]
use stdarch_test::assert_instr;
#[cfg(test)]
use wasm_bindgen_test::wasm_bindgen_test;

extern "C" {
#[link_name = "llvm.wasm.atomic.wait.i32"]
@@ -22,7 +20,7 @@ extern "C" {
fn llvm_atomic_notify(ptr: *mut i32, cnt: i32) -> i32;
}

/// Corresponding intrinsic to wasm's [`i32.atomic.wait` instruction][instr]
/// Corresponding intrinsic to wasm's [`memory.atomic.wait32` instruction][instr]
///
/// This function, when called, will block the current thread if the memory
/// pointed to by `ptr` is equal to `expression` (performing this action
@@ -50,14 +48,14 @@ extern "C" {
/// library is not obtainable via `rustup`, but rather will require the
/// standard library to be compiled from source.
///
/// [instr]: https://github.com/WebAssembly/threads/blob/master/proposals/threads/Overview.md#wait
/// [instr]: https://webassembly.github.io/threads/syntax/instructions.html#syntax-instr-atomic-memory
#[inline]
#[cfg_attr(test, assert_instr("i32.atomic.wait"))]
pub unsafe fn i32_atomic_wait(ptr: *mut i32, expression: i32, timeout_ns: i64) -> i32 {
pub unsafe fn memory_atomic_wait32(ptr: *mut i32, expression: i32, timeout_ns: i64) -> i32 {
llvm_atomic_wait_i32(ptr, expression, timeout_ns)
}

/// Corresponding intrinsic to wasm's [`i64.atomic.wait` instruction][instr]
/// Corresponding intrinsic to wasm's [`memory.atomic.wait64` instruction][instr]
///
/// This function, when called, will block the current thread if the memory
/// pointed to by `ptr` is equal to `expression` (performing this action
@@ -85,14 +83,14 @@ pub unsafe fn i32_atomic_wait(ptr: *mut i32, expression: i32, timeout_ns: i64) -
/// library is not obtainable via `rustup`, but rather will require the
/// standard library to be compiled from source.
///
/// [instr]: https://github.com/WebAssembly/threads/blob/master/proposals/threads/Overview.md#wait
/// [instr]: https://webassembly.github.io/threads/syntax/instructions.html#syntax-instr-atomic-memory
#[inline]
#[cfg_attr(test, assert_instr("i64.atomic.wait"))]
pub unsafe fn i64_atomic_wait(ptr: *mut i64, expression: i64, timeout_ns: i64) -> i32 {
pub unsafe fn memory_atomic_wait64(ptr: *mut i64, expression: i64, timeout_ns: i64) -> i32 {
llvm_atomic_wait_i64(ptr, expression, timeout_ns)
}

/// Corresponding intrinsic to wasm's [`atomic.notify` instruction][instr]
/// Corresponding intrinsic to wasm's [`memory.atomic.notify` instruction][instr]
///
/// This function will notify a number of threads blocked on the address
/// indicated by `ptr`. Threads previously blocked with the `i32_atomic_wait`
@@ -112,9 +110,9 @@ pub unsafe fn i64_atomic_wait(ptr: *mut i64, expression: i64, timeout_ns: i64) -
/// library is not obtainable via `rustup`, but rather will require the
/// standard library to be compiled from source.
///
/// [instr]: https://github.com/WebAssembly/threads/blob/master/proposals/threads/Overview.md#wake
/// [instr]: https://webassembly.github.io/threads/syntax/instructions.html#syntax-instr-atomic-memory
#[inline]
#[cfg_attr(test, assert_instr("atomic.wake"))]
pub unsafe fn atomic_notify(ptr: *mut i32, waiters: u32) -> u32 {
pub unsafe fn memory_atomic_notify(ptr: *mut i32, waiters: u32) -> u32 {
llvm_atomic_notify(ptr, waiters as i32) as u32
}

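The renamed `memory_atomic_wait32`, `memory_atomic_wait64`, and `memory_atomic_notify` follow the wait/notify protocol of the threads proposal: a wait returns 0 when it was woken by a notify, 1 when the value at `ptr` did not equal `expression` (so the thread never blocked), and 2 when the timeout expired; a negative timeout waits forever, and notify returns how many waiters it woke. Since the real intrinsics need a wasm32 build with atomics, the following is only a host-side model of that protocol built on `std::sync::{Mutex, Condvar}`. `WaitCell` and its methods are hypothetical; they mirror the documented return codes and are not how the intrinsics are implemented.

```rust
use std::sync::{Condvar, Mutex};
use std::time::{Duration, Instant};

/// Hypothetical stand-in for one 32-bit wasm memory location used with
/// wait/notify. Not part of `core::arch::wasm32`.
pub struct WaitCell {
    state: Mutex<State>,
    cond: Condvar,
}

struct State {
    value: i32,
    waiters: u32, // blocked threads not yet selected by a notify
    wakeups: u32, // notifications issued but not yet consumed
}

impl WaitCell {
    pub fn new(value: i32) -> Self {
        let state = State { value, waiters: 0, wakeups: 0 };
        WaitCell { state: Mutex::new(state), cond: Condvar::new() }
    }

    pub fn waiters(&self) -> u32 {
        self.state.lock().unwrap().waiters
    }

    /// Models `memory.atomic.wait32`: 0 = woken, 1 = not-equal, 2 = timed-out.
    /// A negative `timeout_ns` means "wait forever".
    pub fn wait32(&self, expected: i32, timeout_ns: i64) -> i32 {
        let mut st = self.state.lock().unwrap();
        if st.value != expected {
            return 1; // value mismatch: return immediately without blocking
        }
        let deadline = if timeout_ns < 0 {
            None
        } else {
            Some(Instant::now() + Duration::from_nanos(timeout_ns as u64))
        };
        st.waiters += 1;
        loop {
            // Check for a pending notification first, so a wakeup that races
            // with the timeout is still reported as "woken".
            if st.wakeups > 0 {
                st.wakeups -= 1;
                return 0;
            }
            match deadline {
                None => st = self.cond.wait(st).unwrap(),
                Some(d) => {
                    let now = Instant::now();
                    if now >= d {
                        st.waiters -= 1;
                        return 2;
                    }
                    st = self.cond.wait_timeout(st, d - now).unwrap().0;
                }
            }
        }
    }

    /// Models `memory.atomic.notify`: wakes up to `count` waiters and returns
    /// the number actually woken.
    pub fn notify(&self, count: u32) -> u32 {
        let mut st = self.state.lock().unwrap();
        let n = count.min(st.waiters);
        st.waiters -= n;
        st.wakeups += n;
        drop(st);
        self.cond.notify_all();
        n
    }
}
```

The waiter re-checks for a pending wakeup under the lock before deciding it timed out, which keeps the "woken" and "timed-out" results mutually exclusive even with spurious condition-variable wakeups.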
@@ -1,7 +1,5 @@
#[cfg(test)]
use stdarch_test::assert_instr;
#[cfg(test)]
use wasm_bindgen_test::wasm_bindgen_test;

extern "C" {
#[link_name = "llvm.wasm.memory.grow.i32"]

@@ -2,17 +2,13 @@

#[cfg(test)]
use stdarch_test::assert_instr;
#[cfg(test)]
use wasm_bindgen_test::wasm_bindgen_test;

#[cfg(any(target_feature = "atomics", dox))]
mod atomic;
#[cfg(any(target_feature = "atomics", dox))]
pub use self::atomic::*;

#[cfg(any(target_feature = "simd128", dox))]
mod simd128;
#[cfg(any(target_feature = "simd128", dox))]
pub use self::simd128::*;

mod memory;

File diff suppressed because it is too large
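The diff for the intrinsics themselves is not shown above. As a reading aid, here is a host-runnable scalar model of two of the byte-permutation operations from the commit message, following the lane semantics of the WebAssembly SIMD proposal: `v8x16_swizzle` takes its 16 lane indices at runtime from a second vector and yields 0 for any index of 16 or more, while `v8x16.shuffle` takes 16 constant indices in `0..32` that select from the concatenation of both inputs, which is why it cannot go through `constify_*` and is exposed as a macro instead. `hex16_model` shows one way a SIMD hex encoder such as the `hex.rs` example can be assembled from these two operations (nibble lookup via swizzle, interleaving via shuffle); it is an illustrative assumption, not the code from that example, and all function names here are hypothetical.

```rust
/// Model of `v8x16_swizzle`: lane `i` is `a[s[i]]`, or 0 when `s[i] >= 16`.
/// The indices are ordinary runtime values.
fn swizzle_model(a: [u8; 16], s: [u8; 16]) -> [u8; 16] {
    let mut out = [0u8; 16];
    for i in 0..16 {
        let idx = s[i] as usize;
        out[i] = if idx < 16 { a[idx] } else { 0 };
    }
    out
}

/// Model of `v8x16.shuffle`: each index must be in 0..32 and selects from
/// `a` (0..16) or `b` (16..32). In the real instruction the indices are
/// immediates, i.e. compile-time constants.
fn shuffle_model(a: [u8; 16], b: [u8; 16], idx: [u8; 16]) -> [u8; 16] {
    let mut out = [0u8; 16];
    for i in 0..16 {
        let j = idx[i] as usize;
        assert!(j < 32, "shuffle index out of range");
        out[i] = if j < 16 { a[j] } else { b[j - 16] };
    }
    out
}

/// Hypothetical 16-byte hex encoder built only from the two models above.
fn hex16_model(bytes: [u8; 16]) -> [u8; 32] {
    const TABLE: [u8; 16] = *b"0123456789abcdef";
    let mut hi = [0u8; 16];
    let mut lo = [0u8; 16];
    for i in 0..16 {
        hi[i] = bytes[i] >> 4; // high nibble, 0..=15
        lo[i] = bytes[i] & 0x0f; // low nibble, 0..=15
    }
    // Table lookup: every nibble is < 16, so no lane is zeroed.
    let hi_digits = swizzle_model(TABLE, hi);
    let lo_digits = swizzle_model(TABLE, lo);
    // Interleave so output pair k is (hi_digits[k], lo_digits[k]).
    let first = shuffle_model(hi_digits, lo_digits,
        [0, 16, 1, 17, 2, 18, 3, 19, 4, 20, 5, 21, 6, 22, 7, 23]);
    let second = shuffle_model(hi_digits, lo_digits,
        [8, 24, 9, 25, 10, 26, 11, 27, 12, 28, 13, 29, 14, 30, 15, 31]);
    let mut out = [0u8; 32];
    out[..16].copy_from_slice(&first);
    out[16..].copy_from_slice(&second);
    out
}
```

The design point this illustrates is that swizzle suits data-dependent lookups (the digit table), while shuffle suits fixed rearrangements (the interleave), whose pattern is known at compile time.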