This commit updates to the latest nightly's syntax where `#[target_feature =
"+foo"]` is now deprecated in favor of `#[target_feature(enable = "foo")]`.
Additionally `#[target_feature]` can only be applied to `unsafe` functions for
now.
Along the way this removes a few exampels that were just left around and also
disables the `fxsr` modules as that target feature will need to land in upstream
rust-lang/rust first as it's currently unknown to the compiler.
- `_m_paddb` for `_mm_add_pi8`
- `_m_paddw` for `_mm_add_pi16`
- `_m_paddd` for `_mm_add_pi32`
- `_m_paddsb` for `_mm_adds_pi8`
- `_m_paddsw` for `_mm_adds_pi16`
- `_m_paddusb` for `_mm_adds_pu8`
- `_m_paddusw` for `_mm_adds_pu16`
* Implement `_m_psubb`
* Implement `_m_psubw`
* Implement `_m_psubd`
* Implement `_m_psubsb`
* Implement `_m_psubsw`
* Implement `_m_psubusb`
* Implement `_m_psubusw`
* Have the subtraction intrinsic naming consistent with the addition ones
E.g. use `_mm_sub_pi8` instead of `_m_psubb`
* Implement all subtraction aliases for the `_mm_*` variants
- `_m_psubb` for `_mm_sub_pi8`
- `_m_psubw` for `_mm_sub_pi16`
- `_m_psubd` for `_mm_sub_pi32`
- `_m_psubsb` for `_mm_subs_pi8`
- `_m_psubsw` for `_mm_subs_pi16`
- `_m_psubusb` for `_mm_subs_pu8`
- `_m_psubusw` for `_mm_subs_pu16`
This commit starts the migration towards Intel's types one intrinsic at a time,
starting with `_mm_add_ss`. This is mostly just to get a feel for what the tests
will start to look like.
* [core/runtime] use getauxval on non-x86 platforms
* test coresimd::auxv against auxv crate
* add test files from auxv crate
* [arm] use simd_test macro
* formatting
* missing docs
* improve docs
* reading /proc/self/auxv succeeds only if reading all fields succeeds
* remove cc-crate build dependency
* getauxval succeeds only if hwcap/hwcap2 are non-zero
* fix formatting
* move getauxval to stdsimd
* delete getauxval-wrapper.c
* remove auxv crate dev-dependency from coresimd
* Completes SSE and adds some MMX intrinsics
MMX:
- `_mm_cmpgt_pi{8,16,32}`
- `_mm_unpack{hi,lo}_pi{8,16,32}`
SSE (is now complete):
- `_mm_cvtp{i,u}{8,16}_ps`
- add test for `_m_pmulhuw`
* fmt and clippy
* add an exception for intrinsics using cvtpi2ps
* assert_instr check for failed inlining
* Fix `call` instructions showing up in some intrinsics
The ABI of types like `u8x8` as they're defined isn't actually the underlying
type we need for LLVM, but only `__m64` currently satisfies that. Apparently
this (and the casts involved) caused some extraneous instructions for a number
of intrinsics. They've all moved over to the `__m64` type now to ensure that
they're what the underlying interface is.
* Allow PIC-relative `call` instructions on x86
These should be harmless when evaluating whether we failed inlining
* Fix sse::_mm_cvtsi32_ss and sse::_mm_cvtsi64_ss
By using LLVM builtins, the expected instruction
is correctly generated on all platforms.
* Use LLVM builtins for storeu*
Just to make sure that the wrong instructions is not related to
Rust code.
* sse: _mm_cvtpi16_ps, _mm_cvtpu16_ps, _mm_cvtpi8_ps, _mm_cvtpu8_ps
And mmx:
_mm_cmpgt_pi8
_mm_cmpgt_pi16
_mm_unpackhi_pi16
_mm_unpacklo_pi8
_mm_unpacklo_pi16
* Fix: literal out of range
This commit adds a new crate for testing that the intrinsics listed in this
crate do indeed match the upstream definition of each intrinsic. A
pre-downloaded XML description of all Intel intrinsics is checked in which is
then parsed in the `stdsimd-verify` crate to verify that everything we write
down is matched against the upstream definitions.
Currently the checks are pretty loose to get this compiling but a few intrinsics
were fixed as a result of this. For example:
* `_mm256_extract_epi8` - AVX2 intrinsic erroneously listed under AVX
* `_mm256_extract_epi16` - AVX2 intrinsic erroneously listed under AVX
* `_mm256_extract_epi32` - AVX2 intrinsic erroneously listed under AVX
* `_mm256_extract_epi64` - AVX2 intrinsic erroneously listed under AVX
* `_mm_tzcnt_32` - erroneously had `u32` in the name
* `_mm_tzcnt_64` - erroneously had `u64` in the name
* `_mm_cvtsi64_si128` - erroneously available on 32-bit platforms
* `_mm_cvtsi64x_si128` - erroneously available on 32-bit platforms
* `_mm_cvtsi128_si64` - erroneously available on 32-bit platforms
* `_mm_cvtsi128_si64x` - erroneously available on 32-bit platforms
* `_mm_extract_epi64` - erroneously available on 32-bit platforms
* `_mm_insert_epi64` - erroneously available on 32-bit platforms
* `_mm256_extract_epi16` - erroneously returned i32 instead of i16
* `_mm256_extract_epi8` - erroneously returned i32 instead of i8
* `_mm_shuffle_ps` - the mask argument was erroneously i32 instead of u32
* `_popcnt32` - the signededness of the argument and return were flipped
* `_popcnt64` - the signededness of the argument was flipped and the argument
was too large bit-wise
* `_mm_tzcnt_32` - the return value's sign was flipped
* `_mm_tzcnt_64` - the return value's sign was flipped
* A good number of intrinsics used `imm8: i8` or `imm8: u8` instead of `imm8:
i32` which Intel was using. (we were also internally inconsistent)
* A number of intrinsics working with `__m64` were instead working with i64/u64,
so they're now corrected to operate with the vector types instead.
Currently the verifications performed are:
* Each name in Rust is defined in the XML document
* The arguments/return values all agree.
* The CPUID features listed in the XML document are all enabled in Rust as well.
The type matching right now is pretty loose and has a lot of questionable
changes. Future commits will touch these up to be more strict and require closer
adherence with Intel's own types. Otherwise types like `i32x8` (or any integers
with 256 bits) all match up to `__m256i` right now, althoguh this may want to
change in the future.
Finally we're also not testing the instruction listed in the XML right now.
There's a huge number of discrepancies between the instruction listed in the XML
and the instruction listed in `assert_instr`, and those'll need to be taken care
of in a future commit.
Closes#240
* Adds extract_unchecked + replace_unchecked + len (#222 )
* [x86] Fixes the return types + uses extract_unchecked for:
* _mm_extract_epi8
* _mm_extract_epi16
* _mm256_extract_epi8
* _mm256_extract_epi16
* Minor changes to the other extract_epi* intrinsics for style consistency
These should now zero-extend the extracted int and behave appropriately. An old typo makes these a bit confusing, See this llvm issue.