This commit adds a new crate that tests whether the intrinsics defined in this
crate do indeed match the upstream definition of each intrinsic. A
pre-downloaded XML description of all Intel intrinsics is checked in and then
parsed by the `stdsimd-verify` crate, which verifies that everything we write
down matches the upstream definitions.
Currently the checks are pretty loose, just enough to get this compiling, but a
number of intrinsics were fixed as a result. For example:
* `_mm256_extract_epi8` - AVX2 intrinsic erroneously listed under AVX
* `_mm256_extract_epi16` - AVX2 intrinsic erroneously listed under AVX
* `_mm256_extract_epi32` - AVX2 intrinsic erroneously listed under AVX
* `_mm256_extract_epi64` - AVX2 intrinsic erroneously listed under AVX
* `_mm_tzcnt_32` - erroneously had `u32` in the name
* `_mm_tzcnt_64` - erroneously had `u64` in the name
* `_mm_cvtsi64_si128` - erroneously available on 32-bit platforms
* `_mm_cvtsi64x_si128` - erroneously available on 32-bit platforms
* `_mm_cvtsi128_si64` - erroneously available on 32-bit platforms
* `_mm_cvtsi128_si64x` - erroneously available on 32-bit platforms
* `_mm_extract_epi64` - erroneously available on 32-bit platforms
* `_mm_insert_epi64` - erroneously available on 32-bit platforms
* `_mm256_extract_epi16` - erroneously returned i32 instead of i16
* `_mm256_extract_epi8` - erroneously returned i32 instead of i8
* `_mm_shuffle_ps` - the mask argument was erroneously i32 instead of u32
* `_popcnt32` - the signedness of the argument and return were flipped
* `_popcnt64` - the signedness of the argument was flipped and the argument
  was too large bit-wise
* `_mm_tzcnt_32` - the return value's sign was flipped
* `_mm_tzcnt_64` - the return value's sign was flipped
* A good number of intrinsics used `imm8: i8` or `imm8: u8` instead of the
  `imm8: i32` that Intel uses (we were also internally inconsistent).
* A number of intrinsics working with `__m64` were instead working with i64/u64,
so they're now corrected to operate with the vector types instead.
Currently the verifications performed are:
* Each name in Rust is defined in the XML document
* The arguments/return values all agree.
* The CPUID features listed in the XML document are all enabled in Rust as well.
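
The three checks above can be sketched roughly as follows. This is not the code in `stdsimd-verify`; the `RustIntrinsic` and `XmlIntrinsic` structs and the `verify` function are hypothetical, and the sketch assumes that types on both sides have already been normalized to the same spelling.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified view of an intrinsic as defined in Rust.
struct RustIntrinsic {
    name: &'static str,
    args: Vec<&'static str>,
    ret: Option<&'static str>,
    target_features: Vec<&'static str>,
}

/// Hypothetical, simplified view of one entry in the XML document.
struct XmlIntrinsic {
    params: Vec<&'static str>,
    ret: Option<&'static str>,
    cpuid: Vec<&'static str>,
}

/// Runs the three checks; assumes types are already normalized strings.
fn verify(
    rust: &RustIntrinsic,
    xml: &HashMap<&str, XmlIntrinsic>,
) -> Result<(), String> {
    // Check 1: the Rust name must be defined in the XML document.
    let intel = xml
        .get(rust.name)
        .ok_or_else(|| format!("`{}` is not defined in the XML", rust.name))?;

    // Check 2: the arguments and the return value must agree.
    if rust.args != intel.params || rust.ret != intel.ret {
        return Err(format!("`{}`: signature mismatch", rust.name));
    }

    // Check 3: every CPUID feature in the XML must be enabled in Rust.
    for feature in &intel.cpuid {
        if !rust.target_features.contains(feature) {
            return Err(format!("`{}`: missing feature `{}`", rust.name, feature));
        }
    }
    Ok(())
}
```

Returning a `Result` with a message per intrinsic, rather than panicking on the first failure, would let a caller collect every mismatch in one run.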
The type matching right now is pretty loose and has a lot of questionable
changes. Future commits will touch these up to be more strict and require closer
adherence to Intel's own types. For now, types like `i32x8` (or any integers
with 256 bits) all match up to `__m256i`, although this may change in the
future.
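
As an illustration of how loose this matching is, here is a minimal sketch in which any integer vector type totalling 256 bits is accepted for `__m256i`. The helper names `integer_vector_bits` and `loosely_matches` are hypothetical and are not taken from the verification crate; other widths are simply rejected here.

```rust
/// Hypothetical helper: total bit width of an integer vector type name
/// such as "i32x8" or "u8x32". Returns `None` for anything else.
fn integer_vector_bits(ty: &str) -> Option<u32> {
    // Accept either a signed or an unsigned element type.
    let rest = ty.strip_prefix('i').or_else(|| ty.strip_prefix('u'))?;
    let (elem, lanes) = rest.split_once('x')?;
    let elem: u32 = elem.parse().ok()?;
    let lanes: u32 = lanes.parse().ok()?;
    elem.checked_mul(lanes)
}

/// Hypothetical loose match: only the 256-bit case from the text is modeled.
fn loosely_matches(rust_ty: &str, intel_ty: &str) -> bool {
    match integer_vector_bits(rust_ty) {
        Some(256) => intel_ty == "__m256i",
        _ => false,
    }
}
```

Under this rule `i32x8`, `u8x32` and `i64x4` are indistinguishable, which is exactly the looseness a stricter check would need to remove.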
Finally we're also not testing the instruction listed in the XML right now.
There's a huge number of discrepancies between the instruction listed in the XML
and the instruction listed in `assert_instr`, and those'll need to be taken care
of in a future commit.
Closes #240
* Adds extract_unchecked + replace_unchecked + len (#222)
* [x86] Fixes the return types + uses extract_unchecked for:
  * _mm_extract_epi8
  * _mm_extract_epi16
  * _mm256_extract_epi8
  * _mm256_extract_epi16
* Minor changes to the other extract_epi* intrinsics for style consistency
These should now zero-extend the extracted integer and behave appropriately. An old typo makes these a bit confusing; see this LLVM issue.
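
To make the zero-extension point concrete, the sketch below contrasts the two ways of widening an extracted `i8` lane in Rust. It uses a plain slice in place of a vector type, and both function names are hypothetical.

```rust
/// Hypothetical stand-in for an extract: widens the lane by zero-extension.
fn extract_zero_extended(lanes: &[i8], index: usize) -> i32 {
    // Casting through `u8` first discards the sign, so the upper bits are 0.
    lanes[index] as u8 as i32
}

/// The same extract with a direct cast, which sign-extends instead.
fn extract_sign_extended(lanes: &[i8], index: usize) -> i32 {
    lanes[index] as i32
}
```

For a lane holding `-1`, the zero-extending version yields `255` while the sign-extending version yields `-1`; non-negative lanes give the same result either way.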
* ssse3: _mm_abs_pi8 failing
Intrinsic has incorrect return type!
<8 x i8> (<8 x i8>)* @llvm.x86.ssse3.pabs.b
* Introduce an x86_mmx type
And make it compatible with i8x8 and u8x8.
Alex suggested changing the i8x8 declaration to:
```
struct i8x8(i64);
```
But I don't see how to make it compatible with the
existing code/macros.
* ssse3: _mm_abs_pi16, _mm_abs_pi32, _mm_shuffle_pi8
* ssse3: _mm_abs_pi16, _mm_abs_pi32, _mm_shuffle_pi8 tests
* Replace x86_mmx by __m64
* ssse3: _mm_sign_pi8, _mm_sign_pi16, _mm_sign_pi32
* ssse3: _mm_mulhrs_pi16
* ssse3: _mm_maddubs_pi16
* ssse3: _mm_hsub_pi16, _mm_hsub_pi32, _mm_hsubs_pi16
* ssse3: _mm_hadd_pi16, _mm_hadd_pi32, _mm_hadds_pi16
* Move some ssse3 intrinsics from i586 to i686
* update docs
* cargo clean deletes previous docs
* remove stdsimd from coresimd examples
* use stdsimd instead of coresimd in core docs
* add stdsimd as a dev-dependency of coresimd
* Enable a Cargo workspace for the repo
* Disable tests for proc-macro crates
* Move back to mounting source directory read-only
* Refactor test invocation to only test one crate with `--all`