2551 Commits

Author SHA1 Message Date
gwenn
17edf649af Fix some assert_instr (#254)
* Fix some assert_instr

Missing assert_instr:
- _mm_cvtsi32_si128
- _mm_cvtsi128_si32
- _mm_loadl_epi64
- _mm_storel_epi64
- _mm_move_epi64
- _mm_cvtsd_f64
- _mm_setzero_pd
- _mm_load1_pd
- _mm_load_pd1
- _mm_loaddup_pd

Wrong instruction used:
- _mm_hsub_pi16

* Try to fix CI build by disabling some asserts

* Exclude some assert_instr on (x86_64, linux)
2017-12-30 11:19:00 -06:00
Alex Crichton
be461b1377 Verify Intel intrinsics against upstream definitions (#251)
This commit adds a new crate for testing that the intrinsics listed in this
crate do indeed match the upstream definition of each intrinsic. A
pre-downloaded XML description of all Intel intrinsics is checked in which is
then parsed in the `stdsimd-verify` crate to verify that everything we write
down is matched against the upstream definitions.

Currently the checks are pretty loose to get this compiling but a few intrinsics
were fixed as a result of this. For example:

* `_mm256_extract_epi8` - AVX2 intrinsic erroneously listed under AVX
* `_mm256_extract_epi16` - AVX2 intrinsic erroneously listed under AVX
* `_mm256_extract_epi32` - AVX2 intrinsic erroneously listed under AVX
* `_mm256_extract_epi64` - AVX2 intrinsic erroneously listed under AVX
* `_mm_tzcnt_32` - erroneously had `u32` in the name
* `_mm_tzcnt_64` - erroneously had `u64` in the name
* `_mm_cvtsi64_si128` - erroneously available on 32-bit platforms
* `_mm_cvtsi64x_si128` - erroneously available on 32-bit platforms
* `_mm_cvtsi128_si64` - erroneously available on 32-bit platforms
* `_mm_cvtsi128_si64x` - erroneously available on 32-bit platforms
* `_mm_extract_epi64` - erroneously available on 32-bit platforms
* `_mm_insert_epi64` - erroneously available on 32-bit platforms
* `_mm256_extract_epi16` - erroneously returned i32 instead of i16
* `_mm256_extract_epi8` - erroneously returned i32 instead of i8
* `_mm_shuffle_ps` - the mask argument was erroneously i32 instead of u32
* `_popcnt32` - the signedness of the argument and return were flipped
* `_popcnt64` - the signedness of the argument was flipped and the argument
  was too large bit-wise
* `_mm_tzcnt_32` - the return value's sign was flipped
* `_mm_tzcnt_64` - the return value's sign was flipped
* A good number of intrinsics used `imm8: i8` or `imm8: u8` instead of `imm8:
  i32` which Intel was using. (we were also internally inconsistent)
* A number of intrinsics working with `__m64` were instead working with i64/u64,
  so they're now corrected to operate with the vector types instead.

Currently the verifications performed are:

* Each name in Rust is defined in the XML document
* The arguments/return values all agree.
* The CPUID features listed in the XML document are all enabled in Rust as well.
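
The three checks above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in `stdsimd-verify`: the `Intrinsic` record, the `intrinsic` helper, and the `verify` function are hypothetical names, and parsing the XML and the Rust source is left out entirely.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified description of one intrinsic (not the real
/// `stdsimd-verify` data model).
#[derive(Debug, Clone, PartialEq)]
struct Intrinsic {
    name: String,
    args: Vec<String>,
    ret: Option<String>,
    cpuid: Vec<String>,
}

/// Hypothetical convenience constructor used to keep examples short.
fn intrinsic(name: &str, args: &[&str], ret: Option<&str>, cpuid: &[&str]) -> Intrinsic {
    Intrinsic {
        name: name.to_string(),
        args: args.iter().map(|s| s.to_string()).collect(),
        ret: ret.map(|s| s.to_string()),
        cpuid: cpuid.iter().map(|s| s.to_string()).collect(),
    }
}

/// Compares the Rust-side definitions against the upstream ones and returns
/// one message per discrepancy found.
fn verify(rust: &[Intrinsic], upstream: &HashMap<String, Intrinsic>) -> Vec<String> {
    let mut errors = Vec::new();
    for r in rust {
        // Check 1: each name in Rust is defined upstream.
        let up = match upstream.get(&r.name) {
            Some(up) => up,
            None => {
                errors.push(format!("{}: not defined upstream", r.name));
                continue;
            }
        };
        // Check 2: the arguments and return value agree.
        if r.args != up.args || r.ret != up.ret {
            errors.push(format!("{}: signature mismatch", r.name));
        }
        // Check 3: every upstream CPUID feature is enabled in Rust as well.
        for feature in &up.cpuid {
            if !r.cpuid.contains(feature) {
                errors.push(format!("{}: missing CPUID feature {}", r.name, feature));
            }
        }
    }
    errors
}
```

In this sketch types are compared as plain strings, which is stricter than the loose type matching described in this commit.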

The type matching right now is pretty loose and has a lot of questionable
changes. Future commits will touch these up to be more strict and require closer
adherence with Intel's own types. Otherwise types like `i32x8` (or any integers
with 256 bits) all match up to `__m256i` right now, although this may want to
change in the future.

Finally we're also not testing the instruction listed in the XML right now.
There's a huge number of discrepancies between the instruction listed in the XML
and the instruction listed in `assert_instr`, and those'll need to be taken care
of in a future commit.

Closes #240
2017-12-29 11:52:27 -06:00
gwenn
44a168a0b8 sse2: implements last remaining intrinsics (#244)
* sse2: __m64 related intrinsics

_mm_add_si64
_mm_mul_su32
_mm_sub_si64
_mm_cvtpi32_pd
_mm_set_epi64
_mm_set1_epi64
_mm_setr_epi64

* sse2: _mm_load_sd, _mm_loadh_pd, _mm_loadl_pd

* sse2: _mm_store_sd, _mm_storeh_pd, _mm_storel_pd

* sse2: _mm_shuffle_pd, _mm_move_sd

* sse2: _mm_cast*

_mm_castpd_ps
_mm_castpd_si128
_mm_castps_pd
_mm_castps_si128
_mm_castsi128_pd
_mm_castsi128_ps

* sse2: add some tests

* Try to fix AppVeyor build

* sse2: add more tests

* sse2: fix assert_instr for _mm_shuffle_pd

* Try to fix Travis build

* sse2: try to fix AppVeyor build

* sse2: try to fix AppVeyor build
2017-12-28 10:22:08 -06:00
Jonathan Goodman
3857c3e88a fix sse4a _mm_stream_{ss, sd} tests and docs 2017-12-27 22:32:49 +01:00
Alex Crichton
9aa4e30859 Update to syn master 2017-12-27 07:56:38 -08:00
gnzlbg
42ec76a3ff [sse4a] implement non-immediate-mode intrinsics (#249) 2017-12-22 10:14:41 -06:00
gnzlbg
1db6841813 [fmt] --force rustfmt-nightly 2017-12-22 00:24:23 +01:00
gnzlbg
52cc1abe2c [fmt] remove fn_call_width option (was removed upstream) 2017-12-22 00:24:23 +01:00
gnzlbg
5850282a1c use repr(align) to ensure proper alignment in tests 2017-12-22 00:24:23 +01:00
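
For context on 5850282a1c, a minimal sketch of how `repr(align)` can force the alignment of test data. The `Aligned16` wrapper and the `is_aligned_to` helper are hypothetical names chosen for illustration; the actual tests in the repository may look different.

```rust
/// Hypothetical wrapper: forces 16-byte alignment on the wrapped array, as an
/// aligned 128-bit load or store would require.
#[repr(align(16))]
struct Aligned16([f32; 4]);

/// Returns true if the address of `value` is a multiple of `align` bytes.
fn is_aligned_to<T>(value: &T, align: usize) -> bool {
    (value as *const T as usize) % align == 0
}
```

A test could then build `Aligned16([1.0, 2.0, 3.0, 4.0])` and pass a pointer to the inner array to an intrinsic that requires aligned memory.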
gnzlbg
4fb9420acb Fix rustfmt (#239)
* [fmt] manually fix some formatting
* [fmt] reformat with rustfmt-nightly
* [clippy] fix clippy issues
2017-12-14 19:57:53 +01:00
gnzlbg
5ce0c13009 [ci] powerpc/powerpc64/powerpc64le (#237)
* [ci] add powerpc/powerpc64 build bots

* unbreak stdsimd builds for targets without run-time
2017-12-14 10:44:20 -06:00
Tony Sifkarovski
645008ef32 Add unchecked methods, fix _mm_extract_epi* return types (#223)
* Adds extract_unchecked + replace_unchecked + len (#222)

* [x86] Fixes the return types + uses extract_unchecked for:
  * _mm_extract_epi8
  * _mm_extract_epi16
  * _mm256_extract_epi8
  * _mm256_extract_epi16

* Minor changes to the other extract_epi* intrinsics for style consistency
These should now zero-extend the extracted int and behave appropriately. An old typo makes these a bit confusing; see this LLVM issue.
2017-12-13 19:17:33 +01:00
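
To illustrate the zero-extension mentioned in 645008ef32, here is a small sketch of the difference between sign-extending and zero-extending an 8-bit lane to `i32`. It assumes the extracted lane is modelled as an `i8`; the function names are hypothetical and this is not the crate's implementation.

```rust
/// Sign-extension: a plain `as i32` cast on a signed lane keeps the sign, so
/// every lane with the top bit set becomes a negative `i32`.
fn sign_extend(lane: i8) -> i32 {
    lane as i32
}

/// Zero-extension: reinterpret the lane as unsigned first, so the upper 24
/// bits of the result are always zero.
fn zero_extend(lane: i8) -> i32 {
    lane as u8 as i32
}
```

For a lane holding `-1`, `sign_extend` gives `-1` while `zero_extend` gives `255`.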
gnzlbg
6e678ee678 fix clippy warnings 2017-12-13 10:19:09 -05:00
gnzlbg
84e2c7f8e4 fix __m64 imports 2017-12-13 10:19:09 -05:00
gnzlbg
9a81140e00 use i64s for the repr of __m{128,256}i and update casts 2017-12-13 10:19:09 -05:00
gnzlbg
1b987bd270 remove unnecessary mem::uninitialized 2017-12-13 10:19:09 -05:00
gnzlbg
45f1e63e15 remove unnecessary fixme 2017-12-13 10:19:09 -05:00
gnzlbg
878fd5b5d9 [avx] document intrinsics that don't correspond to an instruction 2017-12-13 10:19:09 -05:00
gnzlbg
baab3ad7f1 move __m256i to the v256 module 2017-12-13 10:19:09 -05:00
gnzlbg
ae6cff53c7 rework impl of __m64 and __m128i 2017-12-13 10:19:09 -05:00
gnzlbg
dd9a3f92ff move __m128i to the v128 module 2017-12-13 10:19:09 -05:00
gnzlbg
5fb068f74c move __m64 to the v64 module 2017-12-13 10:19:09 -05:00
gnzlbg
8c13c1e4a3 [ssse3] _mm_alignr_pi8 (#235) 2017-12-12 12:57:22 -06:00
Luca Barbato
baace2fc3f Initial PowerPC support
Rely mainly on parsing auxv since the cpuinfo information is incomplete.
2017-12-12 11:54:49 +01:00
Luca Barbato
f775bf3931 Extract the cpu capabilities from the auxiliary vector
Check for neon/asimd and pmull for arm and aarch64.
2017-12-12 11:54:49 +01:00
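
As a rough illustration of the auxiliary-vector approach named in f775bf3931 and baace2fc3f, the sketch below scans an already-loaded auxv for the `AT_HWCAP` entry. It assumes the vector has been read into a slice of native-word (key, value) pairs; `find_hwcap` and `has_feature` are hypothetical helpers, and the bit position of any given feature is architecture-specific and not taken from the commit.

```rust
/// ELF auxiliary vector keys (values as defined by the Linux ABI).
const AT_NULL: usize = 0;
const AT_HWCAP: usize = 16;

/// Walks the (key, value) pairs until the AT_NULL terminator and returns the
/// hardware-capability bitmask if an AT_HWCAP entry is present.
fn find_hwcap(auxv: &[usize]) -> Option<usize> {
    for pair in auxv.chunks_exact(2) {
        match pair[0] {
            AT_NULL => return None,
            AT_HWCAP => return Some(pair[1]),
            _ => {}
        }
    }
    None
}

/// Tests a single capability bit. The caller supplies the bit position,
/// which depends on the target architecture (an assumption of this sketch).
fn has_feature(hwcap: usize, bit: u32) -> bool {
    hwcap & (1usize << bit) != 0
}
```

Reading the vector itself (for example from `/proc/self/auxv` on Linux) is deliberately left out of the sketch.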
Luca Barbato
f49009e22c Unbreak detect_features for arm and aarch64 2017-12-12 11:54:49 +01:00
gwenn
4950bfed1a sse2: _mm_stream_* (#228)
* sse2: _mm_stream_si128,si32,pd,si64

* sse2: _mm_stream_* tests

* Disable assert_instr for _mm_stream_si64
2017-12-10 09:11:03 -06:00
gwenn
c2e4bb2e4c sse: __m64 related intrinsics (#230)
* sse: add missing aliases

_m_pextrw, _m_pinsrw, _m_pmovmskb, _m_pshufw

* sse: _mm_maskmove_si64, _m_maskmovq

* sse: _mm_mulhi_pu16, _m_pmulhuw

* sse: _mm_avg_pu8, _m_pavgb

* sse: _mm_avg_pu16, _m_pavgw

* sse: _mm_sad_pu8, _m_psadbw

* sse: _mm_cvtpi32_ps

* sse: _mm_cvtpi32x2_ps
2017-12-10 09:04:02 -06:00
gwenn
cbd52b05c1 Sse (#225)
* sse: _mm_cvt_pi2ps

* sse: _mm_extract_pi16

* sse: _mm_insert_pi16

* sse: _mm_movemask_pi8

* sse: _mm_shuffle_pi16

* sse: fix _mm_insert_pi16 and _mm_extract_pi16

* sse: add tests
2017-12-09 11:20:44 -06:00
gwenn
81630ea994 avx: _mm256_stream_si256, _mm256_stream_pd, _mm256_stream_ps (#227) 2017-12-09 11:20:30 -06:00
gwenn
0f53193641 sse2: _mm_movepi64_pi64, _mm_movpi64_epi64, _mm_cvtpd_pi32, _mm_cvttpd_pi32 2017-12-09 08:23:41 -05:00
gwenn
fcf106e685 ssse3 (#224)
* ssse3: _mm_abs_pi8 failing

Intrinsic has incorrect return type!
<8 x i8> (<8 x i8>)* @llvm.x86.ssse3.pabs.b

* Introduce an x86_mmx type

And make it compatible with i8x8 and u8x8.
Alex suggested to change the i8x8 declaration as:
```
struct i8x8(i64);
```
But I don't see how to make it compatible with the
existing code/macros.

* ssse3: _mm_abs_pi16, _mm_abs_pi32, _mm_shuffle_pi8

* ssse3: _mm_abs_pi16, _mm_abs_pi32, _mm_shuffle_pi8 tests

* Replace x86_mmx by __m64

* ssse3: _mm_sign_pi8, _mm_sign_pi16, _mm_sign_pi32

* ssse3: _mm_mulhrs_pi16

* ssse3: _mm_maddubs_pi16

* ssse3: _mm_hsub_pi16, _mm_hsub_pi32, _mm_hsubs_pi16

* ssse3: _mm_hadd_pi16, _mm_hadd_pi32, _mm_hadds_pi16

* Move some ssse3 intrinsics from i586 to i686
2017-12-03 11:53:36 -06:00
gnzlbg
6461312210 [ci] test i686-apple-darwin (#221)
* [ci] test i686-apple-darwin

* fix overflow on i686-apple-darwin
2017-11-28 17:09:38 -07:00
gnzlbg
8a92a566c9 [sse] _mm_stream_{ps,pi} (#219) 2017-11-28 07:48:26 -08:00
gnzlbg
288a30a93e add mmx module, mmx run-time detection, intrinsics (#220)
* [sse] _mm_cvtps_pi32, _mm_cvt_ps2pi

* [mmx] run-time detection support

* [x86] add mmx module

* [x86] make __m64 public

* [sse] add _mm_cvtps_pi{8,16}, _mm_cvttps_pi32, _mm_cvtt_ps2pi

* move new intrinsics from i586 to i686 module

* mmx requires i686
2017-11-28 07:45:41 -08:00
gnzlbg
ef847ac83b [sse] add _mm_{min, max}_{pi16, pu8} (#218)
* [sse] add _mm_{min, max}_{pi16, pu8}

* format docs
2017-11-27 14:54:28 -08:00
gnzlbg
b8a4b397ad update docs (#217)
* update docs

* cargo clean deletes previous docs

* remove stdsimd from coresimd examples

* use stdsimd instead of coresimd in core docs

* add stdsimd as a dev-dependency of coresimd
2017-11-27 10:47:23 -08:00
Tony Sifkarovski
40a0b1cc92 [avx2] add shuffle, insert/extract i128, permute* (#210)
* [x86][avx2] add _mm256_shuffle{hi,lo}_epi16
* [x86][avx2] add _mm256_{insert,extract}i128_si256
* [x86][avx2] add remaining permute intrinsics
2017-11-26 17:40:26 +01:00
gnzlbg
426621f021 Add FXSAVE/FXRSTOR, update Intel SDE, fix xsave tests (#205)
* [x86] add run-time detection for fxsr
* [x86] add i386 fxsr intrinsics: FXSAVE,FXRSTOR
* [x86_64] add x86_64 fxsr intrinsics: FXSAVE64/FXRSTOR64
* [x86-runtime]: document xsave detection further
* [x86] disable xsaves and xsaves64 tests
2017-11-22 15:25:15 +01:00
gnzlbg
20529701d8 Fix clippy and rust-fmt. 2017-11-22 13:42:58 +01:00
Alex Crichton
922345c005 Use workspaces and fix tests
* Enable a Cargo workspace for the repo
* Disable tests for proc-macro crates
* Move back to mounting source directory read-only
* Refactor test invocation to only test one crate with `--all`
2017-11-22 13:42:58 +01:00
gnzlbg
86fa377cea Only coresimd depends on stdsimd-test. 2017-11-22 13:42:58 +01:00
gnzlbg
cb9888f802 [ci] flag the documentation build bot 2017-11-22 13:42:58 +01:00
gnzlbg
b940d3311a fix doc script 2017-11-22 13:42:58 +01:00
gnzlbg
6a0a55f01a c_void -> *mut u8 2017-11-22 13:42:58 +01:00
gnzlbg
14d0903309 refactor no_std components into the coresimd crate 2017-11-22 13:42:58 +01:00
Adam Niederer
dc9f076480 Add AVX2 gathers (#202)
* Add _mm_[mask_]gatheri32_epi32
* Add _mm[256][_mask]_i32gather_{epi64, pd}
* Add _mm[256][_mask]_gather_ps
* Add _mm[256][_mask]_i64gather_{epi32, epi64, ps, pd}
2017-11-22 09:26:12 +01:00
gnzlbg
2faf11ab44 [readme] point always to latest docs (#206) 2017-11-21 15:05:46 -06:00
gnzlbg
0129d3be76 [nvptx] enable nvptx only when all other targets are disabled (#208)
Closes #207.
2017-11-21 15:05:05 -06:00
Alex Crichton
8356754fe7 Fix hygiene in various macros (#204) 2017-11-21 12:54:06 -06:00