Mrowqa
0c9ac36595
x86: implemented roundings for SSE4.1 (#158)
* x86: implemented roundings for SSE4.1
* x86: sse41 roundings - added docs and fixed assert__* tests
2017-10-28 16:32:14 -04:00
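The sketch below illustrates what the SSE4.1 rounding intrinsics from #158 do. It uses today's `std::arch::x86_64` API (`_mm_round_ps`, `_mm_floor_ps`, `_mm_ceil_ps`), not the 2017 stdsimd code. It assumes an x86_64 target and checks for SSE4.1 at run time. `round_sse41` and `round_impl` are hypothetical helper names.

```rust
use std::arch::x86_64::*;

/// Rounds four f32 lanes three ways with SSE4.1: to nearest (ties to even),
/// down (floor) and up (ceil). Returns None if the CPU lacks SSE4.1.
fn round_sse41(v: [f32; 4]) -> Option<([f32; 4], [f32; 4], [f32; 4])> {
    if !is_x86_feature_detected!("sse4.1") {
        return None;
    }
    // SAFETY: SSE4.1 support was verified at run time just above.
    Some(unsafe { round_impl(v) })
}

#[target_feature(enable = "sse4.1")]
unsafe fn round_impl(v: [f32; 4]) -> ([f32; 4], [f32; 4], [f32; 4]) {
    let a = _mm_loadu_ps(v.as_ptr());
    // Explicit rounding mode; _MM_FROUND_NO_EXC suppresses the precision exception.
    let nearest = _mm_round_ps::<{ _MM_FROUND_TO_NEAREST_INT | _MM_FROUND_NO_EXC }>(a);
    let floor = _mm_floor_ps(a);
    let ceil = _mm_ceil_ps(a);
    let mut out = ([0.0f32; 4], [0.0f32; 4], [0.0f32; 4]);
    _mm_storeu_ps(out.0.as_mut_ptr(), nearest);
    _mm_storeu_ps(out.1.as_mut_ptr(), floor);
    _mm_storeu_ps(out.2.as_mut_ptr(), ceil);
    out
}
```

For input `[1.5, 2.5, -1.5, 0.4]`, round-to-nearest yields `[2.0, 2.0, -2.0, 0.0]` because ties go to the even value. Floor gives `[1.0, 2.0, -2.0, 0.0]` and ceil gives `[2.0, 3.0, -1.0, 1.0]`.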
gnzlbg
46c6e9beb6
[fmt] use cargo fmt --all (#161)
2017-10-28 16:29:52 -04:00
gnzlbg
69d2ad85f3
[ci] check formatting (#64)
* [ci] check formatting
* [rustfmt] reformat the whole library
2017-10-27 11:55:29 -04:00
Mrowqa
5869eca3e9
x86: implemented _mm{,256}_maskstore_epi{32,64} (#155)
* x86: implemented maskloads for avx2
* x86: added docs and tests for avx2 maskloads
* x86: refactor - changed `a` to `mem_addr` in avx2 mask loads for consistency
* x86: implemented _mm{,256}_maskstore_epi{32,64}
2017-10-27 11:40:48 -04:00
Henry de Valence
1c67fc00e7
avx2: add _mm256_shuffle_epi32 reusing _mm_shuffle_epi32 code (#156)
2017-10-27 11:10:11 -04:00
gnzlbg
ad48780fca
[arm] vadd, vaddd, vaddq, vaddl
2017-10-26 10:18:00 -04:00
Mrowqa
ae0688c7fa
x86: fixed testing equality of floating point numbers (#150)
* x86: fixed testing equality of floating point numbers
* x86: removed unused macro branch
* x86: marked assert_approx_eq as used only in tests
2017-10-25 09:57:35 -04:00
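Exact `==` on floats fails once a result picks up rounding error, which is presumably the problem #150 addresses. Below is a minimal sketch of a tolerance-based assertion that assumes an absolute epsilon. This `assert_approx_eq!` is a hypothetical stand-in, not the macro from the commit.

```rust
/// Hypothetical helper: asserts that two floats differ by less than `eps`
/// (an absolute tolerance, which is an assumption of this sketch).
macro_rules! assert_approx_eq {
    ($a:expr, $b:expr, $eps:expr) => {{
        let (a, b, eps) = ($a, $b, $eps);
        assert!(
            (a - b).abs() < eps,
            "assertion failed: |{} - {}| = {} is not < {}",
            a,
            b,
            (a - b).abs(),
            eps
        );
    }};
}
```

With this helper, `0.1f64 + 0.2` is not `==` to `0.3`, but `assert_approx_eq!(0.1f64 + 0.2, 0.3, 1e-12)` passes.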
gwenn
ea51cbcf25
avx: fix *si256 methods (#145)
* avx: fix *si256 methods
* avx: _mm256_setr_m128
* avx: _mm256_setr_m128d
* avx: _mm256_setr_m128i
* avx: _mm256_loadu2_m128
* avx: _mm256_loadu2_m128d
* avx: _mm256_loadu2_m128i
* avx: _mm256_storeu2_m128
* sse2: _mm_storeu_pd
* avx: _mm256_storeu2_m128d
* sse2: _mm_undefined_si128
* avx: _mm256_storeu2_m128i
* Try to fix i586 build
2017-10-25 01:26:19 -04:00
Henry de Valence
0f33ca5518
avx2: add _mm256_unpack{hi,lo}_epi{8,16,32,64} (#147)
2017-10-24 20:12:23 -04:00
gnzlbg
3e1e52f413
update readme and crates.io badges, categories, etc. (#141)
* [readme] badges
* [crates.io] add badges, categories, etc.
2017-10-23 08:37:41 -05:00
Steven Fackler
6f134c3dfa
Make vector constructors const functions (#137)
2017-10-23 08:35:43 -05:00
Thomas Schilling
8b6f5d183e
Add some SSE _mm_cvt* instructions (#136)
* Add single output _mm_cvt[t]ss_* variants
The *_pi variants are currently blocked by
https://github.com/rust-lang-nursery/stdsimd/issues/74
* Add _mm_cvtsi*_ss
The _mm_cvtpi*_ps intrinsics are blocked by
https://github.com/rust-lang-nursery/stdsimd/issues/74
* Fix Linux builds
Also the si64 variants are only available on x86_64
2017-10-23 08:35:28 -05:00
Steven Fackler
76d9b89ab2
Implement _mm256_permute4x64_epi64 (#144)
2017-10-23 08:35:03 -05:00
gnzlbg
1f44e3166e
Deny all warnings and fix errors (#135)
* [travis-ci] deny warnings
* fix all warnings
2017-10-22 12:30:26 -05:00
gnzlbg
8fa5e7bcf5
[travis-ci] allow testing on all branches (#134)
2017-10-22 07:43:48 -05:00
jneem
192c4ac4fd
avx2: signed extensions (#132)
_mm256_cvtepi8_epi16
_mm256_cvtepi8_epi32
_mm256_cvtepi8_epi64
_mm256_cvtepi16_epi32
_mm256_cvtepi16_epi64
_mm256_cvtepi32_epi64
2017-10-21 15:00:13 -05:00
Steven Fackler
5fb563aabc
Add _mm256_shuffle_epi8 and _mm256_permutevar8x32_epi32 (#133)
* Add _mm256_shuffle_epi8
* Add _mm256_permutevar8x32_epi32
2017-10-21 14:59:37 -05:00
pythoneer
d5fd2b09a7
sse2 (#131)
* added missing doc _mm_cvtps_pd
added missing doc & test _mm_load_pd
added missing doc & test _mm_store_pd
added _mm_store1_pd
added _mm_store_pd1
added _mm_storer_pd
added _mm_load_pd1
added _mm_loadr_pd
added _mm_loadu_pd
* correct alignments
2017-10-21 10:46:55 -05:00
jneem
3ec870078a
avx2: _mm256_blend_epi32 and _mm256_blend_epi16. (#130)
2017-10-18 17:29:23 -05:00
gnzlbg
a3a703d83e
[example] nbody (#117)
2017-10-18 17:19:19 -05:00
Dan Robertson
4782ffadee
[x86] Implement avx2 broadcast intrinsics (#97)
Implement
- _mm_broadcastb_epi8
- _mm256_broadcastb_epi8
- _mm_broadcastd_epi32
- _mm256_broadcastd_epi32
- _mm_broadcastq_epi64
- _mm256_broadcastq_epi64
- _mm_broadcastsd_pd
- _mm256_broadcastsd_pd
- _mm256_broadcastsi128_si256
- _mm_broadcastss_ps
- _mm256_broadcastss_ps
- _mm_broadcastw_epi16
- _mm256_broadcastw_epi16
2017-10-18 14:36:17 -05:00
Alex Crichton
7b249298c0
Uncomment _mm256_mpsadbw_epu8 (#128)
Just needed some `constify_imm8!` treatment
Closes #59
2017-10-18 13:17:09 -05:00
gnzlbg
2dc965b69a
[neon] reciprocal square-root estimate (#121)
2017-10-18 13:16:34 -05:00
Alex Crichton
13bc6b8517
Add CI in Intel's instruction emulator (#113)
This commit adds a new builder on CI for running tests in Intel's own emulator
and also adds an assertion that on this emulator no tests are skipped due to
missing CPU features by accident.
Closes #92
2017-10-18 11:35:11 -04:00
André Oliveira
02c89b24ba
sse4.1 instructions (#98)
* sse4.1: _mm_blendv_ps and _mm_blendv_pd
* sse4.1: _mm_blend_ps and _mm_blend_pd
- HACK warning: messing with the constify macros
- Selecting only one buffer gets optimized away and tests need to take this into account
* sse4.1: _mm_blend_epi16
* sse4.1: _mm_extract_ps
* sse4.1: _mm_extract_epi8
* sse4.1: _mm_extract_epi32
* sse4.1: _mm_extract_epi64
* sse4.1: _mm_insert_ps
* sse4.1: _mm_insert_epi8
* sse4.1: _mm_insert_epi32 and _mm_insert_epi64
* Formatting
* sse4.1: _mm_max_epi8, _mm_max_epu16, _mm_max_epi32 and _mm_max_epu32
* Fix wrong compiler flag
- avx -> sse4.1
* Fix intrinsics that only work with x86-64
* sse4.1: use appropriate types
* Revert '_mm_extract_ps' to return i32
* sse4.1: Use the v128 types for consistency
* Try fix for windows
* Try "vectorcall" calling convention
* Revert "Try "vectorcall" calling convention"
This reverts commit 12936e9976bc6b0e4e538d82f55f0ee2d87a7f25.
* Revert "Try fix for windows"
This reverts commit 9c473808d334acedd46060b32ceea116662bf6a3.
* Change tests for windows
* Remove useless Windows test
2017-10-18 11:34:51 -04:00
jneem
acf919f960
avx2: _mm_blend_epi32 (#127)
2017-10-17 10:16:15 -04:00
Thomas Schilling
64c7f7ec56
Add SSE _mm_store* intrinsics and _mm_move_ss (#115)
* Add _mm_store* intrinsics and _mm_move_ss
* Fix Win64 & Linux i586 failures
* Make i586 codegen happy without breaking x86_64
2017-10-17 10:15:37 -04:00
gwenn
19e7d0ed3e
Avx (#126)
* avx: _mm256_zextps128_ps256
* avx: _mm256_zextpd128_pd256
* avx: _mm256_set_m128
* avx: _mm256_set_m128d
* avx: _mm256_castpd_ps
* avx: _mm256_castps_pd
* avx: _mm256_castps_si256
* avx: _mm256_castsi256_ps
* avx: _mm256_zextsi128_si256
* avx: _mm256_set_m128i
2017-10-16 18:14:09 -04:00
pythoneer
3286bbbab7
fixed _mm_set_pd and _mm_setr_pd by reversing order (#124)
2017-10-16 11:32:26 -04:00
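The lane order fixed in #124 is easy to get backwards. `_mm_set_pd` takes its arguments from the highest lane down, while `_mm_setr_pd` takes them in memory ("reversed") order. The sketch below demonstrates this against the current `std::arch::x86_64` API and assumes an x86_64 target. `set_vs_setr` is a hypothetical helper.

```rust
use std::arch::x86_64::*;

/// Stores `_mm_set_pd(1.0, 2.0)` and `_mm_setr_pd(1.0, 2.0)` to memory
/// so their lane order can be compared directly.
fn set_vs_setr() -> ([f64; 2], [f64; 2]) {
    let mut set = [0.0f64; 2];
    let mut setr = [0.0f64; 2];
    // SAFETY: SSE2 is part of the x86_64 baseline; unaligned stores into local arrays.
    unsafe {
        // _mm_set_pd lists lanes high-to-low: lane 1 = 1.0, lane 0 = 2.0.
        _mm_storeu_pd(set.as_mut_ptr(), _mm_set_pd(1.0, 2.0));
        // _mm_setr_pd lists lanes in memory order: lane 0 = 1.0, lane 1 = 2.0.
        _mm_storeu_pd(setr.as_mut_ptr(), _mm_setr_pd(1.0, 2.0));
    }
    (set, setr)
}
```

`set_vs_setr()` returns `([2.0, 1.0], [1.0, 2.0])`: the same arguments end up in opposite lanes.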
gwenn
db8831ac61
Avx (#123)
* avx: _mm256_movedup_pd
* avx: _mm256_lddqu_si256
* avx: _mm256_rcp_ps
* avx: _mm256_rsqrt_ps
* avx: _mm256_unpackhi_pd
* avx: _mm256_unpackhi_ps
* avx: _mm256_unpacklo_pd, _mm256_unpacklo_ps
* avx: _mm256_testz_si256
* avx: _mm256_testc_si256
* avx: _mm256_testz_pd
* avx: _mm256_testc_pd
* avx: _mm256_testnzc_pd
* avx: _mm_testz_pd
* avx: _mm_testc_pd
* avx: _mm_testnzc_pd
* avx: _mm256_testz_ps, _mm256_testc_ps, _mm256_testnzc_ps
* avx: _mm_testz_ps, _mm_testc_ps, _mm_testnzc_ps
* avx: _mm256_movemask_pd, _mm256_movemask_ps
* avx: _mm256_setzero_pd, _mm256_setzero_ps
* avx: _mm256_setzero_si256
* avx: _mm256_set_pd, _mm256_set_ps
* avx: _mm256_set_epi8
* avx: _mm256_set_epi16
* avx: _mm256_set_epi32
* avx: _mm256_set_epi64x
* avx: _mm256_setr_pd, _mm256_setr_ps
* avx: _mm256_setr_epi8
* avx: _mm256_setr_epi16
* avx: _mm256_setr_epi32, _mm256_setr_epi64x
* avx: add missing assert_instr
* avx: _mm256_set1_pd
* avx: _mm256_set1_ps
* avx: _mm256_set1_epi8
* avx: _mm256_set1_epi16, _mm256_set1_epi32
* avx: _mm256_set1_epi64x
* avx: _mm256_castpd_si256, _mm256_castsi256_pd, _mm256_castps256_ps128, _mm256_castpd256_pd128, _mm256_castsi256_si128
* avx: remove failing assert_instr
2017-10-15 11:36:46 -04:00
gnzlbg
bd7990eb2a
[arm] v6/v7/v8 run-time tests (#119)
2017-10-15 09:48:06 -04:00
pythoneer
c38ea28d5a
Sse2 (#122)
* added _mm_cvtps_pd
* added _mm_set_sd
* added _mm_set1_pd
* added _mm_set_pd1
* added _mm_set_pd
* added _mm_setr_pd
* added _mm_setzero_pd
2017-10-14 19:50:03 -04:00
gwenn
90c0c9be20
Avx (#109)
* avx: _mm256_loadu_pd
* avx: _mm256_storeu_pd
* avx: _mm256_loadu_ps
* avx: _mm256_storeu_ps
* avx: fix _mm256_storeu_pd and _mm256_storeu_ps
* avx: _mm256_loadu_si256
* avx: _mm256_undefined_si256
* avx: _mm256_maskload_pd
* avx: _mm256_maskstore_pd
* Attempt to fix CI (#108)
Need to bring codegen units back to only one for now
* [x86] sse4.2 add docs for _SIDD_EQUAL_RANGES (#107)
- Add docs for the _SIDD_EQUAL_RANGES mode
* Add _MM_TRANSPOSE4_PS pseudo-macro. (#106)
This adds a strange macro, which I've replaced with a function, because it
seems there are not many better alternatives.
Also adds a test, and `#[allow(non_snake_case)]` to `#[simd_test]`.
* Fix i586 tests
* Implement bitwise SSE ops & _mm_cmp*_ss (#103)
* Add _mm_{and,andnot,or,xor}_ps
* Add _mm_cmpeq_ss
* Add _mm_cmplt_ss
* Add _mm_cmple_ss
* Add _mm_cmpgt_ss
* Add _mm_cmpge_ss
* Add _mm_cmpneq_ss
* Add _mm_cmpnlt_ss
* Add _mm_cmpnle_ss
* Add _mm_cmpngt_ss
* Add _mm_cmpnge_ss
* Add _mm_cmpord_ss
* Add _mm_cmpunord_ss
* Fix _mm_{and,andnot,or,xor}_ps tests for i586
LLVM for i586 doesn't seem to generate `andps`, and instead generates 4
`and`s. Similar for the other operations.
* avx: _mm_maskload_pd
* avx: _mm_maskstore_pd
* avx: _mm256_maskload_ps
* avx: _mm256_maskstore_ps
* avx: _mm_maskload_ps, _mm_maskstore_ps
* avx: _mm256_movehdup_ps
* avx: _mm256_moveldup_ps
2017-10-14 10:12:57 -04:00
pythoneer
4aa889fa67
Sse2 (#116)
* added _mm_cvtsd_si64
* added _mm_cvttsd_si64; target_arch to _mm_cvtsd_si64 test
2017-10-14 10:11:25 -04:00
Alex Crichton
082b097d8f
Ignore another test for nightly
Wait until rust-lang/rust#45202 is in nightly
2017-10-14 07:10:42 -07:00
Thomas Schilling
05b045746a
SSE Comparison instructions (#111)
* Add _mm_cmp*_ps variant (SSE)
* Add _mm_comi{eq,lt,le,gt,ge,neq}_ss instructions (sse)
* Add _mm_ucomi*_ss instructions SSE
They all compile down to the same x86 instruction, UCOMISS, whereas the
_mm_comi*_ss instructions compile down to COMISS. The outputs of both
sets of instructions are exactly the same. The only difference is in
exception handling. I therefore added a single test case which tests
their different effect on the MXCSR register (_mm_getcsr) of
_mm_comieq_ss vs. _mm_ucomieq_ss. Together with the tests about emitting
the right instruction, no further tests are needed for the other
variants.
* Avoid constant-folding test case
2017-10-12 13:47:21 -04:00
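The commit above claims that `_mm_comi*_ss` and `_mm_ucomi*_ss` return identical results. For ordinary (non-NaN) operands, this can be checked with the sketch below. It uses the current `std::arch::x86_64` API and assumes an x86_64 target. `compare_eq` is a hypothetical helper. The sketch does not exercise the exception-handling difference: COMISS signals an invalid-operation exception for QNaN and SNaN operands, whereas UCOMISS signals it only for SNaN.

```rust
use std::arch::x86_64::*;

/// Compares the low lanes of two scalars with the COMISS-based and the
/// UCOMISS-based equality intrinsics and returns both results.
fn compare_eq(a: f32, b: f32) -> (i32, i32) {
    // SAFETY: SSE is part of the x86_64 baseline, so these intrinsics are always available.
    unsafe {
        let va = _mm_set_ss(a);
        let vb = _mm_set_ss(b);
        // Both yield 1 when the low lanes are equal and 0 otherwise (non-NaN inputs).
        (_mm_comieq_ss(va, vb), _mm_ucomieq_ss(va, vb))
    }
}
```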
Thomas Schilling
9b0295c0f8
Implement bitwise SSE ops & _mm_cmp*_ss (#103)
* Add _mm_{and,andnot,or,xor}_ps
* Add _mm_cmpeq_ss
* Add _mm_cmplt_ss
* Add _mm_cmple_ss
* Add _mm_cmpgt_ss
* Add _mm_cmpge_ss
* Add _mm_cmpneq_ss
* Add _mm_cmpnlt_ss
* Add _mm_cmpnle_ss
* Add _mm_cmpngt_ss
* Add _mm_cmpnge_ss
* Add _mm_cmpord_ss
* Add _mm_cmpunord_ss
* Fix _mm_{and,andnot,or,xor}_ps tests for i586
LLVM for i586 doesn't seem to generate `andps`, and instead generates 4
`and`s. Similar for the other operations.
2017-10-12 10:15:10 -04:00
Alex Crichton
9a440a3eb0
Fix i586 tests
2017-10-11 17:33:41 -07:00
Malo Jaffré
5a028d329e
Add _MM_TRANSPOSE4_PS pseudo-macro. (#106)
This adds a strange macro, which I've replaced with a function, because it
seems there are not many better alternatives.
Also adds a test, and `#[allow(non_snake_case)]` to `#[simd_test]`.
2017-10-11 11:28:44 -04:00
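In today's `std::arch`, `_MM_TRANSPOSE4_PS` is still a function rather than a C-style macro. It takes the four rows as `&mut __m128` and transposes them in place. The sketch below uses it to transpose a row-major 4x4 matrix and assumes an x86_64 target. `transpose4` is a hypothetical helper.

```rust
use std::arch::x86_64::*;

/// Transposes a row-major 4x4 f32 matrix; the four rows are loaded into
/// registers, rewritten in place by _MM_TRANSPOSE4_PS, and stored back.
fn transpose4(m: [[f32; 4]; 4]) -> [[f32; 4]; 4] {
    let mut out = [[0.0f32; 4]; 4];
    // SAFETY: SSE is part of the x86_64 baseline; unaligned loads/stores are used.
    unsafe {
        let mut r0 = _mm_loadu_ps(m[0].as_ptr());
        let mut r1 = _mm_loadu_ps(m[1].as_ptr());
        let mut r2 = _mm_loadu_ps(m[2].as_ptr());
        let mut r3 = _mm_loadu_ps(m[3].as_ptr());
        _MM_TRANSPOSE4_PS(&mut r0, &mut r1, &mut r2, &mut r3);
        for (row, v) in out.iter_mut().zip([r0, r1, r2, r3]) {
            _mm_storeu_ps(row.as_mut_ptr(), v);
        }
    }
    out
}
```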
Dan Robertson
5a6005aa29
[x86] sse4.2 add docs for _SIDD_EQUAL_RANGES (#107)
- Add docs for the _SIDD_EQUAL_RANGES mode
2017-10-11 11:28:17 -04:00
Alex Crichton
9da400965f
Attempt to fix CI (#108)
Need to bring codegen units back to only one for now
2017-10-11 11:28:02 -04:00
gwenn
7c88f7c49b
Avx (#105)
* avx: _mm_permute_ps and sse: _mm_undefined_ps
* avx: _mm256_permutevar_pd, _mm_permutevar_pd
* avx: _mm256_permute_pd
* avx: _mm256_shuffle_pd fixed
* avx: _mm_permute_pd, sse2: _mm_undefined_pd
* avx: _mm256_permute2f128_ps
* avx: _mm256_permute2f128_pd
* avx: _mm256_permute2f128_si256
* avx: _mm256_broadcast_ss
* avx: _mm_broadcast_ss
* avx: _mm256_broadcast_sd
* avx: _mm256_broadcast_ps
* avx: _mm256_broadcast_pd
* avx: _mm_cmp_pd
* avx: _mm256_cmp_pd
* avx: _mm_cmp_ps
* avx: _mm256_cmp_ps
* avx: _mm_cmp_sd
* avx: _mm_cmp_ss
* avx: _mm256_insertf128_pd, _mm256_castpd128_pd256
* avx: _mm256_insertf128_si256, _mm256_castsi128_si256
* avx: _mm256_insertf128_ps, _mm256_castps128_ps256
* avx: _mm256_insert_epi8
* avx: _mm256_insert_epi16
* avx: _mm256_insert_epi32
* avx: _mm256_insert_epi64
* Try to fix i586 build
* Fix missing inline and target_feature
* sse: fix _mm_undefined_ps
2017-10-09 16:05:36 -05:00
Thomas Schilling
807ec089b7
Implement SSE _mm_load* instructions (#99)
* Add _mm_loadh_pi
* Add doctest for _mm_loadh_pi
* Add _mm_loadl_pi
* Add _mm_load_ss
* Add _mm_load1_ps and _mm_load_ps1
* Add _mm_load_ps and _mm_loadu_ps
* Add _mm_loadr_ps
* Replace _mm_loadu_ps TODO with explanation
* Tweak expected instructions for _mm_loadl/h_pi on x86
* Try fixing i586 test crash
* Targets i586/i686 generate different code for _mm_loadh_pi
2017-10-07 21:12:47 -05:00
Thomas Schilling
a547f2bf36
Implement SSE _mm_set* intrinsics (#100)
* Add _mm_set_ss
* Add _mm_set1_ps and _mm_set_ps1
* Add _mm_set_ps
* Add _mm_setr_ps
* Add _mm_setzero_ps
* Fix _mm_setr_ps instr test on x86
* Sidestep black_box ABI issue on i586
2017-10-07 15:04:55 +00:00
Alex Crichton
7055f496c7
Add an i586 builder (#101)
The i586 targets on x86 are defined to be 32-bit and lacking sse/sse2, unlike
the i686 target which has sse2 turned on by default. I was mostly curious what
would happen when turning on this target, and it turns out quite a few tests
failed!
Most of the tests here had to do with calling functions with ABI mismatches
where the callee wasn't `#[inline(always)]`. Various pieces have been updated
now and we should be passing all tests.
Only one instruction assertion had to change, for a function that generates a
different instruction depending on whether sse2 is ambiently enabled.
2017-10-06 22:54:18 +00:00
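The compile-time versus run-time distinction described in #101 can be observed directly from Rust. `cfg!(target_feature = "sse2")` reports what the target enables ambiently: false on i586, true on i686 and x86_64. `is_x86_feature_detected!` queries the running CPU instead. The sketch below shows both. `sse2_status` is a hypothetical helper.

```rust
/// Returns (SSE2 enabled at compile time, SSE2 supported by the running CPU).
fn sse2_status() -> (bool, bool) {
    // Fixed at build time by the target spec and any -C target-feature flags.
    let compile_time = cfg!(target_feature = "sse2");
    // Queried from the CPU when the program runs.
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    let run_time = is_x86_feature_detected!("sse2");
    #[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
    let run_time = false;
    (compile_time, run_time)
}
```

On an i586 build running on a modern CPU, this would be expected to report `(false, true)`. That gap is where the ABI mismatches mentioned above can appear, when a function compiled with sse2 enabled is called from code compiled without it.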
Alex Crichton
40eeae6adf
Enable multiple #[assert_instr] attributes (#96)
* Enable multiple #[assert_instr] attributes
Looks like all we needed to do was generate new function names!
* Uncomment assertions for `_mm_prefetch`
2017-10-06 21:19:14 +00:00
gwenn
ee0f165e8e
Avx (#90)
* avx: _mm256_andnot_pd, _mm256_andnot_ps
* avx: _mm256_blendv_pd
* avx: _mm256_blend_pd with no assert_instr
With assert_instr: too many instructions in the disassembly
* avx: _mm256_blendv_ps
* avx: _mm256_hadd_pd
* avx: _mm256_hadd_ps
* avx: _mm256_hsub_pd
* avx: _mm256_hsub_ps
* avx: _mm256_xor_pd
* avx: _mm256_xor_ps
* avx: _mm256_cvtepi32_pd
* avx: _mm256_cvtepi32_ps
* avx: _mm256_cvtpd_ps
* avx: _mm256_cvtps_epi32
* avx: _mm256_cvtps_pd
* avx: _mm256_cvttpd_epi32
* avx: _mm256_cvtpd_epi32
* avx: replace simd_cast by proper instruction
* avx: _mm256_cvttps_epi32
* avx: _mm256_extractf128_ps, _mm256_undefined_ps
* avx: _mm256_extractf128_pd, _mm256_undefined_pd
* avx: _mm256_extractf128_si256, _mm256_undefined_si256
* avx: _mm256_extract_epi8
* avx: _mm256_extract_epi16
* avx: _mm256_extract_epi32
* avx: _mm256_extract_epi64
* avx: _mm256_zeroall
* avx: _mm256_zeroupper
* avx: _mm256_permutevar_ps
* avx: _mm_permutevar_ps
* avx: replace simd_cast by as_*
* avx: _mm256_permute_ps
* avx: _mm256_dp_ps
* avx: _mm256_shuffle_pd
* avx: _mm256_shuffle_pd, wrong instruction generated
* implement _mm256_hadd_ps and _mm256_hadd_pd
* avx: implement _mm256_hsub_pd and _mm256_hsub_ps
* assert_instr: raise the limit up to 30 instructions
2017-10-05 13:42:29 -05:00
Dan Robertson
b421e9210c
[Docs] Add more docs to the sse4.2 cmpstr fns (#94)
- Add more examples to _mm_cmpistri
- Add basic docs to _mm_cmpestri
- Cleanup lib docs
2017-10-05 18:26:40 +02:00
Thomas Schilling
186b8fe093
Implement _mm_getcsr, _mm_setcsr, _mm_sfence (#88)
* Add _mm_sfence
* Add _mm_getcsr/_mm_setcsr and convenience wrappers
* Use test::black_box to simplify tests
* Use uppercase naming for C-macro equivalents
Discussed at https://github.com/rust-lang-nursery/stdsimd/issues/84
2017-10-05 18:17:43 +02:00
Thomas Schilling
c845a1baaf
Implement _mm_prefetch (#78)
This boils down to using LLVM's `prefetch` intrinsic [1].
[1]: https://llvm.org/docs/LangRef.html#llvm-prefetch-intrinsic
2017-10-05 18:08:58 +02:00
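`_mm_prefetch` is only a cache hint, so a typical use issues it a fixed distance ahead of the element currently being processed. The sketch below uses the current signature `_mm_prefetch::<STRATEGY>(p: *const i8)` and assumes an x86_64 target. `sum_with_prefetch` and the 64-element distance are hypothetical choices, not tuned values.

```rust
use std::arch::x86_64::*;

/// Sums `data`, prefetching into all cache levels (_MM_HINT_T0) a fixed
/// number of elements ahead of the current index.
fn sum_with_prefetch(data: &[f32]) -> f32 {
    const DISTANCE: usize = 64; // hypothetical prefetch distance, in elements
    let mut total = 0.0f32;
    for i in 0..data.len() {
        if i + DISTANCE < data.len() {
            // SAFETY: the offset stays in bounds, and a prefetch never faults anyway.
            unsafe {
                _mm_prefetch::<_MM_HINT_T0>(data.as_ptr().add(i + DISTANCE) as *const i8);
            }
        }
        total += data[i];
    }
    total
}
```

The prefetch does not change the result; it only affects when the data reaches the cache, so correctness can be tested independently of any performance effect.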