Commit Graph

2464 Commits

Author SHA1 Message Date
Alex Crichton
9a440a3eb0 Fix i586 tests 2017-10-11 17:33:41 -07:00
Malo Jaffré
5a028d329e Add _MM_TRANSPOSE4_PS pseudo-macro. (#106)
This adds a strange macro, which I've implemented as a function because there
seem to be few better alternatives.

Also adds a test and adds `#[allow(non_snake_case)]` to `#[simd_test]`.
2017-10-11 11:28:44 -04:00
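`_MM_TRANSPOSE4_PS` transposes four 4-lane rows in place. A minimal scalar sketch of what it computes, assuming plain `[[f32; 4]; 4]` arrays stand in for the four `__m128` rows (`transpose4` is a hypothetical name, not the stdsimd signature):

```rust
/// Transposes a 4x4 matrix of f32 in place, mirroring the effect of
/// `_MM_TRANSPOSE4_PS(row0, row1, row2, row3)` on four 4-lane rows.
fn transpose4(rows: &mut [[f32; 4]; 4]) {
    for i in 0..4 {
        for j in (i + 1)..4 {
            // Swap each element above the diagonal with its mirror below it.
            let tmp = rows[i][j];
            rows[i][j] = rows[j][i];
            rows[j][i] = tmp;
        }
    }
}
```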
Dan Robertson
5a6005aa29 [x86] sse4.2 add docs for _SIDD_EQUAL_RANGES (#107)
- Add docs for the _SIDD_EQUAL_RANGES mode
2017-10-11 11:28:17 -04:00
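In the ranges comparison mode (`_SIDD_CMP_RANGES` in Intel's naming), the first operand holds inclusive (low, high) byte pairs and each byte of the second operand is tested against every pair. A scalar sketch of the intermediate match mask, assuming explicit lengths and ignoring the polarity and output-selection flags (`ranges_mask` is hypothetical):

```rust
/// Scalar model of the SSE4.2 "ranges" comparison on 16-byte inputs.
/// Returns a mask with bit i set when `data[i]` lies inside any pair.
fn ranges_mask(ranges: &[u8], data: &[u8]) -> u16 {
    let mut mask = 0u16;
    for (i, &byte) in data.iter().take(16).enumerate() {
        // Even positions are lower bounds, odd positions upper bounds.
        let hit = ranges
            .chunks_exact(2)
            .take(8)
            .any(|pair| pair[0] <= byte && byte <= pair[1]);
        if hit {
            mask |= 1 << i;
        }
    }
    mask
}
```

For example, the ranges `b"azAZ"` match ASCII letters, so `ranges_mask(b"azAZ", b"a1B?")` sets bits 0 and 2.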
Alex Crichton
9da400965f Attempt to fix CI (#108)
Need to bring codegen units back to only one for now
2017-10-11 11:28:02 -04:00
gwenn
7c88f7c49b Avx (#105)
* avx: _mm_permute_ps and sse: _mm_undefined_ps

* avx: _mm256_permutevar_pd, _mm_permutevar_pd

* avx: _mm256_permute_pd

* avx: _mm256_shuffle_pd fixed

* avx: _mm_permute_pd, sse2: _mm_undefined_pd

* avx: _mm256_permute2f128_ps

* avx: _mm256_permute2f128_pd

* avx: _mm256_permute2f128_si256

* avx: _mm256_broadcast_ss

* avx: _mm_broadcast_ss

* avx: _mm256_broadcast_sd

* avx: _mm256_broadcast_ps

* avx: _mm256_broadcast_pd

* avx: _mm_cmp_pd

* avx: _mm256_cmp_pd

* avx: _mm_cmp_ps

* avx: _mm256_cmp_ps

* avx: _mm_cmp_sd

* avx: _mm_cmp_ss

* avx: _mm256_insertf128_pd, _mm256_castpd128_pd256

* avx: _mm256_insertf128_si256, _mm256_castsi128_si256

* avx: _mm256_insertf128_ps, _mm256_castps128_ps256

* avx: _mm256_insert_epi8

* avx: _mm256_insert_epi16

* avx: _mm256_insert_epi32

* avx: _mm256_insert_epi64

* Try to fix i586 build

* Fix missing inline and target_feature

* sse: fix _mm_undefined_ps
2017-10-09 16:05:36 -05:00
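`_mm256_permute2f128_pd` builds each 128-bit half of the result from a 4-bit control field of `imm8`: bits 1:0 pick one of the four input halves and bit 3 zeroes the half. A scalar sketch, assuming each 256-bit vector is modeled as two `[f64; 2]` halves (`permute2f128_pd` is a hypothetical name):

```rust
/// Scalar model of `_mm256_permute2f128_pd(a, b, imm8)`.
/// Index 0 is the low 128-bit half, index 1 the high half.
fn permute2f128_pd(a: [[f64; 2]; 2], b: [[f64; 2]; 2], imm8: u8) -> [[f64; 2]; 2] {
    // Pick one 128-bit half according to a 4-bit control field.
    let select = |ctrl: u8| -> [f64; 2] {
        if ctrl & 0x8 != 0 {
            return [0.0, 0.0]; // bit 3 zeroes the destination half
        }
        match ctrl & 0x3 {
            0 => a[0],
            1 => a[1],
            2 => b[0],
            _ => b[1],
        }
    };
    // Low nibble controls the low half, high nibble the high half.
    [select(imm8 & 0xF), select(imm8 >> 4)]
}
```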
Thomas Schilling
807ec089b7 Implement SSE _mm_load* instructions (#99)
* Add _mm_loadh_pi

* Add doctest for _mm_loadh_pi

* Add _mm_loadl_pi

* Add _mm_load_ss

* Add _mm_load1_ps and _mm_load_ps1

* Add _mm_load_ps and _mm_loadu_ps

* Add _mm_loadr_ps

* Replace _mm_loadu_ps TODO with explanation

* Tweak expected instructions for _mm_loadl/h_pi on x86

* Try fixing i586 test crash

* Targets i586/i686 generate different code for _mm_loadh_pi
2017-10-07 21:12:47 -05:00
Thomas Schilling
a547f2bf36 Implement SSE _mm_set* intrinsics (#100)
* Add _mm_set_ss

* Add _mm_set1_ps and _mm_set_ps1

* Add _mm_set_ps

* Add _mm_setr_ps

* Add _mm_setzero_ps

* Fix _mm_setr_ps instr test on x86

* Sidestep black_box ABI issue on i586
2017-10-07 15:04:55 +00:00
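The `_mm_set*` family differs mainly in argument order: `_mm_set_ps` lists lanes from highest to lowest, while `_mm_setr_ps` lists them in memory order. A sketch of the resulting lane layouts, assuming `[f32; 4]` with index 0 as the lowest lane (the function names are hypothetical):

```rust
/// `_mm_set_ps(e3, e2, e1, e0)`: the last argument lands in lane 0.
fn set_ps(e3: f32, e2: f32, e1: f32, e0: f32) -> [f32; 4] {
    [e0, e1, e2, e3]
}

/// `_mm_setr_ps(e0, e1, e2, e3)`: arguments in lowest-lane-first order.
fn setr_ps(e0: f32, e1: f32, e2: f32, e3: f32) -> [f32; 4] {
    [e0, e1, e2, e3]
}

/// `_mm_set1_ps` (alias `_mm_set_ps1`) broadcasts one value to every lane.
fn set1_ps(x: f32) -> [f32; 4] {
    [x; 4]
}
```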
Alex Crichton
7055f496c7 Add an i586 builder (#101)
The i586 targets on x86 are defined to be 32-bit and to lack sse/sse2, unlike
the i686 target, which has sse2 enabled by default. I was mostly curious what
would happen when turning on this target, and it turns out quite a few tests
failed!

Most of the failures came from calling functions with ABI mismatches where the
callee wasn't `#[inline(always)]`. Various pieces have been updated, and we
should now be passing all tests.

Only one instruction assertion had to change: that function generates a
different instruction depending on whether sse2 is ambiently enabled.
2017-10-06 22:54:18 +00:00
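Whether sse2 is "ambiently enabled" is a compile-time property of the target, which Rust exposes through `cfg`. A small sketch (`sse2_enabled_at_compile_time` is a hypothetical helper): it should report `true` on i686 and x86_64 targets and `false` on i586 unless built with `-C target-feature=+sse2`.

```rust
/// Reports whether SSE2 was enabled at compile time for this target.
/// This is a static property of the build, not a runtime CPU check.
fn sse2_enabled_at_compile_time() -> bool {
    cfg!(target_feature = "sse2")
}
```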
Alex Crichton
40eeae6adf Enable multiple #[assert_instr] attributes (#96)
* Enable multiple #[assert_instr] attributes

Looks like all we needed to do was generate new function names!

* Uncomment assertions for `_mm_prefetch`
2017-10-06 21:19:14 +00:00
gwenn
ee0f165e8e Avx (#90)
* avx: _mm256_andnot_pd, _mm256_andnot_ps

* avx: _mm256_blendv_pd

* avx: _mm256_blend_pd with no assert_instr

With assert_instr: too many instructions in the disassembly

* avx: _mm256_blendv_ps

* avx: _mm256_hadd_pd

* avx: _mm256_hadd_ps

* avx: _mm256_hsub_pd

* avx: _mm256_hsub_ps

* avx: _mm256_xor_pd

* avx: _mm256_xor_ps

* avx: _mm256_cvtepi32_pd

* avx: _mm256_cvtepi32_ps

* avx: _mm256_cvtpd_ps

* avx: _mm256_cvtps_epi32

* avx: _mm256_cvtps_pd

* avx: _mm256_cvttpd_epi32

* avx: _mm256_cvtpd_epi32

* avx: replace simd_cast by proper instruction

* avx: _mm256_cvttps_epi32

* avx: _mm256_extractf128_ps, _mm256_undefined_ps

* avx: _mm256_extractf128_pd, _mm256_undefined_pd

* avx: _mm256_extractf128_si256, _mm256_undefined_si256

* avx: _mm256_extract_epi8

* avx: _mm256_extract_epi16

* avx: _mm256_extract_epi32

* avx: _mm256_extract_epi64

* avx: _mm256_zeroall

* avx: _mm256_zeroupper

* avx: _mm256_permutevar_ps

* avx: _mm_permutevar_ps

* avx: replace simd_cast by as_*

* avx: _mm256_permute_ps

* avx: _mm256_dp_ps

* avx: _mm256_shuffle_pd

* avx: _mm256_shuffle_pd, wrong instruction generated

* implement _mm256_hadd_ps and _mm256_hadd_pd

* avx: implement _mm256_hsub_pd and _mm256_hsub_ps

* assert_instr: raise the limit up to 30 instructions
2017-10-05 13:42:29 -05:00
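The 256-bit horizontal operations work within each 128-bit half, so results from `a` and `b` end up interleaved rather than concatenated. A scalar sketch of `_mm256_hadd_pd` and `_mm256_hsub_pd`, assuming `[f64; 4]` with index 0 as the lowest lane (function names are hypothetical):

```rust
/// Scalar model of `_mm256_hadd_pd(a, b)`.
fn hadd_pd256(a: [f64; 4], b: [f64; 4]) -> [f64; 4] {
    [a[0] + a[1], b[0] + b[1], a[2] + a[3], b[2] + b[3]]
}

/// Scalar model of `_mm256_hsub_pd(a, b)`: lower lane minus upper lane.
fn hsub_pd256(a: [f64; 4], b: [f64; 4]) -> [f64; 4] {
    [a[0] - a[1], b[0] - b[1], a[2] - a[3], b[2] - b[3]]
}
```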
Dan Robertson
b421e9210c [Docs] Add more docs to the sse4.2 cmpstr fns (#94)
- Add more examples to _mm_cmpistri
- Add basic docs to _mm_cmpestri
- Clean up lib docs
2017-10-05 18:26:40 +02:00
Thomas Schilling
186b8fe093 Implement _mm_getcsr, _mm_setcsr, _mm_sfence (#88)
* Add _mm_sfence

* Add _mm_getcsr/_mm_setcsr and convenience wrappers

* Use test::black_box to simplify tests

* Use uppercase naming for C-macro equivalents

Discussed at https://github.com/rust-lang-nursery/stdsimd/issues/84
2017-10-05 18:17:43 +02:00
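The convenience wrappers around `_mm_getcsr`/`_mm_setcsr` mostly read or rewrite one bit field of MXCSR. A sketch of the rounding-mode pair, assuming the CSR value is passed around as a plain `u32` instead of being read from the register; the constant values follow Intel's definitions, while `get_rounding_mode` and `set_rounding_mode` are hypothetical names:

```rust
// Rounding-control field of MXCSR (bits 13-14), using the Intel names.
const _MM_ROUND_MASK: u32 = 0x6000;
const _MM_ROUND_NEAREST: u32 = 0x0000;
const _MM_ROUND_DOWN: u32 = 0x2000;
const _MM_ROUND_UP: u32 = 0x4000;
const _MM_ROUND_TOWARD_ZERO: u32 = 0x6000;

/// Like `_MM_GET_ROUNDING_MODE()`, applied to a saved CSR value.
fn get_rounding_mode(csr: u32) -> u32 {
    csr & _MM_ROUND_MASK
}

/// Like `_MM_SET_ROUNDING_MODE(mode)`: replace only the rounding bits,
/// leaving the exception masks and flags untouched.
fn set_rounding_mode(csr: u32, mode: u32) -> u32 {
    (csr & !_MM_ROUND_MASK) | mode
}
```

With the power-on value `0x1F80`, switching to `_MM_ROUND_TOWARD_ZERO` gives `0x7F80`.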
Thomas Schilling
c845a1baaf Implement _mm_prefetch (#78)
This boils down to using LLVM's `prefetch` intrinsic [1].

[1]: https://llvm.org/docs/LangRef.html#llvm-prefetch-intrinsic
2017-10-05 18:08:58 +02:00
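A usage sketch of `_mm_prefetch` against the present-day `std::arch` API, where the hint is a const generic parameter. The prefetch distance of 16 elements is an arbitrary assumption, and `prefetch_t0`/`sum_with_prefetch` are hypothetical names. A prefetch is only a hint, so the result is the same with or without it.

```rust
/// Issues a T0 (all cache levels) prefetch hint on x86_64; a no-op elsewhere.
#[cfg(target_arch = "x86_64")]
fn prefetch_t0(p: *const f32) {
    use std::arch::x86_64::{_mm_prefetch, _MM_HINT_T0};
    // SAFETY: SSE is part of the x86_64 baseline, `p` comes from a live
    // reference, and a prefetch does not change program state.
    unsafe { _mm_prefetch::<_MM_HINT_T0>(p as *const i8) }
}

#[cfg(not(target_arch = "x86_64"))]
fn prefetch_t0(_p: *const f32) {}

/// Sums `data`, prefetching a fixed distance ahead of the current element.
fn sum_with_prefetch(data: &[f32]) -> f32 {
    const AHEAD: usize = 16; // assumed prefetch distance, not tuned
    let mut total = 0.0;
    for (i, &x) in data.iter().enumerate() {
        if let Some(next) = data.get(i + AHEAD) {
            prefetch_t0(next as *const f32);
        }
        total += x;
    }
    total
}
```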
Adam Niederer
9695f2cfaf Improve _mm256_round_* docs (#93)
Fix a grammatical error, use a list instead of a code block or no formatting at all,
and add the LLVM immediate reference.
2017-10-05 00:25:18 +02:00
pythoneer
9a6176723b added _mm_cvttps_epi32 (#89) 2017-10-04 11:16:53 +02:00
Dan Robertson
c1da3bad76 [Docs] Improve documentation (#87)
- Add "How to write an example" section to CONTRIBUTING.md
- Add a basic example using `target_feature` to the main page
- Improve documentation of SSE 4.2
  - Improve documentation of constants
  - Improve documentation of _mm_cmpistri
2017-10-04 11:15:39 +02:00
gwenn
3202558c98 avx2: _mm256_alignr_epi8 2017-09-30 11:27:15 -04:00
gwenn
be7f29da03 Fix rustdoc 2017-09-30 11:27:15 -04:00
gwenn
d1dff51d90 ssse3: _mm_alignr_epi8 2017-09-30 11:27:15 -04:00
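`_mm_alignr_epi8(a, b, n)` concatenates `a` (high) and `b` (low) into a 32-byte value, shifts it right by `n` bytes, and keeps the low 16 bytes; the AVX2 `_mm256_alignr_epi8` applies the same operation to each 128-bit lane separately. A scalar sketch with byte arrays (`alignr_epi8` is a hypothetical name):

```rust
/// Scalar model of `_mm_alignr_epi8(a, b, n)` on 16-byte arrays.
fn alignr_epi8(a: [u8; 16], b: [u8; 16], n: usize) -> [u8; 16] {
    let mut concat = [0u8; 32];
    concat[..16].copy_from_slice(&b); // low 16 bytes come from `b`
    concat[16..].copy_from_slice(&a); // high 16 bytes come from `a`
    let mut out = [0u8; 16];
    for i in 0..16 {
        // Bytes shifted in from beyond the 32-byte concatenation are zero.
        out[i] = if i + n < 32 { concat[i + n] } else { 0 };
    }
    out
}
```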
Dustin Bensing
fa2e02af28 added _mm_cvtsd_si32, _mm_cvtsd_ss, _mm_cvtss_sd, _mm_cvttpd_epi32, _mm_cvttsd_si32 2017-09-30 11:27:05 -04:00
Dan Robertson
7a75303aec [x86] Implement sse4.2 crc32 functions
- Implement
   - _mm_crc32_u8
   - _mm_crc32_u16
   - _mm_crc32_u32
   - _mm_crc32_u64
   - _mm_cmpgt_epi64
2017-09-30 09:53:34 -04:00
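The SSE4.2 `crc32` instruction computes CRC-32C (Castagnoli) and, unlike most CRC library functions, applies no initial or final inversion. A bitwise scalar sketch of one `_mm_crc32_u8` step plus a full-message wrapper, assuming the conventional `!0` seed and final inversion (function names are hypothetical):

```rust
/// Scalar model of `_mm_crc32_u8(crc, v)`: one byte of CRC-32C using the
/// bit-reflected polynomial 0x82F63B78, with no pre- or post-inversion.
fn crc32c_u8(crc: u32, v: u8) -> u32 {
    let mut crc = crc ^ u32::from(v);
    for _ in 0..8 {
        crc = if crc & 1 != 0 { (crc >> 1) ^ 0x82F6_3B78 } else { crc >> 1 };
    }
    crc
}

/// Standard CRC-32C of a message: seed with all ones, invert at the end.
fn crc32c(data: &[u8]) -> u32 {
    !data.iter().fold(!0u32, |crc, &b| crc32c_u8(crc, b))
}
```

The usual check value is `crc32c(b"123456789") == 0xE306_9283`.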
gwenn
b6a3bc42b3 Remove some failing assert_instr 2017-09-30 09:13:18 -04:00
gwenn
f0f5108a98 sse3: _mm_loaddup_pd and sse2: _mm_load1_pd 2017-09-30 09:13:18 -04:00
gwenn
4cbb838e2e sse3: _mm_moveldup_ps 2017-09-30 09:13:18 -04:00
gwenn
261534cb0f sse3: _mm_movehdup_ps 2017-09-30 09:13:18 -04:00
gwenn
8e07404403 sse3: _mm_movedup_pd 2017-09-30 09:13:18 -04:00
gwenn
e4ffcb6fdd sse3: _mm_hsub_ps 2017-09-30 09:13:18 -04:00
gwenn
d81d0a4a67 sse3: _mm_hsub_pd 2017-09-30 09:13:18 -04:00
gwenn
7f84607f16 sse3: _mm_hadd_ps 2017-09-30 09:13:18 -04:00
gwenn
fbd3416f0c sse3: _mm_hadd_pd 2017-09-30 09:13:18 -04:00
gwenn
dc684dc221 sse3: _mm_addsub_pd 2017-09-30 09:13:18 -04:00
gwenn
fff98467f3 sse3: _mm_addsub_ps 2017-09-30 09:13:18 -04:00
gwenn
b5a28bad22 sse3: _mm_lddqu_si128 2017-09-30 09:13:18 -04:00
Andrew Gallant
dfc7bef6cc add note about release mode in tests 2017-09-30 08:44:28 -04:00
Dan Robertson
5adea8cc03 Implement the sse4.2 string comparison intrinsics (#70)
* Docs: Fix typo in module documentation

s/paltform/platform/g

* [x86] Implement sse4.2 string cmp intrinsics

 - Implement
   - _mm_cmpistrm
   - _mm_cmpistri
   - _mm_cmpistrz
   - _mm_cmpistrc
   - _mm_cmpistrs
   - _mm_cmpistro
   - _mm_cmpistra
   - _mm_cmpestrm
   - _mm_cmpestrz
   - _mm_cmpestrc
   - _mm_cmpestrs
   - _mm_cmpestro
   - _mm_cmpestra
 - Add documentation to _mm_cmpestri
 - Add missing constants
2017-09-30 07:35:37 +00:00
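With implicit lengths, the `_mm_cmpistr*` functions treat each operand as a string ending at its first NUL byte. A scalar sketch of `_mm_cmpistri` in the "equal any" mode on unsigned bytes with default polarity and least-significant index selection: it returns the first position in `b` holding any byte of `a`, or 16 if there is none (`cmpistri_equal_any` is a hypothetical name):

```rust
/// Scalar model of `_mm_cmpistri(a, b, _SIDD_UBYTE_OPS | _SIDD_CMP_EQUAL_ANY)`.
fn cmpistri_equal_any(a: &[u8; 16], b: &[u8; 16]) -> u32 {
    // Implicit length: each operand ends at its first NUL byte (max 16).
    let len = |s: &[u8; 16]| s.iter().position(|&c| c == 0).unwrap_or(16);
    let set = &a[..len(a)];
    b[..len(b)]
        .iter()
        .position(|c| set.contains(c))
        .map_or(16, |i| i as u32)
}
```

For example, searching `"rhythm and"` for any vowel from `"aeiou"` yields 7.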
Vincent Barrielle
44d1343cb0 avx: add _mm256_div_pd, _mm256_div_ps 2017-09-29 11:53:02 -04:00
André Oliveira
d23da170d5 Match clang's code unsigned implementation for consistency 2017-09-29 11:42:27 -04:00
André Oliveira
6a081164bb Reorder imports 2017-09-29 11:42:27 -04:00
André Oliveira
790087c0fb Fix 'assert_*' tests by using the single precision instruction 2017-09-29 11:42:27 -04:00
André Oliveira
f2cbe79265 Remove define_from! hack and use mem::transmute directly 2017-09-29 11:42:27 -04:00
André Oliveira
9ad5c4e88a avx: add vandpd, vandps, vorps and vorpd
- HACK warning: add `From` impls for u64x4 <-> f64x4 and f32x8 <-> u32x8
- The `assert_*` tests for the `*pd` instructions are failing because LLVM always uses the single-precision (`*ps`) variant
2017-09-29 11:42:27 -04:00
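`vandpd` is a plain bitwise AND on the raw lane bits, which is why the implementation reinterprets `f64x4` as `u64x4`. Because the bit pattern of the result does not depend on how the lanes are typed, LLVM is free to emit the single-precision encoding instead. A scalar sketch using `to_bits`/`from_bits` in place of `mem::transmute` (`and_pd` is a hypothetical name), shown clearing sign bits to compute absolute values:

```rust
/// Bitwise AND of four f64 lanes via their IEEE 754 bit patterns.
fn and_pd(a: [f64; 4], b: [f64; 4]) -> [f64; 4] {
    let mut out = [0.0; 4];
    for i in 0..4 {
        out[i] = f64::from_bits(a[i].to_bits() & b[i].to_bits());
    }
    out
}
```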
Dustin Bensing
e6f343d989 added support for _mm_cvtpd_epi32 / cvtpd2dq 2017-09-28 19:44:32 -04:00
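The `cvt*` and `cvtt*` conversions differ only in rounding: `cvtpd2dq` rounds according to MXCSR (round-to-nearest-even by default), while `cvttpd2dq` truncates toward zero. Both return `i32::MIN` (the "integer indefinite" value) for NaN or out-of-range inputs. A per-lane scalar sketch, assuming the default rounding mode (function names are hypothetical):

```rust
/// Accept `r` only if it lies within i32 range; otherwise return the
/// integer indefinite value. NaN fails both comparisons.
fn to_i32_or_indefinite(r: f64) -> i32 {
    if r >= -2147483648.0 && r <= 2147483647.0 { r as i32 } else { i32::MIN }
}

/// Models one lane of `_mm_cvttpd_epi32`: truncate toward zero.
fn cvtt_f64_i32(x: f64) -> i32 {
    to_i32_or_indefinite(x.trunc())
}

/// Models one lane of `_mm_cvtpd_epi32` with round-to-nearest-even.
fn cvt_f64_i32(x: f64) -> i32 {
    let floor = x.floor();
    let diff = x - floor; // exact fractional part in [0, 1)
    // Round up past the midpoint, or at the midpoint when floor is odd.
    let rounded = if diff > 0.5 || (diff == 0.5 && floor % 2.0 != 0.0) {
        floor + 1.0
    } else {
        floor
    };
    to_i32_or_indefinite(rounded)
}
```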
gwenn
d8881bcbc9 ssse3 (#68)
* SSSE3: _mm_abs_epi16, _mm_abs_epi32, _mm_hadd_epi16

* SSSE3: _mm_hadds_epi16

* SSSE3: assert_instr

* SSSE3: _mm_hadd_epi32

* SSSE3: _mm_hsub_epi16

* SSSE3: _mm_hsubs_epi16

* SSSE3: _mm_hsub_epi32

* SSSE3: _mm_maddubs_epi16

* SSSE3: _mm_mulhrs_epi16

* SSSE3: _mm_sign_epi8

* SSSE3: _mm_sign_epi32

* SSSE3: _mm_sign_epi32

* SSSE3: Fix assert_instr
2017-09-28 14:10:40 -05:00
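Two of the less obvious SSSE3 operations, modeled for a single 16-bit lane: `_mm_sign_epi16` negates, zeroes, or keeps each lane of `a` based on the sign of `b` (the epi8 and epi32 variants follow the same rule), and `_mm_mulhrs_epi16` is a rounded Q15 fixed-point multiply. A scalar sketch with hypothetical function names:

```rust
/// One lane of `_mm_sign_epi16(a, b)`. Negation wraps, so `i16::MIN`
/// stays `i16::MIN`, matching the hardware.
fn sign_i16(a: i16, b: i16) -> i16 {
    match b {
        0 => 0,
        b if b < 0 => a.wrapping_neg(),
        _ => a,
    }
}

/// One lane of `_mm_mulhrs_epi16(a, b)`: multiply, shift right by 14,
/// add 1, then shift right by 1 more to round.
fn mulhrs_i16(a: i16, b: i16) -> i16 {
    let product = i32::from(a) * i32::from(b);
    (((product >> 14) + 1) >> 1) as i16
}
```

In Q15 terms, `mulhrs_i16(0x4000, 0x4000)` multiplies 0.5 by 0.5 and returns `0x2000` (0.25).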
krampenschiesser
0511ecbaf0 added support for _mm_cvtpd_ps / cvtpd2ps 2017-09-28 12:33:05 -05:00
p32blo
3dba6f3b4d avx: add vmaxpd, vmaxps, vminpd, vminps 2017-09-28 11:03:25 -05:00
Dan Robertson
fc65913f2f [x86] Add _mm_cvtps_epi32 (cvtps2dq) function
_mm_cvtepi32_ps has been implemented, but _mm_cvtps_epi32 is missing.
Use the implementation of _mm_cvtepi32_ps as a guide for implementing
_mm_cvtps_epi32.
2017-09-28 08:41:11 -04:00
gnzlbg
7e0655e92f [arm] fix unused unsafe warning 2017-09-28 07:07:34 -04:00
gnzlbg
ffc69c752e [arm] fix aarch64 cls intrinsic 2017-09-28 06:59:53 -04:00
Alex Crichton
e0176b278f Mark arm intrinsics as unsafe 2017-09-27 21:41:51 -07:00
Alex Crichton
7063458f30 Touch up some recently added intrinsics
* Mark them as `unsafe`
* Mark the tests as `unsafe`
* Leverage the new features of the `#[assert_instr]` macro
2017-09-27 19:44:14 -07:00
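Marking the intrinsics `unsafe` reflects that running them on a CPU without the required feature is undefined behavior. A sketch of the usual calling pattern in present-day Rust, where `is_x86_feature_detected!` was added to std after these commits and `sum_avx2`/`sum` are hypothetical names: a `#[target_feature]` function is called only after a runtime check, with a portable fallback.

```rust
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn sum_avx2(data: &[i32]) -> i32 {
    // Compiled with AVX2 enabled, so the optimizer may vectorize this loop.
    data.iter().sum()
}

/// Safe entry point: only calls the AVX2 version after a runtime check.
fn sum(data: &[i32]) -> i32 {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: the CPU was just confirmed to support AVX2.
            return unsafe { sum_avx2(data) };
        }
    }
    data.iter().sum()
}
```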