Alex Crichton
9a440a3eb0
Fix i586 tests
2017-10-11 17:33:41 -07:00
Malo Jaffré
5a028d329e
Add _MM_TRANSPOSE4_PS pseudo-macro. ( #106 )
...
This adds a strange macro, which I've replaced with a function, because it
seems there are not many better alternatives.
Also adds a test, and `#[allow(non_snake_case)]` to `#[simd_test]`.
2017-10-11 11:28:44 -04:00
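For illustration, the idea of replacing the C macro with a function can be sketched in scalar Rust. This is only a model on plain arrays, not the crate's code: `transpose4` and its array-based signature are hypothetical, and the real function operates on SIMD vector types.

```rust
/// Scalar model of a 4x4 `f32` transpose that takes its rows by mutable
/// reference, the way a function can stand in for a C macro that rewrites
/// its arguments. `transpose4` is a hypothetical name for this sketch.
pub fn transpose4(
    row0: &mut [f32; 4],
    row1: &mut [f32; 4],
    row2: &mut [f32; 4],
    row3: &mut [f32; 4],
) {
    // Copy the inputs first so every output lane reads the original matrix.
    let m = [*row0, *row1, *row2, *row3];
    for lane in 0..4 {
        row0[lane] = m[lane][0];
        row1[lane] = m[lane][1];
        row2[lane] = m[lane][2];
        row3[lane] = m[lane][3];
    }
}
```

Taking `&mut` arguments keeps the call site close to the C usage, where the macro overwrites the four rows it is given.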
Dan Robertson
5a6005aa29
[x86] sse4.2 add docs for _SIDD_EQUAL_RANGES ( #107 )
...
- Add docs for the _SIDD_EQUAL_RANGES mode
2017-10-11 11:28:17 -04:00
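As a rough illustration of what the equal-ranges mode computes, here is a scalar sketch. It assumes unsigned bytes, explicit slice lengths, positive polarity and a least-significant index result; `first_in_ranges` is a hypothetical helper, not part of the crate.

```rust
/// Scalar model of a range search: `ranges` holds (low, high) byte pairs and
/// the result is the index of the first `haystack` byte that falls inside any
/// pair, or 16 when there is no match. Only the first 16 bytes of each input
/// are considered, mirroring a 128-bit register of bytes.
/// `first_in_ranges` is a hypothetical name used only in this sketch.
pub fn first_in_ranges(ranges: &[u8], haystack: &[u8]) -> usize {
    let ranges = &ranges[..ranges.len().min(16)];
    haystack
        .iter()
        .take(16)
        .position(|&c| ranges.chunks_exact(2).any(|r| r[0] <= c && c <= r[1]))
        .unwrap_or(16)
}
```

For example, with the ranges `a-z` and `A-Z` the sketch reports the position of the first ASCII letter in the haystack.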
Alex Crichton
9da400965f
Attempt to fix CI ( #108 )
...
Need to bring codegen units back to only one for now
2017-10-11 11:28:02 -04:00
gwenn
7c88f7c49b
Avx ( #105 )
...
* avx: _mm_permute_ps and sse: _mm_undefined_ps
* avx: _mm256_permutevar_pdi, _mm_permutevar_pd
* avx: _mm256_permute_pd
* avx: _mm256_shuffle_pd fixed
* avx: _mm_permute_pd, sse2: _mm_undefined_pd
* avx: _mm256_permute2f128_ps
* avx: _mm256_permute2f128_pd
* avx: _mm256_permute2f128_si256
* avx: _mm256_broadcast_ss
* avx: _mm_broadcast_ss
* avx: _mm256_broadcast_sd
* avx: _mm256_broadcast_ps
* avx: _mm256_broadcast_pd
* avx: _mm_cmp_pd
* avx: _mm256_cmp_pd
* avx: _mm_cmp_ps
* avx: _mm256_cmp_ps
* avx: _mm_cmp_sd
* avx: _mm_cmp_ss
* avx: _mm256_insertf128_pd, _mm256_castpd128_pd256
* avx: _mm256_insertf128_si256, _mm256_castsi128_si256
* avx: _mm256_insertf128_ps, _mm256_castps128_ps256
* avx: _mm256_insert_epi8
* avx: _mm256_insert_epi16
* avx: _mm256_insert_epi32
* avx: _mm256_insert_epi64
* Try to fix i586 build
* Fix missing inline and target_feature
* sse: fix _mm_undefined_ps
2017-10-09 16:05:36 -05:00
Thomas Schilling
807ec089b7
Implement SSE _mm_load* instructions ( #99 )
...
* Add _mm_loadh_pi
* Add doctest for _mm_loadh_pi
* Add _mm_loadl_pi
* Add _mm_load_ss
* Add _mm_load1_ps and _mm_load_ps1
* Add _mm_load_ps and _mm_loadu_ps
* Add _mm_loadr_ps
* Replace _mm_loadu_ps TODO with explanation
* Tweak expected instructions for _mm_loadl/h_pi on x86
* Try fixing i586 test crash
* Targets i586/i686 generate different code for _mm_loadh_pi
2017-10-07 21:12:47 -05:00
Thomas Schilling
a547f2bf36
Implement SSE _mm_set* intrinsics ( #100 )
...
* Add _mm_set_ss
* Add _mm_set1_ps and _mm_set_ps1
* Add _mm_set_ps
* Add _mm_setr_ps
* Add _mm_setzero_ps
* Fix _mm_setr_ps instr test on x86
* Sidestep black_box ABI issue on i586
2017-10-07 15:04:55 +00:00
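The argument order of the set family is easy to get wrong, so a scalar sketch may help. Arrays are indexed from the lowest lane, and the `*_model` names are hypothetical stand-ins for the intrinsics, not the crate's functions.

```rust
/// Model of `_mm_set_ps`: the LAST argument lands in the lowest lane.
pub fn set_ps_model(e3: f32, e2: f32, e1: f32, e0: f32) -> [f32; 4] {
    [e0, e1, e2, e3]
}

/// Model of `_mm_setr_ps`: arguments are given in lane ("reversed") order.
pub fn setr_ps_model(e0: f32, e1: f32, e2: f32, e3: f32) -> [f32; 4] {
    [e0, e1, e2, e3]
}

/// Model of `_mm_set1_ps` / `_mm_set_ps1`: broadcast one value to all lanes.
pub fn set1_ps_model(a: f32) -> [f32; 4] {
    [a; 4]
}

/// Model of `_mm_set_ss`: value in the lowest lane, zeros elsewhere.
pub fn set_ss_model(a: f32) -> [f32; 4] {
    [a, 0.0, 0.0, 0.0]
}
```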
Alex Crichton
7055f496c7
Add an i586 builder ( #101 )
...
The i586 targets on x86 are defined to be 32-bit and lacking in sse/sse2 unlike
the i686 target which has sse2 turned on by default. I was mostly curious what
would happen when turning on this target, and it turns out quite a few tests
failed!
Most of the tests here had to do with calling functions with ABI mismatches
where the callee wasn't `#[inline(always)]`. Various pieces have been updated
now and we should be passing all tests.
Only one instruction assertion ended up changing: the function in question
generates a different instruction depending on whether sse2 is ambiently enabled.
2017-10-06 22:54:18 +00:00
Alex Crichton
40eeae6adf
Enable multiple #[assert_instr] attributes ( #96 )
...
* Enable multiple #[assert_instr] attributes
Looks like all we needed to do was generate new function names!
* Uncomment assertions for `_mm_prefetch`
2017-10-06 21:19:14 +00:00
gwenn
ee0f165e8e
Avx ( #90 )
...
* avx: _mm256_andnot_pd, _mm256_andnot_ps
* avx: _mm256_blendv_pd
* avx: _mm256_blend_pd with no assert_instr
With assert_instr: too many instructions in the disassembly
* avx: _mm256_blendv_ps
* avx: _mm256_hadd_pd
* avx: _mm256_hadd_ps
* avx: _mm256_hsub_pd
* avx: _mm256_hsub_ps
* avx: _mm256_xor_pd
* avx: _mm256_xor_ps
* avx: _mm256_cvtepi32_pd
* avx: _mm256_cvtepi32_ps
* avx: _mm256_cvtpd_ps
* avx: _mm256_cvtps_epi32
* avx: _mm256_cvtps_pd
* avx: _mm256_cvttpd_epi32
* avx: _mm256_cvtpd_epi32
* avx: replace simd_cast by proper instruction
* avx: _mm256_cvttps_epi32
* avx: _mm256_extractf128_ps, _mm256_undefined_ps
* avx: _mm256_extractf128_pd, _mm256_undefined_pd
* avx: _mm256_extractf128_si256, _mm256_undefined_si256
* avx: _mm256_extract_epi8
* avx: _mm256_extract_epi16
* avx: _mm256_extract_epi32
* avx: _mm256_extract_epi64
* avx: _mm256_zeroall
* avx: _mm256_zeroupper
* avx: _mm256_permutevar_ps
* avx: _mm_permutevar_ps
* avx: replace simd_cast by as_*
* avx: _mm256_permute_ps
* avx: _mm256_dp_ps
* avx: _mm256_shuffle_pd
* avx: _mm256_shuffle_pd, wrong instruction generated
* implement _mm256_hadd_ps and _mm256_hadd_pd
* avx: implement _mm256_hsub_pd and _mm256_hsub_ps
* assert_instr: raise the limit up to 30 instructions
2017-10-05 13:42:29 -05:00
Dan Robertson
b421e9210c
[Docs] Add more docs to the sse4.2 cmpstr fns ( #94 )
...
- Add more examples to _mm_cmpistri
- Add basic docs to _mm_cmpestri
- Cleanup lib docs
2017-10-05 18:26:40 +02:00
Thomas Schilling
186b8fe093
Implement _mm_getcsr, _mm_setcsr, _mm_sfence ( #88 )
...
* Add _mm_sfence
* Add _mm_getcsr/_mm_setcsr and convenience wrappers
* Use test::black_box to simplify tests
* Use uppercase naming for C-macro equivalents
Discussed at https://github.com/rust-lang-nursery/stdsimd/issues/84
2017-10-05 18:17:43 +02:00
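The convenience wrappers mentioned above are essentially bit manipulation on the MXCSR value. The sketch below models the rounding-mode field on a plain `u32` without touching the real register. The constant values follow the Intel definitions, while the shortened names and both functions are hypothetical.

```rust
/// Rounding-control field of MXCSR (bits 13 and 14).
pub const ROUND_MASK: u32 = 0x6000;
pub const ROUND_NEAREST: u32 = 0x0000;
pub const ROUND_DOWN: u32 = 0x2000;
pub const ROUND_UP: u32 = 0x4000;
pub const ROUND_TOWARD_ZERO: u32 = 0x6000;

/// Extract the rounding mode from an MXCSR value (hypothetical helper).
pub fn get_rounding_mode(csr: u32) -> u32 {
    csr & ROUND_MASK
}

/// Return `csr` with its rounding mode replaced by `mode`, leaving every
/// other bit untouched (hypothetical helper).
pub fn with_rounding_mode(csr: u32, mode: u32) -> u32 {
    (csr & !ROUND_MASK) | (mode & ROUND_MASK)
}
```

A real wrapper would read the register, apply this kind of update and write the result back.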
Thomas Schilling
c845a1baaf
Implement _mm_prefetch ( #78 )
...
This boils down to using LLVM's `prefetch` intrinsic [1].
[1]: https://llvm.org/docs/LangRef.html#llvm-prefetch-intrinsic
2017-10-05 18:08:58 +02:00
Adam Niederer
9695f2cfaf
Improve _mm256_round_* docs ( #93 )
...
Fix a grammatical error, use a list instead of using a code block or nothing,
and add the LLVM immediate reference.
2017-10-05 00:25:18 +02:00
pythoneer
9a6176723b
added _mm_cvttps_epi32 ( #89 )
2017-10-04 11:16:53 +02:00
Dan Robertson
c1da3bad76
[Docs] Improve documentation ( #87 )
...
- Add "How to write an example" section to CONTRIBUTING.md
- Add a basic example using `target_feature` to the main page
- Improve documentation of SSE 4.2
- Improve documentation of constants
- Improve documentation of _mm_cmpistri
2017-10-04 11:15:39 +02:00
gwenn
3202558c98
avx2: _mm256_alignr_epi8
2017-09-30 11:27:15 -04:00
gwenn
be7f29da03
Fix rustdoc
2017-09-30 11:27:15 -04:00
gwenn
d1dff51d90
ssse3: _mm_alignr_epi8
2017-09-30 11:27:15 -04:00
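For readers unfamiliar with the 128-bit byte-align operation, a scalar sketch may help: the two inputs are concatenated with `a` in the high half, shifted right by `n` bytes, and the low 16 bytes are kept. `alignr_bytes` is a hypothetical name and this is a model only, not the crate's implementation.

```rust
/// Scalar model of a 128-bit byte align: concatenate `a` (high) and `b`
/// (low), shift right by `n` bytes, keep the low 16 bytes.
/// `alignr_bytes` is a hypothetical name used only in this sketch.
pub fn alignr_bytes(a: [u8; 16], b: [u8; 16], n: usize) -> [u8; 16] {
    let mut concat = [0u8; 32];
    concat[..16].copy_from_slice(&b);
    concat[16..].copy_from_slice(&a);

    let mut out = [0u8; 16];
    for (i, lane) in out.iter_mut().enumerate() {
        // Positions past the 32-byte value read as zero.
        *lane = concat.get(i.saturating_add(n)).copied().unwrap_or(0);
    }
    out
}
```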
Dustin Bensing
fa2e02af28
added _mm_cvtsd_si32, _mm_cvtsd_ss, _mm_cvtss_sd, _mm_cvttpd_epi32, _mm_cvttsd_si32
2017-09-30 11:27:05 -04:00
Dan Robertson
7a75303aec
[x86] Implement sse4.2 crc32 functions
...
- Implement
- _mm_crc32_u8
- _mm_crc32_u16
- _mm_crc32_u32
- _mm_crc32_u64
- _mm_cmpgt_epi64
2017-09-30 09:53:34 -04:00
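The SSE4.2 CRC32 instruction uses the CRC-32C (Castagnoli) polynomial. The sketch below is a bitwise scalar model of one byte-wide step plus a conventional checksum built on it. Both function names are hypothetical, and the initial and final inversion are a common convention applied by the caller, not something the instruction does.

```rust
/// Bit-reflected form of the CRC-32C (Castagnoli) polynomial 0x1EDC6F41.
const CRC32C_POLY_REFLECTED: u32 = 0x82F6_3B78;

/// Scalar model of one byte-wide CRC step: fold `byte` into `crc`.
/// `crc32c_u8` is a hypothetical name used only in this sketch.
pub fn crc32c_u8(crc: u32, byte: u8) -> u32 {
    let mut crc = crc ^ u32::from(byte);
    for _ in 0..8 {
        // Shift out one bit and apply the polynomial if that bit was set.
        crc = if crc & 1 != 0 {
            (crc >> 1) ^ CRC32C_POLY_REFLECTED
        } else {
            crc >> 1
        };
    }
    crc
}

/// Conventional CRC-32C checksum: start from all ones, fold each byte,
/// then invert the result (hypothetical helper).
pub fn crc32c(data: &[u8]) -> u32 {
    !data.iter().fold(!0u32, |crc, &b| crc32c_u8(crc, b))
}
```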
gwenn
b6a3bc42b3
Remove some failing assert_instr
2017-09-30 09:13:18 -04:00
gwenn
f0f5108a98
sse3: _mm_loaddup_pd and sse2: _mm_load1_pd
2017-09-30 09:13:18 -04:00
gwenn
4cbb838e2e
sse3: _mm_moveldup_ps
2017-09-30 09:13:18 -04:00
gwenn
261534cb0f
sse3: _mm_movehdup_ps
2017-09-30 09:13:18 -04:00
gwenn
8e07404403
sse3: _mm_movedup_pd
2017-09-30 09:13:18 -04:00
gwenn
e4ffcb6fdd
sse3: _mm_hsub_ps
2017-09-30 09:13:18 -04:00
gwenn
d81d0a4a67
sse3: _mm_hsub_pd
2017-09-30 09:13:18 -04:00
gwenn
7f84607f16
sse3: _mm_hadd_ps
2017-09-30 09:13:18 -04:00
gwenn
fbd3416f0c
sse3: _mm_hadd_pd
2017-09-30 09:13:18 -04:00
gwenn
dc684dc221
sse3: _mm_addsub_pd
2017-09-30 09:13:18 -04:00
gwenn
fff98467f3
sse3: _mm_addsub_ps
2017-09-30 09:13:18 -04:00
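The lane patterns of the SSE3 horizontal-add and add/sub operations listed above can be sketched in scalar Rust. Arrays are indexed from the lowest lane, and the `*_model` names are hypothetical, not the crate's functions.

```rust
/// Model of the single-precision horizontal add: adjacent pairs of `a`
/// fill the low two lanes, adjacent pairs of `b` fill the high two.
pub fn hadd_ps_model(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    [a[0] + a[1], a[2] + a[3], b[0] + b[1], b[2] + b[3]]
}

/// Model of the single-precision horizontal subtract (same pairing).
pub fn hsub_ps_model(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    [a[0] - a[1], a[2] - a[3], b[0] - b[1], b[2] - b[3]]
}

/// Model of add/sub: even lanes subtract, odd lanes add.
pub fn addsub_ps_model(a: [f32; 4], b: [f32; 4]) -> [f32; 4] {
    [a[0] - b[0], a[1] + b[1], a[2] - b[2], a[3] + b[3]]
}
```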
gwenn
b5a28bad22
sse3: _mm_lddqu_si128
2017-09-30 09:13:18 -04:00
Andrew Gallant
dfc7bef6cc
add note about release mode in tests
2017-09-30 08:44:28 -04:00
Dan Robertson
5adea8cc03
Implement the sse4.2 string comparison intrinsics ( #70 )
...
* Docs: Fix typo in module documentation
s/paltform/platform/g
* [x86] Implement sse4.2 string cmp intrinsics
- Implement
- _mm_cmpistrm
- _mm_cmpistri
- _mm_cmpistrz
- _mm_cmpistrc
- _mm_cmpistrs
- _mm_cmpistro
- _mm_cmpistra
- _mm_cmpestrm
- _mm_cmpestrz
- _mm_cmpestrc
- _mm_cmpestrs
- _mm_cmpestro
- _mm_cmpestra
- Add documentation to _mm_cmpestri
- Add missing constants
2017-09-30 07:35:37 +00:00
Vincent Barrielle
44d1343cb0
avx: add _mm256_div_pd, _mm256_div_ps
2017-09-29 11:53:02 -04:00
André Oliveira
d23da170d5
Match clang's code unsigned implementation for consistency
2017-09-29 11:42:27 -04:00
André Oliveira
6a081164bb
Reorder imports
2017-09-29 11:42:27 -04:00
André Oliveira
790087c0fb
Fix 'assert_*' tests by using the single precision instruction
2017-09-29 11:42:27 -04:00
André Oliveira
f2cbe79265
Remove define_from! hack and use mem::transmute directly
2017-09-29 11:42:27 -04:00
André Oliveira
9ad5c4e88a
avx: add vandpd, vandps, vorps and vorpd
...
- HACK Warning: Add from impls for u64x4 <-> f64x4 and f32x8 <-> u32x8
- The 'assert_*' tests for the '*pd' instructions are failing because LLVM always uses the single-precision ('*ps') variant
2017-09-29 11:42:27 -04:00
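The bitwise float operations in this group act on the raw bits of each lane, which is why conversions between integer and float vectors come up at all. A scalar sketch using `f64::to_bits` and `f64::from_bits` in place of vector reinterpretation is shown below. `and_pd_model` is a hypothetical name and this is not the crate's implementation.

```rust
/// Scalar model of a lane-wise bitwise AND on four `f64` values: each lane
/// is reinterpreted as `u64`, ANDed, and reinterpreted back.
/// `and_pd_model` is a hypothetical name used only in this sketch.
pub fn and_pd_model(a: [f64; 4], b: [f64; 4]) -> [f64; 4] {
    let mut out = [0.0f64; 4];
    for i in 0..4 {
        out[i] = f64::from_bits(a[i].to_bits() & b[i].to_bits());
    }
    out
}
```

A typical use is clearing the sign bit of every lane with a mask whose bits are all ones except the top bit, which yields the absolute value.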
Dustin Bensing
e6f343d989
added support for _mm_cvtpd_epi32 / cvtpd2dq
2017-09-28 19:44:32 -04:00
gwenn
d8881bcbc9
ssse3 ( #68 )
...
* SSSE3: _mm_abs_epi16, _mm_abs_epi32, _mm_hadd_epi16
* SSSE3: _mm_hadds_epi16
* SSSE3: assert_instr
* SSSE3: _mm_hadd_epi32
* SSSE3: _mm_hsub_epi16
* SSSE3: _mm_hsubs_epi16
* SSSE3: _mm_hsub_epi32
* SSSE3: _mm_maddubs_epi16
* SSSE3: _mm_mulhrs_epi16
* SSSE3: _mm_sign_epi8
* SSSE3: _mm_sign_epi32
* SSSE3: _mm_sign_epi32
* SSSE3: Fix assert_instr
2017-09-28 14:10:40 -05:00
krampenschiesser
0511ecbaf0
added support for _mm_cvtpd_ps / cvtpd2ps
2017-09-28 12:33:05 -05:00
p32blo
3dba6f3b4d
avx: add vmaxpd, vmaxps, vminpd, vminps
2017-09-28 11:03:25 -05:00
Dan Robertson
fc65913f2f
[x86] Add _mm_cvtps_epi32 (cvtps2dq) function
...
_mm_cvtepi32_ps has been implemented, but _mm_cvtps_epi32 is missing.
Use the implementation of _mm_cvtepi32_ps as a guide for implementing
_mm_cvtps_epi32.
2017-09-28 08:41:11 -04:00
gnzlbg
7e0655e92f
[arm] fix unused unsafe warning
2017-09-28 07:07:34 -04:00
gnzlbg
ffc69c752e
[arm] fix aarch64 cls intrinsic
2017-09-28 06:59:53 -04:00
Alex Crichton
e0176b278f
Mark arm intrinsics as unsafe
2017-09-27 21:41:51 -07:00
Alex Crichton
7063458f30
Touch up some recently added intrinsics
...
* Mark them as `unsafe`
* Mark the tests as `unsafe`
* Leverage the new features of the `#[assert_instr]` macro
2017-09-27 19:44:14 -07:00