2551 Commits

Author SHA1 Message Date
gnzlbg
bd629147a1 Upgrade to cupid 0.0.5 and cleanup duplicated code in x86 run-time (#203)
* [ci] upgrade to cupid 0.0.5

* [runtime x86] cleanup duplicated code
2017-11-21 08:46:36 -06:00
Jonathan Goodman
f236ef8f6b Fix comments for _mm_cvtepu8_epi{32, 64} (#200) 2017-11-20 16:55:48 -06:00
Alex Crichton
738312d17c Unconditionally flag as #![no_std] (#196)
This is more idiomatic for no-std-compatible crates where imports are
unconditionally rewritten to `core` and then only when necessary `std` is pulled
in explicitly.
2017-11-19 19:53:02 +01:00
gnzlbg
0d11a78a0e refactor the x86 module (#195)
* refactor the x86 module

* document the i686 check

* document strict and intel_sde feature

* document nvptx module
2017-11-19 19:51:53 +01:00
gnzlbg
ff1b88d721 [clippy] fix missing doc on pub item 2017-11-17 17:41:23 +01:00
gnzlbg
ceef91aaba [arm] runtime-detection support 2017-11-17 17:41:23 +01:00
gnzlbg
fe7da57403 [ci] add intel_sde feature 2017-11-17 17:41:23 +01:00
gnzlbg
9e7242ecad [cpuid] Improve docs, implement __get_cpuid_max
Closes #174.
2017-11-17 17:41:23 +01:00
gnzlbg
00cf3c05eb [x86] cleanup run-time; add SSE4a, AVX-512, and xsave 2017-11-17 17:41:23 +01:00
gnzlbg
9ac630245d [x86] implement cpuid intrinsics 2017-11-17 17:41:23 +01:00
gnzlbg
2fc4c25972 [x86] implement __read/write eflags 2017-11-17 17:41:23 +01:00
gnzlbg
05dd98c643 [x86] implement xsave intrinsics 2017-11-17 17:41:23 +01:00
gnzlbg
7c593d9857 [stdsimd-test] testing conditional on more than one feature 2017-11-17 17:41:23 +01:00
gnzlbg
2136214934 add support for no_std 2017-11-17 16:52:05 +01:00
gnzlbg
fda2ead377 add nvptx architecture 2017-11-17 16:52:05 +01:00
André Oliveira
33e26c0b4a Formatting 2017-11-17 16:42:35 +01:00
André Oliveira
093029a6c3 Make test intrinsics use __m128i 2017-11-17 16:42:35 +01:00
André Oliveira
9d8c2639c1 Change _mm_mpsadbw_epu8 to work with unsigned integers 2017-11-17 16:42:35 +01:00
André Oliveira
7b0e7c6f52 Add _mm_mpsadbw_epu8 2017-11-17 16:42:35 +01:00
André Oliveira
b85d4f799c Add _mm_minpos_epu16 2017-11-17 16:42:35 +01:00
André Oliveira
b67e9dfe5d Add _mm_test_all_zeros, _mm_test_all_ones and _mm_test_mix_ones_zeros 2017-11-17 16:42:35 +01:00
André Oliveira
93c76381b7 Add documentation for testz, testc and testnzc 2017-11-17 16:42:35 +01:00
André Oliveira
4ce80f138b Add _mm_testz_si128, _mm_testc_si128 and _mm_testnzc_si128
This should work for any 128 bit sized vector, but it only accepts i64x2 for now
2017-11-17 16:42:35 +01:00
André Oliveira
38f6087b9a Add _mm_mul_epi32 and _mm_mullo_epi32 2017-11-17 16:42:35 +01:00
André Oliveira
613cacb317 Add remaining _mm_cvtep* intrinsics 2017-11-17 16:42:35 +01:00
André Oliveira
ac11d6941d Add _mm_cvtepu8_epi{16, 32, 64} 2017-11-17 16:42:35 +01:00
André Oliveira
48027e994b Add _mm_cvtepi32_epi64 and fix typo 2017-11-17 16:42:35 +01:00
Tony Sifkarovski
60c2608cce [avx2] add _mm_256_cvtepu{8,16,32}_epi{16,32,64} (#192) 2017-11-17 09:22:18 +01:00
crypto-universe
1842e36d00 [x86][sse4.1] Add phminposuw & pmul* instructions
pmulld is implemented via multiplication.
2017-11-16 07:12:14 -05:00
gnzlbg
955fd849ff implement missing std::ops 2017-11-13 06:42:49 -05:00
gnzlbg
6ed424a848 syn API breaking change (#189) 2017-11-11 23:35:00 +01:00
crypto-universe
bdaea04f2b [x86][sse4.1] Add pmin* instructions (#186) 2017-11-08 23:05:27 -06:00
Caio
545a2a8e2a Add _mm_unpackhi_pd and _mm_unpacklo_pd (#184)
* Add _mm_unpackhi_pd and _mm_unpacklo_pd
2017-11-08 11:22:21 +01:00
gnzlbg
20324666f5 [ci] fix formatting and clippy (#182) 2017-11-07 09:00:55 -06:00
Malo Jaffré
664395e25e Fix a confusing typo in a cast name. (#179) 2017-11-06 12:45:31 -06:00
André Oliveira
a05fb1b292 Add the necessary SIMD types for sign extend intrinsics 2017-11-06 07:17:27 -05:00
André Oliveira
bab1c7b16a Avoid using simd_cast directly 2017-11-06 07:17:27 -05:00
André Oliveira
866596cd53 Add _mm_cvtepi16_epi32 and _mm_cvtepi16_epi64 (commented) 2017-11-06 07:17:27 -05:00
André Oliveira
fa240f2477 Add commented implementation of _mm_cvtepi8_epi64 2017-11-06 07:17:27 -05:00
André Oliveira
37396f3471 Add _mm_cvtepi8_epi32
- This might be wrong since the cast and the shuffle needed to be inverted
2017-11-06 07:17:27 -05:00
André Oliveira
f9caf376b2 Add _mm_cvtepi8_epi16 2017-11-06 07:17:27 -05:00
André Oliveira
d6c990967b Add _mm_packus_epi32 and _mm_cmpeq_epi64 intrinsics 2017-11-06 07:17:27 -05:00
Adam Niederer
a6d9d0c100 Fix mm256_round_epi* return types (#173)
From the Intel intrinsics manual (emphasis mine):

> Compute the absolute value of packed 16-bit integers in a, and store the
> *unsigned* results in dst.
2017-11-05 20:56:07 -06:00
gwenn
6d4ea09a21 Avx (#172)
* avx: _mm256_load_pd, _mm256_store_pd, _mm256_load_ps, _mm256_store_ps

* avx: _mm256_load_si256, _mm256_store_si256
2017-11-05 20:55:32 -06:00
Malo Jaffré
74870635e5 Add SSE2 trivial aliases and conversions. (#165)
`_mm_cvtsd_f64`, `_mm_cvtsd_si64x` and `_mm_cvttsd_si64x`.
See #40.
2017-11-02 14:10:50 -04:00
gnzlbg
542aac988a [ci] enable clippy (#62)
* [ci] enable clippy

* [clippy] fix clippy issues
2017-11-02 13:43:12 -04:00
gwenn
96111d548e Avx (#163)
* avx: _mm256_testnzc_si256

* avx: _mm256_shuffle_ps

8 levels of macro expansion takes too long to compile.

* avx: remove useless 0 in tests

* avx: _mm256_shuffle_ps

Macro expansion can be reduced to four levels

* avx: _mm256_blend_ps

Copy/paste from avx2::_mm256_blend_epi32
2017-11-01 08:47:40 -05:00
Alex Crichton
5cb3986530 Bump to 0.0.3 2017-10-30 15:53:07 -07:00
gnzlbg
d6aefaabea [aarch64] refactor AArch64 intrinsics into its own architecture module (#162) 2017-10-29 11:37:43 -05:00
gnzlbg
7f35e50563 [runtime-detection-x86] detect avx and avx2 only if osxsave is true (#154) 2017-10-28 16:34:09 -04:00
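
Note on 7f35e50563: the commit title only says that avx and avx2 are detected when osxsave is true. The sketch below is a rough illustration of that kind of gating, not the code from the commit. The struct `CpuidBits` and the functions `os_saves_ymm`, `has_avx` and `has_avx2` are hypothetical names introduced here. The bit positions follow the Intel manual's CPUID layout (leaf 1 ECX bit 27 for OSXSAVE, bit 28 for AVX; leaf 7 EBX bit 5 for AVX2; XCR0 bits 1 and 2 for SSE and AVX state). The extra XCR0 check is an assumption about what a complete check looks like and may go beyond what the commit itself does.

```rust
/// Hypothetical container for raw register values that a caller has already
/// read with `cpuid` and `xgetbv`. Pass `xcr0: 0` when OSXSAVE is clear,
/// because `xgetbv` must not be executed in that case.
#[derive(Clone, Copy)]
struct CpuidBits {
    leaf1_ecx: u32, // CPUID.(EAX=1):ECX
    leaf7_ebx: u32, // CPUID.(EAX=7,ECX=0):EBX
    xcr0: u64,      // value of XCR0, as returned by xgetbv(0)
}

const OSXSAVE_BIT: u32 = 27; // OS has enabled XSAVE/XGETBV
const AVX_BIT: u32 = 28; // CPU implements AVX
const AVX2_BIT: u32 = 5; // CPU implements AVX2
const XCR0_SSE_AVX_MASK: u64 = 0b110; // OS saves XMM (bit 1) and YMM (bit 2) state

/// Returns true if bit `n` of `value` is set.
fn bit(value: u32, n: u32) -> bool {
    (value >> n) & 1 == 1
}

/// The OS must advertise OSXSAVE and enable both SSE and AVX state in XCR0
/// before YMM registers can be used safely.
fn os_saves_ymm(bits: CpuidBits) -> bool {
    bit(bits.leaf1_ecx, OSXSAVE_BIT)
        && (bits.xcr0 & XCR0_SSE_AVX_MASK) == XCR0_SSE_AVX_MASK
}

/// AVX is reported only when both the OS and the CPU support it.
fn has_avx(bits: CpuidBits) -> bool {
    os_saves_ymm(bits) && bit(bits.leaf1_ecx, AVX_BIT)
}

/// AVX2 additionally requires its own CPUID feature bit.
fn has_avx2(bits: CpuidBits) -> bool {
    has_avx(bits) && bit(bits.leaf7_ebx, AVX2_BIT)
}
```

With this shape, a CPU that sets the AVX bit while the OS leaves OSXSAVE clear is reported as having neither avx nor avx2, which matches the intent stated in the commit title.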