Commit Graph

2485 Commits

Author SHA1 Message Date
Alex Crichton
30b1145ef7 Migrate the i586::avx2 module to vendor types (#287) 2018-01-19 10:32:16 -06:00
Alex Crichton
1ad6d5fa88 Migrate the x86_64 folder to vendor types (#284) 2018-01-19 10:30:25 -06:00
messense
8deae9ce66 Update links in Cargo.toml to rust-lang-nursery/stdsimd (#288) 2018-01-18 20:23:50 -06:00
Alex Crichton
c5afde07d2 Migrate the i586::avx module to vendor types (#286)
Closes #285
2018-01-18 11:21:03 -06:00
Alex Crichton
5c8867c7c3 Update target_feature syntax (#283)
This commit updates to the latest nightly's syntax where `#[target_feature =
"+foo"]` is now deprecated in favor of `#[target_feature(enable = "foo")]`.
Additionally `#[target_feature]` can only be applied to `unsafe` functions for
now.

Along the way this removes a few examples that were just left around and also
disables the `fxsr` modules, as that target feature will need to land in upstream
rust-lang/rust first; it's currently unknown to the compiler.
2018-01-17 09:45:02 -06:00
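A minimal sketch of the syntax change described in the commit above; it uses today's `std::arch` paths and an illustrative function name, neither of which comes from the crate itself:

```rust
// Hedged sketch: `std::arch` paths are used only for illustration.
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::{__m128, _mm_add_ps};

// Old, now-deprecated form:  #[target_feature = "+sse"]
// New form, which for now may only be applied to `unsafe` functions:
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "sse")]
unsafe fn add(a: __m128, b: __m128) -> __m128 {
    _mm_add_ps(a, b)
}
```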
Josef Ippisch
8deead27f2 Implement addition aliases (#281)
- `_m_paddb` for `_mm_add_pi8`
- `_m_paddw` for `_mm_add_pi16`
- `_m_paddd` for `_mm_add_pi32`
- `_m_paddsb` for `_mm_adds_pi8`
- `_m_paddsw` for `_mm_adds_pi16`
- `_m_paddusb` for `_mm_adds_pu8`
- `_m_paddusw` for `_mm_adds_pu16`
2018-01-13 12:08:53 -06:00
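The aliases listed above are presumably thin forwarding wrappers over the canonical intrinsics. A hypothetical, standalone sketch of that pattern, with a placeholder `M64` type and a scalar body standing in for the crate's `__m64`, `_mm_add_pi8`, and `_m_paddb`:

```rust
// Placeholder stand-in for the crate's `__m64` MMX vector type.
#[derive(Copy, Clone)]
struct M64(u64);

// Placeholder scalar implementation of an 8-lane wrapping byte add.
fn add_pi8(a: M64, b: M64) -> M64 {
    let (a, b) = (a.0.to_le_bytes(), b.0.to_le_bytes());
    let mut out = [0u8; 8];
    for i in 0..8 {
        out[i] = a[i].wrapping_add(b[i]);
    }
    M64(u64::from_le_bytes(out))
}

// The alias simply forwards to the canonical name.
#[inline(always)]
fn m_paddb(a: M64, b: M64) -> M64 {
    add_pi8(a, b)
}
```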
Josef Ippisch
50cf00372d MMX subtraction instructions (#280)
* Implement `_m_psubb`

* Implement `_m_psubw`

* Implement `_m_psubd`

* Implement `_m_psubsb`

* Implement `_m_psubsw`

* Implement `_m_psubusb`

* Implement `_m_psubusw`

* Make the subtraction intrinsic naming consistent with the addition ones

E.g. use `_mm_sub_pi8` instead of `_m_psubb`

* Implement all subtraction aliases for the `_mm_*` variants

- `_m_psubb` for `_mm_sub_pi8`
- `_m_psubw` for `_mm_sub_pi16`
- `_m_psubd` for `_mm_sub_pi32`
- `_m_psubsb` for `_mm_subs_pi8`
- `_m_psubsw` for `_mm_subs_pi16`
- `_m_psubusb` for `_mm_subs_pu8`
- `_m_psubusw` for `_mm_subs_pu16`
2018-01-12 17:10:51 -06:00
Alex Crichton
e77ebf194a Migrate the i686 module to vendor types (#279)
* Migrate `i686::sse` to vendor types

* Migrate `i686::sse2` to vendor types

* Migrate i686::sse41 to vendor types

* Migrate i686::sse42 to vendor types
2018-01-12 14:08:20 -06:00
Alex Crichton
48a7490711 Make rustc's job a little easier in sse42 (#277)
Move all the casts from `__m128i` to `i8x16` outside the macro invocations so
rustc only has to resolve a few function calls, not thousands!
2018-01-12 11:37:06 -06:00
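A hypothetical sketch of the refactor's idea: do the conversion once, outside the macro, so every expanded arm is a plain call on already-converted values. All names here are placeholders rather than the crate's real items:

```rust
// Placeholder types standing in for `__m128i` and `i8x16`.
#[derive(Copy, Clone)]
struct M128i([u8; 16]);
#[derive(Copy, Clone)]
struct I8x16([i8; 16]);

fn to_i8x16(x: M128i) -> I8x16 {
    I8x16(x.0.map(|b| b as i8))
}

// Stand-in for an SSE4.2 string-compare intrinsic taking an immediate.
fn intrinsic(a: I8x16, _b: I8x16, _imm8: i32) -> I8x16 {
    a
}

// The real macro has one arm per possible immediate (hundreds of them).
macro_rules! constify_imm8 {
    ($a:expr, $b:expr, $imm8:expr) => {
        match $imm8 {
            0 => intrinsic($a, $b, 0),
            _ => intrinsic($a, $b, 1),
        }
    };
}

fn demo(a: M128i, b: M128i, imm8: i32) -> I8x16 {
    // Hoisted: `to_i8x16` is resolved here once instead of in every arm.
    let (a, b) = (to_i8x16(a), to_i8x16(b));
    constify_imm8!(a, b, imm8)
}
```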
Alex Crichton
feb8c2b152 Migrate i586::ssse3 to vendor types (#275) 2018-01-11 23:18:35 -06:00
Alex Crichton
fde52cb334 Migrate i586::sse41 to vendor types (#276) 2018-01-11 23:18:15 -06:00
Alex Crichton
3148881fa2 Move travis workaround earlier
Try to get it used on OSX as well
2018-01-11 08:24:11 -08:00
Alex Crichton
5467c0a008 Migrate i586::sse3 to vendor types (#274) 2018-01-11 10:13:26 -06:00
Alex Crichton
6d8d2f81e9 Migrate a bunch of i586::sse2 to native types (#273) 2018-01-10 12:42:26 -06:00
Alex Crichton
baf9d0e7e0 Migrate the i686::sse module to vendor types (#269)
This migrates the entire `i686::sse` module (and touches a few others) to the
vendor types.
2018-01-09 13:38:09 -06:00
Jef
248f5441bb Make splat a const fn 2018-01-09 18:38:47 +01:00
Alex Crichton
fd2cc3bc05 Migrate _mm_add_ss to __m128 (#265)
This commit begins the migration towards Intel's types one intrinsic at a time,
starting with `_mm_add_ss`. This is mostly just to get a feel for what the tests
will start to look like.
2018-01-09 09:49:08 -06:00
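A small usage sketch of the `__m128`-based signature, written against today's `std::arch` names purely for illustration (the commit itself predates that path):

```rust
#[cfg(target_arch = "x86_64")]
fn demo() -> f32 {
    use std::arch::x86_64::{__m128, _mm_add_ss, _mm_cvtss_f32, _mm_set_ps};
    unsafe {
        // `_mm_set_ps` takes lanes from high to low, so lane 0 is 1.0 (and 10.0 for `b`).
        let a = _mm_set_ps(4.0, 3.0, 2.0, 1.0);
        let b = _mm_set_ps(40.0, 30.0, 20.0, 10.0);
        // `_mm_add_ss` adds only the lowest lane; the upper lanes are copied from `a`.
        let r: __m128 = _mm_add_ss(a, b);
        _mm_cvtss_f32(r) // 11.0
    }
}
```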
gnzlbg
58664a6f54 More run-time detection improvements (#242)
* [core/runtime] use getauxval on non-x86 platforms

* test coresimd::auxv against auxv crate

* add test files from auxv crate

* [arm] use simd_test macro

* formatting

* missing docs

* improve docs

* reading /proc/self/auxv succeeds only if reading all fields succeeds

* remove cc-crate build dependency

* getauxval succeeds only if hwcap/hwcap2 are non-zero

* fix formatting

* move getauxval to stdsimd

* delete getauxval-wrapper.c

* remove auxv crate dev-dependency from coresimd
2018-01-09 09:23:45 -06:00
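A hedged sketch of the `getauxval`-based detection described above, using the `libc` crate on Linux for illustration (the real implementation lives inside stdsimd and also falls back to parsing `/proc/self/auxv`):

```rust
// Assumes a dependency on the `libc` crate; Linux-only.
#[cfg(target_os = "linux")]
fn hwcaps() -> Option<(libc::c_ulong, libc::c_ulong)> {
    let hwcap = unsafe { libc::getauxval(libc::AT_HWCAP) };
    let hwcap2 = unsafe { libc::getauxval(libc::AT_HWCAP2) };
    // Per the commit above: treat a zero value as "getauxval failed".
    if hwcap != 0 && hwcap2 != 0 {
        Some((hwcap, hwcap2))
    } else {
        None
    }
}
```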
Alex Crichton
94fe929a03 Update to a released syn/quote version 2018-01-08 10:10:52 -08:00
Josef Ippisch
705c34b4eb Implement all addition MMX intrinsics (#266)
* Implement `_mm_add_pi16`

* Implement `_mm_add_pi8`

* Implement `_mm_add_pi32`

* Implement `_mm_adds_pi16`

* Implement `_mm_adds_pi8`

* Implement `_mm_adds_pu8`

* Implement `_mm_adds_pu16`
2018-01-06 12:36:05 -06:00
Jake Goulding
4667c63113 Add RDTSC and RDTSCP intrinsics (#264) 2018-01-05 13:30:26 -06:00
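A tiny usage sketch of the new intrinsic, using today's `std::arch` path for illustration:

```rust
#[cfg(target_arch = "x86_64")]
fn cycles() -> u64 {
    // Reads the processor's time-stamp counter.
    unsafe { std::arch::x86_64::_rdtsc() }
}
```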
gnzlbg
4bb1ea5a05 Completes SSE and adds some MMX intrinsics (#247)
* Completes SSE and adds some MMX intrinsics

MMX:

- `_mm_cmpgt_pi{8,16,32}`
- `_mm_unpack{hi,lo}_pi{8,16,32}`

SSE (is now complete):

- `_mm_cvtp{i,u}{8,16}_ps`
- add test for `_m_pmulhuw`

* fmt and clippy

* add an exception for intrinsics using cvtpi2ps
2018-01-04 10:15:23 -06:00
Alex Crichton
4f1f2bd550 Add an exception for vzeroall/vzeroupper on Windows
These apparently blow the 20-instruction limit with all the loads/stores.
2018-01-03 16:02:35 -08:00
Alex Crichton
3441968ffa Turn down debug level on release mode
Apparently helps fix errors about codeview registers on MSVC!
2018-01-03 15:59:31 -08:00
Alex Crichton
edbfae36c0 Lower the instruction limit to 20 (#262)
Right now it's 30, which is a bit high; most of the intrinsics requiring that many
instructions ended up needing to be fixed anyway.
2018-01-03 17:21:01 -06:00
Alex Crichton
07ebce51b8 Assert intrinsic implementations are inlined properly (#261)
* assert_instr check for failed inlining

* Fix `call` instructions showing up in some intrinsics

The ABI of types like `u8x8` as they're defined isn't actually the underlying
type we need for LLVM; only `__m64` currently satisfies that. Apparently
this (and the casts involved) caused some extraneous instructions for a number
of intrinsics. They've all been moved over to the `__m64` type now so that they
match the underlying interface.

* Allow PIC-relative `call` instructions on x86

These should be harmless when evaluating whether we failed inlining
2018-01-03 16:37:45 -06:00
gwenn
acc8d3de10 Use llvm builtins where possible (#260)
* Fix sse::_mm_cvtsi32_ss and sse::_mm_cvtsi64_ss

By using LLVM builtins, the expected instruction
is correctly generated on all platforms.

* Use LLVM builtins for storeu*

Just to make sure that the wrong instruction is not related to the
Rust code.
2018-01-03 15:18:34 -06:00
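Roughly, "using LLVM builtins" here means binding the LLVM intrinsic by name and calling it directly. A hedged, nightly-only sketch of that pattern (the exact declarations in the crate may differ, and calling `llvm.*` symbols from user code is not a supported interface):

```rust
// Nightly-only sketch: binding `llvm.*` symbols needs these feature gates.
#![feature(link_llvm_intrinsics, simd_ffi)]
#![allow(improper_ctypes)]

use std::arch::x86_64::__m128;

extern "C" {
    // LLVM resolves this declaration to its own builtin and lowers calls to
    // the `cvtsi2ss` instruction.
    #[link_name = "llvm.x86.sse.cvtsi2ss"]
    fn cvtsi2ss(a: __m128, b: i32) -> __m128;
}

pub unsafe fn cvtsi32_ss(a: __m128, b: i32) -> __m128 {
    cvtsi2ss(a, b)
}
```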
gwenn
983b72d189 Last missing avx and avx2 intrinsics (#258)
* avx: _mm256_cvtss_f32, avx2: _mm256_cvtsd_f64, _mm256_cvtsi256_si32

* avx2: _mm256_slli_si256, _mm256_srli_si256

And aliases:
_mm256_bslli_epi128
_mm256_bsrli_epi128
2018-01-02 14:33:02 -06:00
Alex Crichton
ec373ba107 Update to syn master 2018-01-02 12:32:27 -08:00
Alex Crichton
59ed27cc95 Fix stdsimd-verify for syn master 2017-12-31 09:52:16 -08:00
Alex Crichton
3403b6f06a Fix compile with syn master 2017-12-31 09:19:44 -08:00
gwenn
802a379a4a sse2: remove duplicates and move intrinsics to x86_64 file (#256)
* sse2: remove duplicates from i686 file

_mm_cvtsi64x_si128
_mm_cvtsi64_si128
_mm_cvtsi128_si64
_mm_cvtsi128_si64x

* sse2: move _mm_cvtsi64_sd and _mm_cvtsi64x_sd to x86_64 file
2017-12-31 00:58:14 -06:00
Adam Niederer
9141a063c9 Add bswap (#257) 2017-12-31 00:57:04 -06:00
gwenn
5ca8c0aa93 sse: _mm_cvtpi16_ps, _mm_cvtpu16_ps, _mm_cvtpi8_ps, _mm_cvtpu8_ps (#255)
* sse: _mm_cvtpi16_ps, _mm_cvtpu16_ps, _mm_cvtpi8_ps, _mm_cvtpu8_ps

And mmx:
_mm_cmpgt_pi8
_mm_cmpgt_pi16
_mm_unpackhi_pi16
_mm_unpacklo_pi8
_mm_unpacklo_pi16

* Fix: literal out of range
2017-12-30 11:19:44 -06:00
gwenn
17edf649af Fix some assert_instr (#254)
* Fix some assert_instr

Missing assert_instr:
- _mm_cvtsi32_si128
- _mm_cvtsi128_si32
- _mm_loadl_epi64
- _mm_storel_epi64
- _mm_move_epi64
- _mm_cvtsd_f64
- _mm_setzero_pd
- _mm_load1_pd
- _mm_load_pd1
- _mm_loaddup_pd

Wrong instruction used:
- _mm_hsub_pi16

* Try to fix CI build by disabling some asserts

* Exclude some assert_instr on (x86_64, linux)
2017-12-30 11:19:00 -06:00
Alex Crichton
be461b1377 Verify Intel intrinsics against upstream definitions (#251)
This commit adds a new crate for testing that the intrinsics listed in this
crate do indeed match the upstream definition of each intrinsic. A
pre-downloaded XML description of all Intel intrinsics is checked in, which is
then parsed by the `stdsimd-verify` crate to verify that everything we write
down matches the upstream definitions.

Currently the checks are pretty loose to get this compiling, but a few intrinsics
were fixed as a result. For example:

* `_mm256_extract_epi8` - AVX2 intrinsic erroneously listed under AVX
* `_mm256_extract_epi16` - AVX2 intrinsic erroneously listed under AVX
* `_mm256_extract_epi32` - AVX2 intrinsic erroneously listed under AVX
* `_mm256_extract_epi64` - AVX2 intrinsic erroneously listed under AVX
* `_mm_tzcnt_32` - erroneously had `u32` in the name
* `_mm_tzcnt_64` - erroneously had `u64` in the name
* `_mm_cvtsi64_si128` - erroneously available on 32-bit platforms
* `_mm_cvtsi64x_si128` - erroneously available on 32-bit platforms
* `_mm_cvtsi128_si64` - erroneously available on 32-bit platforms
* `_mm_cvtsi128_si64x` - erroneously available on 32-bit platforms
* `_mm_extract_epi64` - erroneously available on 32-bit platforms
* `_mm_insert_epi64` - erroneously available on 32-bit platforms
* `_mm256_extract_epi16` - erroneously returned i32 instead of i16
* `_mm256_extract_epi8` - erroneously returned i32 instead of i8
* `_mm_shuffle_ps` - the mask argument was erroneously i32 instead of u32
* `_popcnt32` - the signedness of the argument and return were flipped
* `_popcnt64` - the signedness of the argument was flipped and the argument
  was too large bit-wise
* `_mm_tzcnt_32` - the return value's sign was flipped
* `_mm_tzcnt_64` - the return value's sign was flipped
* A good number of intrinsics used `imm8: i8` or `imm8: u8` instead of the `imm8:
  i32` which Intel uses (we were also internally inconsistent)
* A number of intrinsics working with `__m64` were instead working with i64/u64,
  so they're now corrected to operate with the vector types instead.

Currently the verifications performed are:

* Each name in Rust is defined in the XML document
* The arguments/return values all agree.
* The CPUID features listed in the XML document are all enabled in Rust as well.

The type matching right now is pretty loose and has a lot of questionable
changes. Future commits will touch these up to be more strict and require closer
adherence to Intel's own types. Otherwise types like `i32x8` (or any integers
with 256 bits) all match up to `__m256i` right now, although this may want to
change in the future.

Finally, we're also not testing the instructions listed in the XML right now.
There's a huge number of discrepancies between the instructions listed in the XML
and the instructions listed in `assert_instr`, and those will need to be taken care
of in a future commit.

Closes #240
2017-12-29 11:52:27 -06:00
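As one concrete illustration of the corrected signatures above, `_popcnt32` now takes and returns `i32`, matching Intel's definition (today's `std::arch` path is used here only for illustration):

```rust
#[cfg(target_arch = "x86_64")]
fn popcount(x: i32) -> i32 {
    // Matches Intel's `int _popcnt32 (int a)`; requires the `popcnt` CPU
    // feature at run time, which this sketch does not check.
    unsafe { std::arch::x86_64::_popcnt32(x) }
}
```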
gwenn
44a168a0b8 sse2: implements last remaining intrinsics (#244)
* sse2: __m64 related intrinsics

_mm_add_si64
_mm_mul_su32
_mm_sub_si64
_mm_cvtpi32_pd
_mm_set_epi64
_mm_set1_epi64
_mm_setr_epi64

* sse2: _mm_load_sd, _mm_loadh_pd, _mm_loadl_pd

* sse2: _mm_store_sd, _mm_storeh_pd, _mm_storel_pd

* sse2: _mm_shuffle_pd, _mm_move_sd

* sse2: _mm_cast*

_mm_castpd_ps
_mm_castpd_si128
_mm_castps_pd
_mm_castps_si128
_mm_castsi128_pd
_mm_castsi128_ps

* sse2: add some tests

* Try to fix AppVeyor build

* sse2: add more tests

* sse2: fix assert_instr for _mm_shuffle_pd

* Try to fix Travis build

* sse2: try to fix AppVeyor build

* sse2: try to fix AppVeyor build
2017-12-28 10:22:08 -06:00
Jonathan Goodman
3857c3e88a fix sse4a _mm_stream_{ss, sd} tests and docs 2017-12-27 22:32:49 +01:00
Alex Crichton
9aa4e30859 Update to syn master 2017-12-27 07:56:38 -08:00
gnzlbg
42ec76a3ff [sse4a] implement non-immediate-mode intrinsics (#249) 2017-12-22 10:14:41 -06:00
gnzlbg
1db6841813 [fmt] --force rustfmt-nightly 2017-12-22 00:24:23 +01:00
gnzlbg
52cc1abe2c [fmt] remove fn_call_width option (was removed upstream) 2017-12-22 00:24:23 +01:00
gnzlbg
5850282a1c use repr(align) to ensure proper alignment in tests 2017-12-22 00:24:23 +01:00
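A hedged sketch of the pattern this refers to: wrap the test buffer in a `#[repr(align(16))]` struct so that intrinsics requiring 16-byte alignment are actually handed aligned memory in the tests (names are illustrative):

```rust
// 16-byte-aligned backing storage for an aligned load/store test.
#[repr(align(16))]
struct Memory {
    data: [f32; 4],
}

fn demo() {
    let mem = Memory { data: [1.0, 2.0, 3.0, 4.0] };
    // This alignment guarantee is what an aligned load such as `_mm_load_ps`
    // relies on.
    assert_eq!(mem.data.as_ptr() as usize % 16, 0);
}
```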
gnzlbg
4fb9420acb Fix rustfmt (#239)
* [fmt] manually fix some formatting
* [fmt] reformat with rustfmt-nightly
* [clippy] fix clippy issues
2017-12-14 19:57:53 +01:00
gnzlbg
5ce0c13009 [ci] powerpc/powerpc64/powerpc64le (#237)
* [ci] add powerpc/powerpc64 build bots

* unbreak stdsimd builds for targets without run-time
2017-12-14 10:44:20 -06:00
Tony Sifkarovski
645008ef32 Add unchecked methods, fix _mm_extract_epi* return types (#223)
* Adds extract_unchecked + replace_unchecked + len (#222)

* [x86] Fixes the return types + uses extract_unchecked for:
  * _mm_extract_epi8
  * _mm_extract_epi16
  * _mm256_extract_epi8
  * _mm256_extract_epi16

* Minor changes to the other extract_epi* intrinsics for style consistency.
These should now zero-extend the extracted int and behave appropriately. An old typo makes these a bit confusing; see this LLVM issue.
2017-12-13 19:17:33 +01:00
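A small sketch of the zero-extension behaviour mentioned above, written against today's `std::arch` (which now takes the lane index as a const generic):

```rust
#[cfg(target_arch = "x86_64")]
fn demo() -> i32 {
    use std::arch::x86_64::{_mm_extract_epi8, _mm_set1_epi8};
    unsafe {
        let a = _mm_set1_epi8(-1);
        // The 0xFF byte is zero-extended into the i32 result: 255, not -1.
        // (Requires SSE4.1 at run time, which this sketch does not check.)
        _mm_extract_epi8::<0>(a)
    }
}
```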
gnzlbg
6e678ee678 fix clippy warnings 2017-12-13 10:19:09 -05:00
gnzlbg
84e2c7f8e4 fix __m64 imports 2017-12-13 10:19:09 -05:00
gnzlbg
9a81140e00 use i64s for the repr of __m{128,256}i and update casts 2017-12-13 10:19:09 -05:00
gnzlbg
1b987bd270 remove unnecessary mem::uninitialized 2017-12-13 10:19:09 -05:00