Commit Graph

2551 Commits

Author SHA1 Message Date
Alex Crichton
912ad06b3b Update CONTRIBUTING.md with recent changes 2018-01-29 07:17:14 -08:00
Alex Crichton
5b445c5cac Update doc generation with recent developments 2018-01-28 22:00:13 -08:00
Alex Crichton
0f5b382dd6 Enable verification of more intrinsics (#309)
Looks like intrinsics that weren't listing a target feature were accidentally
omitted from the verification logic, so this commit fixes that!

Along the way I've ended up filing #307 and #308 for detected inconsistencies.
2018-01-28 23:59:44 -06:00
Alex Crichton
82acb0c953 Move from #[inline(always)] to #[inline] (#306)
* Move from #[inline(always)] to #[inline]

This commit blanket changes all `#[inline(always)]` annotations to `#[inline]`.
Fear not though, this should not be a regression! To clarify, though, this
change is done out of correctness to ensure that we don't hit stray LLVM errors.

Most of the LLVM intrinsics and various LLVM functions we actually lower down to
only work correctly if they are invoked from a function with an appropriate
target feature set. For example if we were to out-of-the-blue invoke an AVX
intrinsic then we get a [codegen error][avx-error]. This error comes about
because the surrounding function isn't enabling the AVX feature. Now in general
we don't have a lot of control over how this crate is consumed by downstream
crates. It'd be a pretty bad mistake if all mistakes showed up as scary
un-debuggable codegen errors in LLVM!

On the other side of this issue *we* as the invokers of these intrinsics are
"doing the right thing". All our functions in this crate are tagged
appropriately with target features to be codegen'd correctly. Indeed we have
plenty of tests asserting that we can codegen everything across multiple
platforms!

The error comes about here because of precisely the `#[inline(always)]`
attribute. Typically LLVM *won't* inline functions across target feature sets.
For example if you have a normal function which calls a function that enables
AVX2, then the target, no matter how small, won't be inlined into the caller.
This is done for correctness (register preserving and all that) but is also how
these codegen errors are prevented in practice.

Now we as stdsimd, however, are currently tagging all functions with "always
inline this, no matter what". That ends up, apparently, bypassing the logic of
"is this even possible to inline". In turn we start inlining things like AVX
intrinsics into functions that can't actually call AVX intrinsics, creating
codegen errors at compile time.

So with all that motivation, this commit switches to the normal inline hints for
these functions, just `#[inline]`, instead of `#[inline(always)]`. Now for the
stdsimd crate it is absolutely critical that all functions are inlined to have
good performance. Using `#[inline]`, however, shouldn't hamper that!

The compiler will recognize the `#[inline]` attribute and make sure that each of
these functions is a *candidate* for being inlined into any and all downstream
codegen units. (aka if we were missing `#[inline]` then LLVM wouldn't even know
the definition to inline most of the time). After that, though, we're relying on
LLVM to naturally inline these functions as opposed to forcing it to do so.
Typically, however, these intrinsics are one-liners and are trivially
inlineable, so I'd imagine that LLVM will go ahead and inline everything all
over the place.

All in all this change is brought about by #253 which noticed various codegen
errors. I originally thought it was due to ABI issues but that turned out to be
wrong! (Although that was also a bug which has since been resolved.) In any case
after this change I was able to get the example in #253 to execute in both
release and debug mode.

Closes #253

[avx-error]: https://play.rust-lang.org/?gist=50cb08f1e2242e22109a6d69318bd112&version=nightly

* Add inline(always) on eflags intrinsics

Their ABI actually relies on it!

* Leave #[inline(always)] on portable types

They're causing test failures on ARM, let's investigate later.
2018-01-28 23:40:39 -06:00
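
The attribute combination described in 82acb0c953 can be sketched roughly as follows. This is a minimal illustration, not code from the crate: `add_avx2` and `add_dispatch` are hypothetical names, and the sketch assumes today's `std::arch` and the `is_x86_feature_detected!` macro rather than the stdsimd API as it existed at this commit.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

/// Hypothetical helper: adds eight `i32` lanes using AVX2.
/// `#[inline]` only makes this a *candidate* for inlining; LLVM is still
/// free to refuse inlining it into a caller that does not enable `avx2`.
#[cfg(target_arch = "x86_64")]
#[inline]
#[target_feature(enable = "avx2")]
unsafe fn add_avx2(a: &[i32; 8], b: &[i32; 8]) -> [i32; 8] {
    let va = _mm256_loadu_si256(a.as_ptr() as *const __m256i);
    let vb = _mm256_loadu_si256(b.as_ptr() as *const __m256i);
    let mut out = [0i32; 8];
    _mm256_storeu_si256(out.as_mut_ptr() as *mut __m256i, _mm256_add_epi32(va, vb));
    out
}

/// Hypothetical caller compiled *without* AVX2: it checks for the feature
/// at run time before calling into the AVX2 function.
pub fn add_dispatch(a: &[i32; 8], b: &[i32; 8]) -> [i32; 8] {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was verified on the line above.
            return unsafe { add_avx2(a, b) };
        }
    }
    // Portable fallback; wrapping addition matches the lane-wise SIMD result.
    let mut out = [0i32; 8];
    for i in 0..8 {
        out[i] = a[i].wrapping_add(b[i]);
    }
    out
}
```

Calling `add_dispatch(&[1; 8], &[2; 8])` yields the same lanes on either path; only the AVX2 function carries the target feature, so the caller itself stays valid on CPUs without it.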
Alex Crichton
a2403de290 Don't count nop instructions after functions
Looks like disassemblers will fill this in and/or LLVM inserts them for
alignment, not useful to us in calculations.
2018-01-28 20:35:37 -08:00
Alex Crichton
263937a499 Fix a lint warning on aarch64 2018-01-28 20:32:11 -08:00
Alex Crichton
11f7b7e38e Verify intrinsics don't leak to i686 (#305)
Some intrinsics take `i64` or `u64` arguments which typically means that they're
using 64-bit registers and aren't actually available on x86. This commit adds a
check to stdsimd-verify to assert this and moves around some intrinsics that I
believe should only be available on x86_64.

This commit was checked in many places against gcc/clang/MSVC using godbolt.org
to ensure that we're agreeing with what other compilers are doing.

Closes #304
2018-01-28 22:31:31 -06:00
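
As an illustration of the split that 11f7b7e38e enforces, the sketch below gates a 64-bit intrinsic pair behind `target_arch = "x86_64"`. It assumes today's `std::arch::x86_64` module; `roundtrip_i64` is a hypothetical helper that is not part of the crate.

```rust
/// Hypothetical helper: moves an `i64` into the low lane of an `__m128i`
/// and back out again. Compiled only on x86_64, where these intrinsics
/// are available.
#[cfg(target_arch = "x86_64")]
#[allow(unused_unsafe)]
pub fn roundtrip_i64(x: i64) -> i64 {
    use std::arch::x86_64::{_mm_cvtsi128_si64, _mm_cvtsi64_si128};
    // SAFETY: SSE2 is part of the x86_64 baseline feature set.
    unsafe { _mm_cvtsi128_si64(_mm_cvtsi64_si128(x)) }
}

/// Fallback for every other target, including 32-bit x86, where the
/// 64-bit register forms of these instructions do not exist.
#[cfg(not(target_arch = "x86_64"))]
pub fn roundtrip_i64(x: i64) -> i64 {
    x
}
```

The `cfg` split mirrors the idea in the commit: a signature that mentions `i64` or `u64` is a hint that the intrinsic belongs to the x86_64-only set.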
Alex Crichton
aefb22c51e Don't distinguish between i586/i686 (#301)
This was historically done as the contents of the `i686` module wouldn't
actually compile on i586 for various reasons. I believe I've tracked this down
to #300 where LLVM refuses to compile a function using the `x86_mmx` type
without actually enabling the `mmx` feature (sort of reasonably so!). This
commit will now compile in both the `i586` and `i686` modules of this crate into
the `i586-unknown-linux-gnu` target, and the relevant functions now also enable
the `mmx` feature if they're using the `__m64` type.

I believe this is uncovering a more widespread problem where the `__m64` isn't
usable outside the context of `mmx`-enabled functions. The i686 and x86_64
targets have this feature enabled by default which is why it's worked there, but
they're not enabled for the i586 target. We'll probably want to consider this
when stabilizing!
2018-01-28 22:31:22 -06:00
Alex Crichton
e0aed0fffc Fix tests for a future nightly (#297)
In rust-lang/rust#47743 the SIMD types in the Rust ABI are being switched to
getting passed through memory unconditionally rather than through LLVM's
immediate registers. This means that a bunch of our assert_instr invocations
started breaking as LLVM has more efficient methods of dealing with memory than
the instructions themselves.

This switches `assert_instr` to unconditionally use a shim that is an `extern`
function which should revert back to the previous behavior, using the simd types
as immediates and testing the same.
2018-01-25 12:13:48 -06:00
Alex Crichton
73794e5035 Update stdsimd-verify to print out instruction differences
Too many to deal with for now...
2018-01-20 10:13:26 -08:00
Alex Crichton
30694efc68 Remove PartialEq impls for x86 types (#294)
* Remove `PartialEq for __m64`

This helps to strip the public API of the vendor type for now, although this may
come back at a later date!

* Remove `PartialEq for __m128i`

Like the previous commit, but for another type!

* Remove `PartialEq for __m256i`

Same as previous commit!
2018-01-20 11:24:09 -06:00
Alex Crichton
4b66abaede Move x86-specific types to the vendor module (#293)
I believe we're reserving the `simd` module for exclusively the portable types
and their operations, so this commit moves the various x86-specific types from
the portable modules to the `x86` module. Along the way this also adds some doc
blocks for all the existing x86 types.
2018-01-19 21:20:44 -06:00
Alex Crichton
e19b6d9efd Remove Into/From between x86 and portable types (#292)
This is primarily done to avoid falling into a portability trap by accident,
and in general keeps the vendor types (on x86) as minimal as they can be.
Along the way some tests that were still using the portable types were
cleaned up.
2018-01-19 20:15:07 -06:00
Alex Crichton
54452230a7 Add an example of SIMD-powered hex encoding (#291)
This is lifted from an example elsewhere I found and shows off runtime
dispatching along with a lot of intrinsics being used in a bunch.
2018-01-19 16:53:38 -06:00
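
For readers who want a feel for what SIMD hex encoding with runtime dispatch looks like, here is a small sketch. It is not the example added in 54452230a7: the function names are hypothetical, it handles exactly 16 input bytes, and it assumes today's `std::arch` intrinsics and `is_x86_feature_detected!`.

```rust
#[cfg(target_arch = "x86_64")]
use std::arch::x86_64::*;

const HEX: &[u8; 16] = b"0123456789abcdef";

/// Scalar fallback: two lowercase hex digits per input byte.
fn hex16_scalar(input: &[u8; 16]) -> [u8; 32] {
    let mut out = [0u8; 32];
    for (i, &b) in input.iter().enumerate() {
        out[2 * i] = HEX[(b >> 4) as usize];
        out[2 * i + 1] = HEX[(b & 0x0f) as usize];
    }
    out
}

/// SSSE3 version: `pshufb` acts as a 16-entry lookup table.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "ssse3")]
unsafe fn hex16_ssse3(input: &[u8; 16]) -> [u8; 32] {
    let v = _mm_loadu_si128(input.as_ptr() as *const __m128i);
    let mask = _mm_set1_epi8(0x0f);
    // Split every byte into its high and low nibble.
    let hi = _mm_and_si128(_mm_srli_epi16::<4>(v), mask);
    let lo = _mm_and_si128(v, mask);
    // Use each nibble as an index into the table of hex digits.
    let table = _mm_loadu_si128(HEX.as_ptr() as *const __m128i);
    let hi_hex = _mm_shuffle_epi8(table, hi);
    let lo_hex = _mm_shuffle_epi8(table, lo);
    // Interleave so every input byte becomes "high digit, low digit".
    let mut out = [0u8; 32];
    _mm_storeu_si128(out.as_mut_ptr() as *mut __m128i, _mm_unpacklo_epi8(hi_hex, lo_hex));
    _mm_storeu_si128(out.as_mut_ptr().add(16) as *mut __m128i, _mm_unpackhi_epi8(hi_hex, lo_hex));
    out
}

/// Runtime dispatch: use SSSE3 when the CPU has it, otherwise fall back.
pub fn hex16(input: &[u8; 16]) -> [u8; 32] {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("ssse3") {
            // SAFETY: SSSE3 support was verified on the line above.
            return unsafe { hex16_ssse3(input) };
        }
    }
    hex16_scalar(input)
}
```

Both paths are meant to produce byte-identical output, which makes the scalar version a convenient reference when testing the SIMD one.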
Alex Crichton
faf5aea427 Reduce implicit reliance on structure of __m* types (#290)
They need to be structured *somehow* to be the right bit width but ideally we
wouldn't have the intrinsics rely on the particulars about how they're
represented.
2018-01-19 14:44:31 -06:00
Alex Crichton
330a124568 Update stdsimd-verify for vendor types (#289)
This commit provides insurance that intrinsics are only introduced with known
canonical types (`__m128i` and such) instead of also allowing `u8x16` for
example.
2018-01-19 12:11:21 -06:00
Alex Crichton
30b1145ef7 Migrate the i586::avx2 module to vendor types (#287) 2018-01-19 10:32:16 -06:00
Alex Crichton
1ad6d5fa88 Migrate the x86_64 folder to vendor types (#284) 2018-01-19 10:30:25 -06:00
messense
8deae9ce66 Update links in Cargo.toml to rust-lang-nursery/stdsimd (#288) 2018-01-18 20:23:50 -06:00
Alex Crichton
c5afde07d2 Migrate the i586::avx module to vendor types (#286)
Closes #285
2018-01-18 11:21:03 -06:00
Alex Crichton
5c8867c7c3 Update target_feature syntax (#283)
This commit updates to the latest nightly's syntax where `#[target_feature =
"+foo"]` is now deprecated in favor of `#[target_feature(enable = "foo")]`.
Additionally `#[target_feature]` can only be applied to `unsafe` functions for
now.

Along the way this removes a few examples that were just left around and also
disables the `fxsr` modules as that target feature will need to land in upstream
rust-lang/rust first as it's currently unknown to the compiler.
2018-01-17 09:45:02 -06:00
Josef Ippisch
8deead27f2 Implement addition aliases (#281)
- `_m_paddb` for `_mm_add_pi8`
- `_m_paddw` for `_mm_add_pi16`
- `_m_paddd` for `_mm_add_pi32`
- `_m_paddsb` for `_mm_adds_pi8`
- `_m_paddsw` for `_mm_adds_pi16`
- `_m_paddusb` for `_mm_adds_pu8`
- `_m_paddusw` for `_mm_adds_pu16`
2018-01-13 12:08:53 -06:00
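
The alias pattern in 8deead27f2 can be modelled without MMX at all. The sketch below is a scalar model under stated assumptions: the `*_model` functions are hypothetical stand-ins that operate on plain arrays instead of `__m64`, and they only illustrate the lane semantics (wrapping for `padd*`, saturating for `padds*` and `paddus*`) and the forwarding style of an alias.

```rust
/// Scalar model of `_mm_add_pi8`: eight `i8` lanes, wrapping addition.
pub fn mm_add_pi8_model(a: [i8; 8], b: [i8; 8]) -> [i8; 8] {
    let mut out = [0i8; 8];
    for i in 0..8 {
        out[i] = a[i].wrapping_add(b[i]);
    }
    out
}

/// Scalar model of `_mm_adds_pi8`: eight `i8` lanes, signed saturation.
pub fn mm_adds_pi8_model(a: [i8; 8], b: [i8; 8]) -> [i8; 8] {
    let mut out = [0i8; 8];
    for i in 0..8 {
        out[i] = a[i].saturating_add(b[i]);
    }
    out
}

/// Scalar model of `_mm_adds_pu8`: eight `u8` lanes, unsigned saturation.
pub fn mm_adds_pu8_model(a: [u8; 8], b: [u8; 8]) -> [u8; 8] {
    let mut out = [0u8; 8];
    for i in 0..8 {
        out[i] = a[i].saturating_add(b[i]);
    }
    out
}

/// Alias in the style of `_m_paddb`: forwards to the `_mm_*` spelling.
#[inline]
pub fn m_paddb_model(a: [i8; 8], b: [i8; 8]) -> [i8; 8] {
    mm_add_pi8_model(a, b)
}

/// Alias in the style of `_m_paddsb`.
#[inline]
pub fn m_paddsb_model(a: [i8; 8], b: [i8; 8]) -> [i8; 8] {
    mm_adds_pi8_model(a, b)
}

/// Alias in the style of `_m_paddusb`.
#[inline]
pub fn m_paddusb_model(a: [u8; 8], b: [u8; 8]) -> [u8; 8] {
    mm_adds_pu8_model(a, b)
}
```

Keeping the alias as a one-line forwarder means both spellings share a single implementation and cannot drift apart.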
Josef Ippisch
50cf00372d MMX subtraction instructions (#280)
* Implement `_m_psubb`

* Implement `_m_psubw`

* Implement `_m_psubd`

* Implement `_m_psubsb`

* Implement `_m_psubsw`

* Implement `_m_psubusb`

* Implement `_m_psubusw`

* Have the subtraction intrinsic naming consistent with the addition ones

E.g. use `_mm_sub_pi8` instead of `_m_psubb`

* Implement all subtraction aliases for the `_mm_*` variants

- `_m_psubb` for `_mm_sub_pi8`
- `_m_psubw` for `_mm_sub_pi16`
- `_m_psubd` for `_mm_sub_pi32`
- `_m_psubsb` for `_mm_subs_pi8`
- `_m_psubsw` for `_mm_subs_pi16`
- `_m_psubusb` for `_mm_subs_pu8`
- `_m_psubusw` for `_mm_subs_pu16`
2018-01-12 17:10:51 -06:00
Alex Crichton
e77ebf194a Migrate the i686 module to vendor types (#279)
* Migrate `i686::sse` to vendor types

* Migrate `i686::sse2` to vendor types

* Migrate i686::sse41 to vendor types

* Migrate i686::sse42 to vendor types
2018-01-12 14:08:20 -06:00
Alex Crichton
48a7490711 Make rustc's job a little easier in sse42 (#277)
Move all the casts from `__m128i` to `i8x16` outside the macro invocations so
rustc only has to resolve a few function calls, not thousands!
2018-01-12 11:37:06 -06:00
Alex Crichton
feb8c2b152 Migrate i586::ssse3 to vendor types (#275) 2018-01-11 23:18:35 -06:00
Alex Crichton
fde52cb334 Migrate i586::sse41 to vendor types (#276) 2018-01-11 23:18:15 -06:00
Alex Crichton
3148881fa2 Move travis workaround earlier
Try to get it used on OSX as well
2018-01-11 08:24:11 -08:00
Alex Crichton
5467c0a008 Migrate i586::sse3 to vendor types (#274) 2018-01-11 10:13:26 -06:00
Alex Crichton
6d8d2f81e9 Migrate a bunch of i586::sse2 to native types (#273) 2018-01-10 12:42:26 -06:00
Alex Crichton
baf9d0e7e0 Migrate the i686::sse module to vendor types (#269)
This migrates the entire `i686::sse` module (and touches a few others) to the
vendor types.
2018-01-09 13:38:09 -06:00
Jef
248f5441bb Make splat a const fn 2018-01-09 18:38:47 +01:00
Alex Crichton
fd2cc3bc05 Migrate _mm_add_ss to __m128 (#265)
This commit starts the migration towards Intel's types one intrinsic at a time,
starting with `_mm_add_ss`. This is mostly just to get a feel for what the tests
will start to look like.
2018-01-09 09:49:08 -06:00
gnzlbg
58664a6f54 More run-time detection improvements (#242)
* [core/runtime] use getauxval on non-x86 platforms

* test coresimd::auxv against auxv crate

* add test files from auxv crate

* [arm] use simd_test macro

* formatting

* missing docs

* improve docs

* reading /proc/self/auxv succeeds only if reading all fields succeeds

* remove cc-crate build dependency

* getauxval succeeds only if hwcap/hwcap2 are non-zero

* fix formatting

* move getauxval to stdsimd

* delete getauxval-wrapper.c

* remove auxv crate dev-dependency from coresimd
2018-01-09 09:23:45 -06:00
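
The "succeeds only if reading all fields succeeds" rule from 58664a6f54 can be sketched as a small parser. This is an interpretation, not the crate's code: `AuxVec` and `parse_auxv` are hypothetical names, the input is assumed to be already decoded into native-width words, and treating both `AT_HWCAP` and `AT_HWCAP2` as required is an assumption made for illustration. The key values are the ones defined by the Linux ABI.

```rust
/// Auxiliary-vector keys as defined by the Linux ABI (`<elf.h>`).
const AT_NULL: usize = 0;
const AT_HWCAP: usize = 16;
const AT_HWCAP2: usize = 26;

/// Hypothetical result type holding the two capability words.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct AuxVec {
    pub hwcap: usize,
    pub hwcap2: usize,
}

/// Hypothetical parser over the `(key, value)` word pairs found in
/// `/proc/self/auxv`. Returns `None` unless *every* required field is found.
pub fn parse_auxv(words: &[usize]) -> Option<AuxVec> {
    let mut hwcap = None;
    let mut hwcap2 = None;
    for pair in words.chunks_exact(2) {
        match pair[0] {
            // AT_NULL terminates the vector; ignore anything after it.
            AT_NULL => break,
            AT_HWCAP => hwcap = Some(pair[1]),
            AT_HWCAP2 => hwcap2 = Some(pair[1]),
            _ => {}
        }
    }
    // `?` makes the whole parse fail if either field is missing.
    Some(AuxVec {
        hwcap: hwcap?,
        hwcap2: hwcap2?,
    })
}
```

Returning `Option` for the whole struct, rather than one `Option` per field, keeps callers from acting on a half-populated result.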
Alex Crichton
94fe929a03 Update to a released syn/quote version 2018-01-08 10:10:52 -08:00
Josef Ippisch
705c34b4eb Implement all addition MMX intrinsics (#266)
* Implement `_mm_add_pi16`

* Implement `_mm_add_pi8`

* Implement `_mm_add_pi32`

* Implement `_mm_adds_pi16`

* Implement `_mm_adds_pi8`

* Implement `_mm_adds_pu8`

* Implement `_mm_adds_pu16`
2018-01-06 12:36:05 -06:00
Jake Goulding
4667c63113 Add RDTSC and RDTSCP intrinsics (#264) 2018-01-05 13:30:26 -06:00
gnzlbg
4bb1ea5a05 Completes SSE and adds some MMX intrinsics (#247)
* Completes SSE and adds some MMX intrinsics

MMX:

- `_mm_cmpgt_pi{8,16,32}`
- `_mm_unpack{hi,lo}_pi{8,16,32}`

SSE (is now complete):

- `_mm_cvtp{i,u}{8,16}_ps`
- add test for `_m_pmulhuw`

* fmt and clippy

* add an exception for intrinsics using cvtpi2ps
2018-01-04 10:15:23 -06:00
Alex Crichton
4f1f2bd550 Add an exception for vzeroall/vzeroupper on Windows
These apparently blow the 20 instruction limit with all the loads/stores.
2018-01-03 16:02:35 -08:00
Alex Crichton
3441968ffa Turn down debug level on release mode
Apparently helps fix errors about codeview registers on MSVC!
2018-01-03 15:59:31 -08:00
Alex Crichton
edbfae36c0 Lower the instruction limit to 20 (#262)
Right now it's 30 which is a bit high, most of the intrinsics requiring all
these instructions ended up needing to be fixed anyway.
2018-01-03 17:21:01 -06:00
Alex Crichton
07ebce51b8 Assert intrinsic implementations are inlined properly (#261)
* assert_instr check for failed inlining

* Fix `call` instructions showing up in some intrinsics

The ABI of types like `u8x8` as they're defined isn't actually the underlying
type we need for LLVM, but only `__m64` currently satisfies that. Apparently
this (and the casts involved) caused some extraneous instructions for a number
of intrinsics. They've all moved over to the `__m64` type now to ensure that
they're what the underlying interface is.

* Allow PIC-relative `call` instructions on x86

These should be harmless when evaluating whether we failed inlining
2018-01-03 16:37:45 -06:00
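
The inlining check from 07ebce51b8 can be modelled as a simple pass over disassembled instructions. The sketch below is a hypothetical model, not the `assert_instr` implementation: `check_disassembly` is an invented name, the input is assumed to be one string per instruction with the mnemonic first, and the exception for PIC-relative `call` instructions is deliberately left out.

```rust
/// Hypothetical model of the checks: the expected mnemonic must appear,
/// no `call` may appear (a sign that inlining failed), and the function
/// must stay within `limit` instructions.
pub fn check_disassembly(instrs: &[&str], expected: &str, limit: usize) -> Result<(), String> {
    // The first whitespace-separated token of each line is the mnemonic.
    let mnemonics: Vec<&str> = instrs
        .iter()
        .filter_map(|line| line.split_whitespace().next())
        .collect();

    if !mnemonics.iter().any(|m| *m == expected) {
        return Err(format!("expected instruction `{expected}` not found"));
    }
    // Covers both `call` and AT&T-style `callq`.
    if mnemonics.iter().any(|m| m.starts_with("call")) {
        return Err("found a `call`: the intrinsic was probably not inlined".to_string());
    }
    if mnemonics.len() > limit {
        return Err(format!(
            "{} instructions exceed the limit of {limit}",
            mnemonics.len()
        ));
    }
    Ok(())
}
```

With a limit of 20, as set in #262, a listing such as `["paddb %mm1, %mm0", "retq"]` passes for `paddb`, while any listing containing `callq` is rejected regardless of length.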
gwenn
acc8d3de10 Use llvm builtins where possible (#260)
* Fix sse::_mm_cvtsi32_ss and sse::_mm_cvtsi64_ss

By using LLVM builtins, the expected instruction
is correctly generated on all platforms.

* Use LLVM builtins for storeu*

Just to make sure that the wrong instruction is not caused by the
Rust code.
2018-01-03 15:18:34 -06:00
gwenn
983b72d189 Last missing avx and avx2 intrinsics (#258)
* avx: _mm256_cvtss_f32, avx2: _mm256_cvtsd_f64, _mm256_cvtsi256_si32

* avx2: _mm256_slli_si256, _mm256_srli_si256

And aliases:
_mm256_bslli_epi128
_mm256_bsrli_epi128
2018-01-02 14:33:02 -06:00
Alex Crichton
ec373ba107 Update to syn master 2018-01-02 12:32:27 -08:00
Alex Crichton
59ed27cc95 Fix stdsimd-verify for syn master 2017-12-31 09:52:16 -08:00
Alex Crichton
3403b6f06a Fix compile with syn master 2017-12-31 09:19:44 -08:00
gwenn
802a379a4a sse2: remove duplicates and move intrinsics to x86_64 file (#256)
* sse2: remove duplicates from i686 file

_mm_cvtsi64x_si128
_mm_cvtsi64_si128
_mm_cvtsi128_si64
_mm_cvtsi128_si64x

* sse2: move _mm_cvtsi64_sd and _mm_cvtsi64x_sd to x86_64 file
2017-12-31 00:58:14 -06:00
Adam Niederer
9141a063c9 Add bswap (#257) 2017-12-31 00:57:04 -06:00
gwenn
5ca8c0aa93 sse: _mm_cvtpi16_ps, _mm_cvtpu16_ps, _mm_cvtpi8_ps, _mm_cvtpu8_ps (#255)
* sse: _mm_cvtpi16_ps, _mm_cvtpu16_ps, _mm_cvtpi8_ps, _mm_cvtpu8_ps

And mmx:
_mm_cmpgt_pi8
_mm_cmpgt_pi16
_mm_unpackhi_pi16
_mm_unpacklo_pi8
_mm_unpacklo_pi16

* Fix: literal out of range
2017-12-30 11:19:44 -06:00