rust-lang/rust - rust - Gitea: Git with a cup of tea

Author	SHA1	Message	Date
Steven Fackler	5fb563aabc	Add _mm256_shuffle_epi8 and _mm256_permutevar8x32_epi32 (#133 ) * Add _mm256_shuffle_epi8 * Add _mm256_permutevar8x32_epi32	2017-10-21 14:59:37 -05:00
pythoneer	d5fd2b09a7	sse2 (#131 ) * added missing doc _mm_cvtps_pd added missing doc & test _mm_load_pd added missing doc & test _mm_store_pd added _mm_store1_pd added _mm_store_pd1 added _mm_storer_pd added _mm_load_pd1 added _mm_loadr_pd added _mm_loadu_pd * correct alignments	2017-10-21 10:46:55 -05:00
jneem	3ec870078a	avx2: _mm256_blend_epi32 and _mm256_blend_epi16. (#130 )	2017-10-18 17:29:23 -05:00
gnzlbg	a3a703d83e	[example] nbody (#117 )	2017-10-18 17:19:19 -05:00
Dan Robertson	4782ffadee	[x86] Implement avx2 broadcast intrinsics (#97 ) Implement - _mm_broadcastb_epi8 - _mm256_broadcastb_epi8 - _mm_broadcastd_epi32 - _mm256_broadcastd_epi32 - _mm_broadcastq_epi64 - _mm256_broadcastq_epi64 - _mm_broadcastsd_pd - _mm256_broadcastsd_pd - _mm256_broadcastsi128_si256 - _mm_broadcastss_ps - _mm256_broadcastss_ps - _mm_broadcastw_epi16 - _mm256_broadcastw_epi16	2017-10-18 14:36:17 -05:00
Alex Crichton	7b249298c0	Uncomment _mm256_mpsadbw_epu8 (#128 ) Just needed some `constify_imm8!` treatment Closes #59	2017-10-18 13:17:09 -05:00
gnzlbg	2dc965b69a	[neon] reciprocal square-root estimate (#121 )	2017-10-18 13:16:34 -05:00
Alex Crichton	13bc6b8517	Add CI in Intel's instruction emulator (#113 ) This commit adds a new builder on CI for running tests in Intel's own emulator and also adds an assertion that on this emulator no tests are skipped due to missing CPU features by accident. Closes #92	2017-10-18 11:35:11 -04:00
André Oliveira	02c89b24ba	sse4.1 instructions (#98 ) * sse4.1: _mm_blendv_ps and _mm_blendv_pd * sse4.1: _mm_blend_ps and _mm_blend_pd - HACK warning: messing with the constify macros - Selecting only one buffer gets optimized away and tests need to take this into account * sse4.1: _mm_blend_epi16 * sse4.1: _mm_extract_ps * sse4.1: _mm_extract_epi8 * sse4.1: _mm_extract_epi32 * sse4.1: _mm_extract_epi64 * sse4.1: _mm_insert_ps * sse4.1: _mm_insert_epi8 * sse4.1: _mm_insert_epi32 and _mm_insert_epi64 * Formatting * sse4.1: _mm_max_epi8, _mm_max_epu16, _mm_max_epi32 and _mm_max_epu32 * Fix wrong compiler flag - avx -> sse4.1 * Fix intrinsics that only work with x86-64 * sse4.1: use appropriate types * Revert '_mm_extract_ps' to return i32 * sse4.1: Use the v128 types for consistency * Try fix for windows * Try "vectorcall" calling convention * Revert "Try "vectorcall" calling convention" This reverts commit 12936e9976bc6b0e4e538d82f55f0ee2d87a7f25. * Revert "Try fix for windows" This reverts commit 9c473808d334acedd46060b32ceea116662bf6a3. * Change tests for windows * Remove useless Windows test	2017-10-18 11:34:51 -04:00
jneem	acf919f960	avx2: _mm_blend_epi32 (#127 )	2017-10-17 10:16:15 -04:00
Thomas Schilling	64c7f7ec56	Add SSE _mm_store* intrinsics and _mm_move_ss (#115 ) * Add _mm_store* intrinsics and _mm_move_ss * Fix Win64 & Linux i586 failures * Make i586 codegen happy without breaking x86_64	2017-10-17 10:15:37 -04:00
gwenn	19e7d0ed3e	Avx (#126 ) * avx: _mm256_zextps128_ps256 * avx: _mm256_zextpd128_pd256 * avx: _mm256_set_m128 * avx: _mm256_set_m128d * avx: _mm256_castpd_ps * avx: _mm256_castps_pd * avx: _mm256_castps_si256 * avx: _mm256_castsi256_ps * avx: _mm256_zextsi128_si256 * avx: _mm256_set_m128i	2017-10-16 18:14:09 -04:00
pythoneer	3286bbbab7	fixed _mm_set_pd and _mm_setr_pd by reversing order (#124 )	2017-10-16 11:32:26 -04:00
gwenn	db8831ac61	Avx (#123 ) * avx: _mm256_movedup_pd * avx: _mm256_lddqu_si256 * avx: _mm256_rcp_ps * avx: _mm256_rsqrt_ps * avx: _mm256_unpackhi_pd * avx: _mm256_unpackhi_ps * avx: _mm256_unpacklo_pd, _mm256_unpacklo_ps * avx: _mm256_testz_si256 * avx: _mm256_testc_si256 * avx: _mm256_testz_pd * avx: _mm256_testc_pd * avx: _mm256_testnzc_pd * avx: _mm_testz_pd * avx: _mm_testc_pd * avx: _mm_testnzc_pd * avx: _mm256_testz_ps, _mm256_testc_ps, _mm256_testnzc_ps * avx: _mm_testz_ps, _mm_testc_ps, _mm_testnzc_ps * avx: _mm256_movemask_pd, _mm256_movemask_ps * avx: _mm256_setzero_pd, _mm256_setzero_ps * avx: _mm256_setzero_si256 * avx: _mm256_set_pd, _mm256_set_ps * avx: _mm256_set_epi8 * avx: _mm256_set_epi16 * avx: _mm256_set_epi32 * avx: _mm256_set_epi64x * avx: _mm256_setr_pd, _mm256_setr_ps * avx: _mm256_setr_epi8 * avx: _mm256_setr_epi16 * avx: _mm256_setr_epi32, _mm256_setr_epi64x * avx: add missing assert_instr * avx: _mm256_set1_pd * avx: _mm256_set1_ps * avx: _mm256_set1_epi8 * avx: _mm256_set1_epi16, _mm256_set1_epi32 * avx: _mm256_set1_epi64x * avx: _mm256_castpd_si256, _mm256_castsi256_pd, _mm256_castps256_ps128, _mm256_castpd256_pd128, _mm256_castsi256_si128 * avx: remove assert_instr failing	2017-10-15 11:36:46 -04:00
gnzlbg	bd7990eb2a	[arm] v6/v7/v8 run-time tests (#119 )	2017-10-15 09:48:06 -04:00
pythoneer	c38ea28d5a	Sse2 (#122 ) * added _mm_cvtps_pd * added _mm_set_sd * added _mm_set1_pd * added _mm_set_pd1 * added _mm_set_pd * added _mm_setr_pd * added _mm_setzero_pd	2017-10-14 19:50:03 -04:00
gwenn	90c0c9be20	Avx (#109 ) * avx: _mm256_loadu_pd * avx: _mm256_storeu_pd * avx: _mm256_loadu_ps * avx: _mm256_storeu_ps * avx: fix _mm256_storeu_pd and _mm256_storeu_ps * avx: _mm256_loadu_si256 * avx: _mm256_undefined_si256 * avx: _mm256_maskload_pd * avx: _mm256_maskstore_pd * Attempt to fix CI (#108) Need to bring codegen units back to only one for now * [x86] sse4.2 add docs for _SIDD_EQUAL_RANGES (#107) - Add docs for the _SIDD_EQUAL_RANGES mode * Add _MM_TRANSPOSE4_PS pseudo-macro. (#106) This adds a strange macro, which I've replaced with a function, because it seems there are not many better alternatives. Also adds a test, and `#[allow(non_snake_case)]` to `#[simd_test]`. * Fix i586 tests * Implement bitwise SSE ops & _mm_cmp_ss (#103) Add _mm_{and,andnot,or,xor}_ps * Add _mm_cmpeq_ss * Add _mm_cmplt_ss * Add _mm_cmple_ss * Add _mm_cmpgt_ss * Add _mm_cmpge_ss * Add _mm_cmpneq_ss * Add _mm_cmpnlt_ss * Add _mm_cmpnle_ss * Add _mm_cmpngt_ss * Add _mm_cmpnge_ss * Add _mm_cmpord_ss * Add _mm_cmpunord_ss * Fix _mm_{and,andnot,or,xor}_ps tests for i586 LLVM for i586 doesn't seem to generate `andps`, and instead generates 4 `and`s. Similar for the other operations. * avx: _mm_maskload_pd * avx: _mm_maskstore_pd * avx: _mm256_maskload_ps * avx: _mm256_maskstore_ps * avx: _mm_maskload_ps, _mm_maskstore_ps * avx: _mm256_movehdup_ps * avx: _mm256_moveldup_ps	2017-10-14 10:12:57 -04:00
pythoneer	4aa889fa67	Sse2 (#116 ) * added _mm_cvtsd_si64 * added _mm_cvttsd_si64; target_arch to _mm_cvtsd_si64 test	2017-10-14 10:11:25 -04:00
Alex Crichton	082b097d8f	Ignore another test for nightly Wait until rust-lang/rust#45202 is in nightly	2017-10-14 07:10:42 -07:00
Thomas Schilling	05b045746a	SSE Comparison instructions (#111 ) * Add _mm_cmp_ps variant (SSE) Add _mm_comi{eq,lt,le,gt,ge,neq}_ss instructions (sse) * Add _mm_ucomi_ss instructions SSE They all compile down to the same x86 instruction, UCOMISS, whereas the _mm_comi_ss instructions compile down to COMISS. The outputs of both sets of instructions are exactly the same. The only difference is in exception handling. I therefore added a single test case which tests their different effect on the MXCSR register (_mm_getcsr) of _mm_comieq_ss vs. _mm_ucomieq_ss. Together with the tests about emitting the right instruction, no further tests are needed for the other variants. * Avoid constant-folding test case	2017-10-12 13:47:21 -04:00
Thomas Schilling	9b0295c0f8	Implement bitwise SSE ops & _mm_cmp_ss (#103 ) Add _mm_{and,andnot,or,xor}_ps * Add _mm_cmpeq_ss * Add _mm_cmplt_ss * Add _mm_cmple_ss * Add _mm_cmpgt_ss * Add _mm_cmpge_ss * Add _mm_cmpneq_ss * Add _mm_cmpnlt_ss * Add _mm_cmpnle_ss * Add _mm_cmpngt_ss * Add _mm_cmpnge_ss * Add _mm_cmpord_ss * Add _mm_cmpunord_ss * Fix _mm_{and,andnot,or,xor}_ps tests for i586 LLVM for i586 doesn't seem to generate `andps`, and instead generates 4 `and`s. Similar for the other operations.	2017-10-12 10:15:10 -04:00
Alex Crichton	9a440a3eb0	Fix i586 tests	2017-10-11 17:33:41 -07:00
Malo Jaffré	5a028d329e	Add _MM_TRANSPOSE4_PS pseudo-macro. (#106 ) This adds a strange macro, which I've replaced with a function, because it seems there are not many better alternatives. Also adds a test, and `#[allow(non_snake_case)]` to `#[simd_test]`.	2017-10-11 11:28:44 -04:00
Dan Robertson	5a6005aa29	[x86] sse4.2 add docs for _SIDD_EQUAL_RANGES (#107 ) - Add docs for the _SIDD_EQUAL_RANGES mode	2017-10-11 11:28:17 -04:00
Alex Crichton	9da400965f	Attempt to fix CI (#108 ) Need to bring codegen units back to only one for now	2017-10-11 11:28:02 -04:00
gwenn	7c88f7c49b	Avx (#105 ) * avx: _mm_permute_ps and sse: _mm_undefined_ps * avx: _mm256_permutevar_pd, _mm_permutevar_pd * avx: _mm256_permute_pd * avx: _mm256_shuffle_pd fixed * avx: _mm_permute_pd, sse2: _mm_undefined_pd * avx: _mm256_permute2f128_ps * avx: _mm256_permute2f128_pd * avx: _mm256_permute2f128_si256 * avx: _mm256_broadcast_ss * avx: _mm_broadcast_ss * avx: _mm256_broadcast_sd * avx: _mm256_broadcast_ps * avx: _mm256_broadcast_pd * avx: _mm_cmp_pd * avx: _mm256_cmp_pd * avx: _mm_cmp_ps * avx: _mm256_cmp_ps * avx: _mm_cmp_sd * avx: _mm_cmp_ss * avx: _mm256_insertf128_pd, _mm256_castpd128_pd256 * avx: _mm256_insertf128_si256, _mm256_castsi128_si256 * avx: _mm256_insertf128_ps, _mm256_castps128_ps256 * avx: _mm256_insert_epi8 * avx: _mm256_insert_epi16 * avx: _mm256_insert_epi32 * avx: _mm256_insert_epi64 * Try to fix i586 build * Fix missing inline and target_feature * sse: fix _mm_undefined_ps	2017-10-09 16:05:36 -05:00
Thomas Schilling	807ec089b7	Implement SSE _mm_load* instructions (#99 ) * Add _mm_loadh_pi * Add doctest for _mm_loadh_pi * Add _mm_loadl_pi * Add _mm_load_ss * Add _mm_load1_ps and _mm_load_ps1 * Add _mm_load_ps and _mm_loadu_ps * Add _mm_loadr_ps * Replace _mm_loadu_ps TODO with explanation * Tweak expected instructions for _mm_loadl/h_pi on x86 * Try fixing i586 test crash * Targets i586/i686 generate different code for _mm_loadh_pi	2017-10-07 21:12:47 -05:00
Thomas Schilling	a547f2bf36	Implement SSE _mm_set* intrinsics (#100 ) * Add _mm_set_ss * Add _mm_set1_ps and _mm_set_ps1 * Add _mm_set_ps * Add _mm_setr_ps * Add _mm_setzero_ps * Fix _mm_setr_ps instr test on x86 * Sidestep black_box ABI issue on i586	2017-10-07 15:04:55 +00:00
Alex Crichton	7055f496c7	Add an i586 builder (#101 ) The i586 targets on x86 are defined to be 32-bit and lacking in sse/sse2 unlike the i686 target which has sse2 turned on by default. I was mostly curious what would happen when turning on this target, and it turns out quite a few tests failed! Most of the tests here had to do with calling functions with ABI mismatches where the callee wasn't `#[inline(always)]`. Various pieces have been updated now and we should be passing all tests. Only one instruction assertion ended up changing where the function generates a different instruction with sse2 ambiently enabled and without it enabled.	2017-10-06 22:54:18 +00:00
Alex Crichton	40eeae6adf	Enable multiple #[assert_instr] attributes (#96 ) * Enable multiple #[assert_instr] attributes Looks like all we needed to do was generate new function names! * Uncomment assertions for `_mm_prefetch`	2017-10-06 21:19:14 +00:00
gwenn	ee0f165e8e	Avx (#90 ) * avx: _mm256_andnot_pd, _mm256_andnot_ps * avx: _mm256_blendv_pd * avx: _mm256_blend_pd with no assert_instr With assert_instr: too many instructions in the disassembly * avx: _mm256_blendv_ps * avx: _mm256_hadd_pd * avx: _mm256_hadd_ps * avx: _mm256_hsub_pd * avx: _mm256_hsub_ps * avx: _mm256_xor_pd * avx: _mm256_xor_ps * avx: _mm256_cvtepi32_pd * avx: _mm256_cvtepi32_ps * avx: _mm256_cvtpd_ps * avx: _mm256_cvtps_epi32 * avx: _mm256_cvtps_pd * avx: _mm256_cvttpd_epi32 * avx: _mm256_cvtpd_epi32 * avx: replace simd_cast by proper instruction * avx: _mm256_cvttps_epi32 * avx: _mm256_extractf128_ps, _mm256_undefined_ps * avx: _mm256_extractf128_pd, _mm256_undefined_pd * avx: _mm256_extractf128_si256, _mm256_undefined_si256 * avx: _mm256_extract_epi8 * avx: _mm256_extract_epi16 * avx: _mm256_extract_epi32 * avx: _mm256_extract_epi64 * avx: _mm256_zeroall * avx: _mm256_zeroupper * avx: _mm256_permutevar_ps * avx: _mm_permutevar_ps * avx: replace simd_cast by as_* * avx: _mm256_permute_ps * avx: _mm256_dp_ps * avx: _mm256_shuffle_pd * avx: _mm256_shuffle_pd, wrong instruction generated * implement _mm256_hadd_ps and _mm256_hadd_pd * avx: implement _mm256_hsub_pd and _mm256_hsub_ps * assert_instr: raise the limit up to 30 instructions	2017-10-05 13:42:29 -05:00
Dan Robertson	b421e9210c	[Docs] Add more docs to the sse4.2 cmpstr fns (#94 ) - Add more examples to _mm_cmpistri - Add basic docs to _mm_cmpestri - Cleanup lib docs	2017-10-05 18:26:40 +02:00
Thomas Schilling	186b8fe093	Implement _mm_getcsr, _mm_setcsr, _mm_sfence (#88 ) * Add _mm_sfence * Add _mm_getcsr/_mm_setcsr and convenience wrappers * Use test::black_box to simplify tests * Use uppercase naming for C-macro equivalents Discussed at https://github.com/rust-lang-nursery/stdsimd/issues/84	2017-10-05 18:17:43 +02:00
Thomas Schilling	c845a1baaf	Implement _mm_prefetch (#78 ) This boils down to using LLVM's `prefetch` intrinsic [1]. [1]: https://llvm.org/docs/LangRef.html#llvm-prefetch-intrinsic	2017-10-05 18:08:58 +02:00
Adam Niederer	9695f2cfaf	Improve _mm256_round_* docs (#93 ) Fix a grammatical error, use a list instead of using a code block or nothing, and add the LLVM immediate reference.	2017-10-05 00:25:18 +02:00
pythoneer	9a6176723b	added _mm_cvttps_epi32 (#89 )	2017-10-04 11:16:53 +02:00
Dan Robertson	c1da3bad76	[Docs] Improve documentation (#87 ) - Add "How to write an example" section to CONTRIBUTING.md - Add a basic example using `target_feature` to the main page - Improve documentation of SSE 4.2 - Improve documentation of constants - Improve documentation of _mm_cmpistri	2017-10-04 11:15:39 +02:00
gwenn	3202558c98	avx2: _mm256_alignr_epi8	2017-09-30 11:27:15 -04:00
gwenn	be7f29da03	Fix rustdoc	2017-09-30 11:27:15 -04:00
gwenn	d1dff51d90	ssse3: _mm_alignr_epi8	2017-09-30 11:27:15 -04:00
Dustin Bensing	fa2e02af28	added _mm_cvtsd_si32, _mm_cvtsd_ss, _mm_cvtss_sd, _mm_cvttpd_epi32, _mm_cvttsd_si32	2017-09-30 11:27:05 -04:00
Dan Robertson	7a75303aec	[x86] Implement sse4.2 crc32 functions - Implement - _mm_crc32_u8 - _mm_crc32_u16 - _mm_crc32_u32 - _mm_crc32_u64 - _mm_cmpgt_epi64	2017-09-30 09:53:34 -04:00
gwenn	b6a3bc42b3	Remove some failing assert_instr	2017-09-30 09:13:18 -04:00
gwenn	f0f5108a98	sse3: _mm_loaddup_pd and sse2: _mm_load1_pd	2017-09-30 09:13:18 -04:00
gwenn	4cbb838e2e	sse3: _mm_moveldup_ps	2017-09-30 09:13:18 -04:00
gwenn	261534cb0f	sse3: _mm_movehdup_ps	2017-09-30 09:13:18 -04:00
gwenn	8e07404403	sse3: _mm_movedup_pd	2017-09-30 09:13:18 -04:00
gwenn	e4ffcb6fdd	sse3: _mm_hsub_ps	2017-09-30 09:13:18 -04:00
gwenn	d81d0a4a67	sse3: _mm_hsub_pd	2017-09-30 09:13:18 -04:00
gwenn	7f84607f16	sse3: _mm_hadd_ps	2017-09-30 09:13:18 -04:00
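
Commit 5a028d329e above notes that the C macro `_MM_TRANSPOSE4_PS` was replaced with a function taking its four rows by mutable reference. As a rough illustration of what such a function computes, here is a minimal sketch on plain `[f32; 4]` arrays. The name `transpose4` and the array-based signature are assumptions made for this example; it is not the code from that commit and uses no SIMD types or intrinsics.

```rust
/// Hypothetical plain-array stand-in for a 4x4 transpose in the style of
/// `_MM_TRANSPOSE4_PS`: the four rows are updated in place.
pub fn transpose4(
    row0: &mut [f32; 4],
    row1: &mut [f32; 4],
    row2: &mut [f32; 4],
    row3: &mut [f32; 4],
) {
    // Copy the original matrix so every output reads unmodified inputs.
    let m = [*row0, *row1, *row2, *row3];

    // After the call, row i holds what used to be column i.
    *row0 = [m[0][0], m[1][0], m[2][0], m[3][0]];
    *row1 = [m[0][1], m[1][1], m[2][1], m[3][1]];
    *row2 = [m[0][2], m[1][2], m[2][2], m[3][2]];
    *row3 = [m[0][3], m[1][3], m[2][3], m[3][3]];
}
```

Taking `&mut` rows rather than returning a new matrix mirrors the in-place behaviour of the C macro, which is presumably why a function could stand in for it.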

2485 Commits