Commit Graph

2629 Commits

Author SHA1 Message Date
Kajetan Puchalski
dfc5dfc8ef std_detect: Add aarch64/linux/LLVM features
Add detection for various aarch64 CPU features already supported by LLVM and Linux.

This commit adds feature detection for the following features:

- FEAT_CSSC
- FEAT_ECV
- FEAT_FAMINMAX
- FEAT_FLAGM2
- FEAT_FP8
- FEAT_FP8DOT2
- FEAT_FP8DOT4
- FEAT_FP8FMA
- FEAT_HBC
- FEAT_LSE128
- FEAT_LUT
- FEAT_MOPS
- FEAT_LRCPC3
- FEAT_SVE_B16B16
- FEAT_SVE2p1
- FEAT_WFxT

It also adds feature detection for FEAT_FPMR. This is something of a
special case: FPMR exists as a feature only in LLVM 18 and has since
been removed from upstream LLVM. The intention is therefore for it to
be detectable at runtime through stdarch without having a
corresponding compile-time Rust target feature.

Linux features: https://github.com/torvalds/linux/blob/master/arch/arm64/include/uapi/asm/hwcap.h
LLVM features: llvm-project/llvm/lib/Target/AArch64/AArch64.td
2024-07-25 15:18:37 +01:00
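A minimal sketch of how runtime detection of this kind is typically consumed from user code. It probes only `neon` and `lse`, which are long-stable feature names for `std::arch::is_aarch64_feature_detected!`. The features listed in this commit are presumably exposed through the same macro, but their exact names and stability are assumptions here, so they are not probed. The helper name `detected_features` is hypothetical.

```rust
/// Returns the names of the probed CPU features that are available at runtime.
/// On targets other than aarch64 the list is simply empty.
#[allow(unused_mut)]
fn detected_features() -> Vec<&'static str> {
    let mut found = Vec::new();

    #[cfg(target_arch = "aarch64")]
    {
        // Stable feature names only. Names for the features added in the
        // commit above are not used, since their spelling and stability
        // are not confirmed by the commit message.
        if std::arch::is_aarch64_feature_detected!("neon") {
            found.push("neon");
        }
        if std::arch::is_aarch64_feature_detected!("lse") {
            found.push("lse");
        }
    }

    found
}

fn main() {
    println!("detected: {:?}", detected_features());
}
```

Gating the macro behind `cfg(target_arch = "aarch64")` keeps the sketch compilable on other architectures, where the macro does not exist.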
sayantn
aa84427fd4 Use LLVM intrinsics for masked load/stores, expand-loads and fp-class
Also, remove some redundant sse target-features from avx intrinsics
2024-07-14 20:26:09 +01:00
daxpedda
ba9e8be05e Revert "wasm32: Add simd128 to enabled features for relaxed intrinsics" 2024-07-14 12:00:23 +02:00
sayantn
aa001c3f3e Some small refactorings
Use llvm intrinsics for `vfpclassss` and `vfpclasssd`
Use `simd_insert` for `x86_polyfill`
2024-07-12 18:12:30 +02:00
Alex Crichton
bb2b4293b9 wasm32: Add simd128 to enabled features for relaxed intrinsics
It looks like LLVM requires that `simd128` is active to use these
intrinsics and `relaxed-simd` isn't implicitly enabling them. This is
probably something to fix at the LLVM layer as well but for now enable
both the `simd128` feature as well as the `relaxed-simd` feature to fix
things on our side.
2024-07-11 17:26:52 +02:00
sayantn
f101974941 Added verification for doc comments 2024-07-08 00:32:43 +02:00
sayantn
1e8a22c374 Fix Documentation 2024-07-08 00:32:43 +02:00
sayantn
1da646fcab Implement missing intrinsics in SSE4a and TBM
Add `extracti`, `inserti` and `bextri` intrinsics. Refactor TBM into 2 modules
2024-07-07 19:55:04 +02:00
Tobias Decking
7378b35fd0 Use generic simd in wasm intrinsics 2024-07-07 19:21:10 +02:00
sayantn
94153c46e9 Implemented runtime detection of xop target-feature 2024-07-06 18:55:26 +02:00
sayantn
d67ca1fe09 Added runtime detection
Cannot do a `cupid` test because they don't support `amx`.
2024-07-06 18:28:25 +02:00
Tobias Decking
bbb2ba5424 Refactor avx512bw: reduction operations 2024-07-06 12:07:29 +02:00
Tobias Decking
fe0a378499 Refactor avx512bw: mask operations 2024-07-06 12:07:29 +02:00
Tobias Decking
198a91e5db Refactor avx512bw: integer comparison 2024-07-06 12:07:29 +02:00
Tobias Decking
f1a1ec2921 Refactor avx512bw: max/min 2024-07-06 12:07:29 +02:00
Tobias Decking
9ad2a62245 Refactor avx512bw: saturating arithmetic 2024-07-06 12:07:29 +02:00
Tobias Decking
13063410dd Refactor avx512bw: avg + mulhi + abs 2024-07-06 12:07:29 +02:00
sayantn
268ac7fe92 Add detection for SHA512, SM3 and SM4
Cannot cross-verify with `cupid` because they do not have these features yet.
2024-07-06 11:29:28 +02:00
sayantn
c862e4e487 Added a bf16 type 2024-07-06 11:00:34 +02:00
sayantn
70fbc2e97c Implemented some missing functions
These cannot be linked with LLVM because Rust lacks the `bfloat16` and `i1` types, so inline asm was the only option.
2024-07-06 11:00:34 +02:00
sayantn
3de8e86491 Implemented the missing AVX512BF16 intrinsics 2024-07-06 11:00:34 +02:00
sayantn
f22fab559e Implemented VEX versions
Modified stdarch-test to accept VEX versions
2024-07-06 11:00:34 +02:00
sayantn
775dcaabde Implemented missing gather-scatters 2024-07-06 11:00:34 +02:00
sayantn
1c3b3b80c0 Fix the stream intrinsics
They should use platform-specific address management.
2024-07-06 11:00:34 +02:00
Tobias Decking
1f3264848f Fix incorrect reduction operations in avx512f 2024-07-02 12:19:20 +02:00
sayantn
ed1df99f03 Added support for AMD verification
Added a custom cpuid file for sde, which enables SSE4a, XOP, TBM and VP2INTERSECT. Fixed `xsave` tests
2024-06-30 21:45:56 +02:00
sayantn
fd948ee99d Updates SDE
Updated SDE to v9.33.0
Disabled `assert-instr` in emulated run
2024-06-30 21:45:56 +02:00
Tobias Decking
fcee4d8b16 Define remaining IFMA intrinsics 2024-06-30 15:47:18 +02:00
Tobias Decking
a56cc86a23 Use generic simd for avx512 leading zeros 2024-06-30 15:17:50 +02:00
Tobias Decking
d1004e0abd Refactor avx512f: mask operations 2024-06-30 14:55:25 +02:00
Tobias Decking
9f96670b7c Refactor avx512f: element extraction 2024-06-30 14:55:25 +02:00
Tobias Decking
9a1d758f03 Refactor avx512f: floating point abs 2024-06-30 14:55:25 +02:00
Tobias Decking
2c81a7ae33 Refactor avx512f: zeroing primitives 2024-06-30 14:55:25 +02:00
Tobias Decking
f5219be7ee Refactor avx512f: integer comparison 2024-06-30 14:55:25 +02:00
Tobias Decking
883cedc230 Refactor avx512f: integers 2024-06-30 14:55:25 +02:00
Tobias Decking
0d9520dfd4 Refactor avx512f: sqrt + rounding fix 2024-06-30 14:55:25 +02:00
Tobias Decking
53ca30a4c8 Refactor avx512f: rounding fma 2024-06-30 14:55:25 +02:00
Tobias Decking
128866c97b Refactor avx512f: fma 2024-06-30 14:55:25 +02:00
Jubilee Young
8b77e779cb Remove has_cpuid 2024-06-29 19:38:42 +02:00
sayantn
d7ea407a28 Fixing CI
Fixed x86_64-apple-darwin freezing.
Bump all docker to Ubuntu-24.04 (except for emulated and armv7)
2024-06-29 19:16:48 +02:00
sayantn
818df2f7d0 Some fixes as asked by @Amanieu 2024-06-29 19:16:48 +02:00
sayantn
95d273aaf9 Fixed _mm512_kunpackb, reduce-max and reduce-min
`_mm512_kunpackb` was implemented incorrectly. In addition, `simd_reduce_max` uses `maxnum` for comparison, which adheres to IEEE 754, whereas Intel explicitly states that these intrinsics do NOT adhere to IEEE 754 for NaNs, so the old implementation could give wrong results.
2024-06-29 19:16:48 +02:00
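The NaN discrepancy described in this commit can be illustrated with a scalar sketch. `f32::max` ignores a NaN operand in the same way as `maxnum`, while an x86 `MAXPS`-style maximum returns its second operand whenever either input is NaN. The function names are hypothetical, and the left-to-right fold is an assumption made for illustration; it is not claimed to be the reduction order the real intrinsics use.

```rust
/// `maxnum`-style maximum: if one operand is NaN, the other is returned.
fn max_ieee(a: f32, b: f32) -> f32 {
    a.max(b)
}

/// x86 `MAXPS`-style maximum: the comparison is false when either operand
/// is NaN, so the second operand is returned in that case.
fn max_x86(a: f32, b: f32) -> f32 {
    if a > b { a } else { b }
}

/// Folds the slice from left to right with the given binary maximum.
fn reduce(values: &[f32], op: fn(f32, f32) -> f32) -> f32 {
    values.iter().copied().reduce(op).expect("non-empty input")
}

fn main() {
    let values = [1.0_f32, 2.0, f32::NAN];
    println!("maxnum-style: {}", reduce(&values, max_ieee));
    println!("x86-style:    {}", reduce(&values, max_x86));
}
```

With the input `[1.0, 2.0, NaN]`, the `maxnum`-style fold yields `2.0` and the x86-style fold yields NaN, which is the kind of mismatch the commit fixes.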
sayantn
b3e96f2584 Update CI to accommodate windows-gnu targets 2024-06-29 19:16:48 +02:00
sayantn
fa22a9aeda Add the missing BMI1, SSE2, SSE4.1 and AVX2 intrinsics 2024-06-29 19:16:48 +02:00
sayantn
d65d1a8ae6 Fixed some more intrinsics
Added some tests, fixed incorrect target-features, and verification code for target-features. Removed all MMX support from verification.
2024-06-29 19:16:48 +02:00
sayantn
ad7cf91833 Fixed many intrinsics
Fixed reduce-add and reduce-mul, and load/store of mask32 and mask64. Added preserves-flags to mov asm. Fixed the missing list. Fixed `_mm_loadu_si64`. Added `assert_instr`.
2024-06-29 19:16:48 +02:00
sayantn
043f3cc280 Upgraded disassembly to include windows-gnu targets 2024-06-29 19:16:48 +02:00
sayantn
d26d3a7481 Update Intrinsics list
Updated the intrinsics list from version 3.4 to 3.6.8. Added a missing-x86.md file to track progress.
2024-06-29 19:16:48 +02:00
Mathilda
e2d9ac5145 Fix documentation of arguments of function core::arch::x86::_mm_blendv_epi8 2024-06-27 16:37:53 +02:00
Jayesskay
8e3abdc290 Fix _mm256_bsrli_epi128 producing invalid lower lane when IMM8 = 15 2024-06-27 16:03:46 +02:00
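For reference, the expected behaviour of `_mm256_bsrli_epi128` can be modelled in scalar code: each 128-bit lane is shifted right by `IMM8` bytes independently, with zeros shifted in and shift counts above 15 clearing the lane. The sketch below is a hypothetical reference model (the names `bsrli_lane` and `bsrli_epi128_model` are not from stdarch) and says nothing about how the actual fix was implemented.

```rust
/// Shifts one 16-byte lane right by `imm8` bytes, filling with zeros.
/// Shift counts above 15 produce an all-zero lane.
fn bsrli_lane(lane: [u8; 16], imm8: u32) -> [u8; 16] {
    let shift = imm8.min(16) as usize;
    let mut out = [0u8; 16];
    for i in 0..16 - shift {
        out[i] = lane[i + shift];
    }
    out
}

/// Scalar model of a 256-bit byte shift right applied per 128-bit lane.
fn bsrli_epi128_model(v: [u8; 32], imm8: u32) -> [u8; 32] {
    let mut out = [0u8; 32];
    for lane in 0..2 {
        let range = lane * 16..lane * 16 + 16;
        let mut src = [0u8; 16];
        src.copy_from_slice(&v[range.clone()]);
        out[range].copy_from_slice(&bsrli_lane(src, imm8));
    }
    out
}

fn main() {
    // Bytes 0..=31, so each output byte shows where it came from.
    let mut v = [0u8; 32];
    for (i, byte) in v.iter_mut().enumerate() {
        *byte = i as u8;
    }
    let shifted = bsrli_epi128_model(v, 15);
    println!("low lane:  {:?}", &shifted[..16]);
    println!("high lane: {:?}", &shifted[16..]);
}
```

With `IMM8 = 15`, only the lowest byte of each lane is non-zero: the low lane starts with the original byte 15 and the high lane with the original byte 31.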