Sayantan Chakraborty
200905e0e9
Fix _mm_stream_si64
2024-08-03 22:58:47 +01:00
Jonas Fierlings
47068b1a06
Fix markdown list in docs
2024-08-03 19:24:37 +01:00
ziyizhang-1
50cd4ef0c5
initial commit to enable amx
...
AMX Intrinsics:
amx-tile:
- _tile_loadconfig
- _tile_storeconfig
- _tile_loadd
- _tile_release
- _tile_stored
- _tile_stream_loadd
- _tile_zero
amx-int8:
- _tile_dpbssd
- _tile_dpbsud
- _tile_dpbusd
- _tile_dpbuud
amx-bf16:
- _tile_dpbf16ps
amx-fp16:
- _tile_dpfp16ps
amx-complex:
- _tile_cmmimfp16ps
- _tile_cmmrlfp16ps
2024-08-03 19:02:09 +01:00
sayantn
4a13560ede
Update Intrinsics List to v3.6.9
...
Add `#[inline]` to avx512ifma intrinsics
Fix the test equality.
Remove the stability attributes in simd types and test functions
2024-07-26 12:20:06 +01:00
sayantn
3cf2b7d74f
AVX512FP16 Part 9: Remaining avx512fp16 and avxneconvert
2024-07-26 12:20:06 +01:00
sayantn
318e9ec7e7
AVX512FP16 Part 8: Convert from f16
2024-07-26 12:20:06 +01:00
sayantn
cea6530177
AVX512FP16 Part 7: Convert to f16
2024-07-26 12:20:06 +01:00
sayantn
734355993e
AVX512FP16 Part 6: Remaining
...
`cmpph`, `fpclass`, reduce, `blend`, `permutex`
2024-07-26 12:20:06 +01:00
sayantn
debe317dcf
AVX512FP16 Part 5: FP-Support
...
`getexp`, `getmant`, `roundscale`, `scalef`, `reduce`
2024-07-26 12:20:06 +01:00
sayantn
c024ef206f
AVX512FP16 Part 4: Math functions
...
Reciprocal, RSqrt, Sqrt, Max, Min
2024-07-26 12:20:06 +01:00
sayantn
b88dfd6c03
AVX512FP16 Part 3: FMA
2024-07-26 12:20:06 +01:00
sayantn
7be9f610e3
AVX512_FP16 Part 2: Complex Multiplication
2024-07-26 12:20:06 +01:00
sayantn
60dfe5f264
AVX512FP16 Part 1
...
Add-Sub-Mul-Div, Load-Store-Move, `comi`, `set`
2024-07-26 12:20:06 +01:00
sayantn
c878b773d5
AVX512FP16 Part 0: Types
2024-07-26 12:20:06 +01:00
daxpedda
a1ad6bf8be
Move Wasm's relaxed SIMD to Rust v1.82
2024-07-25 16:38:08 +01:00
sayantn
74f53212a0
Stabilize simd_x86_updates
2024-07-25 16:07:35 +01:00
Yuri Astrakhan
dd87060bf3
Minor lints for stdarch-gen-arm/src/main.rs
...
Just a few minor cleanups
2024-07-25 15:41:21 +01:00
Kajetan Puchalski
351ec5744c
std_detect: Update aarch64 feature dependencies to LLVM upstream
...
Feature dependencies for newer aarch64 features differ between LLVM 18
in the Rust tree and upstream LLVM 19.
This commit updates those dependencies to reflect new LLVM upstream
changes.
2024-07-25 15:18:37 +01:00
Kajetan Puchalski
41dc17d3e5
std_detect: Sort aarch64 features
...
Alphabetically sort the list of aarch64 features.
The list was getting a bit too chaotic so it was worth properly
sorting.
2024-07-25 15:18:37 +01:00
Kajetan Puchalski
ef538bc614
std_detect: Add aarch64/linux/LLVM SME features
...
Add detection for SME features supported by LLVM and the Linux Kernel.
Include commented-out hwcap fields for features supported by Linux but not by LLVM.
This commit adds feature detection for the following features:
- FEAT_SME
- FEAT_SME_F16F16
- FEAT_SME_F64F64
- FEAT_SME_F8F16
- FEAT_SME_F8F32
- FEAT_SME_FA64
- FEAT_SME_I16I64
- FEAT_SME_LUTv2
- FEAT_SME2
- FEAT_SME2p1
- FEAT_SSVE_FP8DOT2
- FEAT_SSVE_FP8DOT4
- FEAT_SSVE_FP8FMA
Linux features: https://github.com/torvalds/linux/blob/master/arch/arm64/include/uapi/asm/hwcap.h
LLVM features: llvm-project/llvm/lib/Target/AArch64/AArch64.td
2024-07-25 15:18:37 +01:00
Kajetan Puchalski
dfc5dfc8ef
std_detect: Add aarch64/linux/LLVM features
...
Add detection for various aarch64 CPU features already supported by LLVM and Linux.
This commit adds feature detection for the following features:
- FEAT_CSSC
- FEAT_ECV
- FEAT_FAMINMAX
- FEAT_FLAGM2
- FEAT_FP8
- FEAT_FP8DOT2
- FEAT_FP8DOT4
- FEAT_FP8FMA
- FEAT_HBC
- FEAT_LSE128
- FEAT_LUT
- FEAT_MOPS
- FEAT_LRCPC3
- FEAT_SVE_B16B16
- FEAT_SVE2p1
- FEAT_WFxT
It also adds feature detection for FEAT_FPMR. It is somewhat of a
special case because FPMR only exists as a feature in LLVM 18; it has
been removed from upstream LLVM. For that reason the intention is
for it to be detectable at runtime through stdarch but not have a
corresponding compile-time Rust target feature.
Linux features: https://github.com/torvalds/linux/blob/master/arch/arm64/include/uapi/asm/hwcap.h
LLVM features: llvm-project/llvm/lib/Target/AArch64/AArch64.td
2024-07-25 15:18:37 +01:00
sayantn
aa84427fd4
Use LLVM intrinsics for masked load/stores, expand-loads and fp-class
...
Also, remove some redundant sse target-features from avx intrinsics
2024-07-14 20:26:09 +01:00
daxpedda
ba9e8be05e
Revert "wasm32: Add simd128 to enabled features for relaxed intrinsics"
2024-07-14 12:00:23 +02:00
sayantn
aa001c3f3e
Some small refactorings
...
Use llvm intrinsics for `vfpclassss` and `vfpclasssd`
Use `simd_insert` for `x86_polyfill`
2024-07-12 18:12:30 +02:00
Alex Crichton
bb2b4293b9
wasm32: Add simd128 to enabled features for relaxed intrinsics
...
It looks like LLVM requires that `simd128` is active to use these
intrinsics and `relaxed-simd` isn't implicitly enabling them. This is
probably something to fix at the LLVM layer as well but for now enable
both the `simd128` feature as well as the `relaxed-simd` feature to fix
things on our side.
2024-07-11 17:26:52 +02:00
sayantn
f101974941
Added verification for doc comments
2024-07-08 00:32:43 +02:00
sayantn
1e8a22c374
Fix Documentation
2024-07-08 00:32:43 +02:00
sayantn
1da646fcab
Implement missing intrinsics in SSE4a and TBM
...
Add `extracti`, `inserti` and `bextri` intrinsics. Refactor TBM into 2 modules
2024-07-07 19:55:04 +02:00
Tobias Decking
7378b35fd0
Use generic simd in wasm intrinsics
2024-07-07 19:21:10 +02:00
sayantn
94153c46e9
Implemented runtime detection of xop target-feature
2024-07-06 18:55:26 +02:00
sayantn
d67ca1fe09
Added runtime detection
...
Cannot do a `cupid` test because it does not support `amx`.
2024-07-06 18:28:25 +02:00
Tobias Decking
bbb2ba5424
Refactor avx512bw: reduction operations
2024-07-06 12:07:29 +02:00
Tobias Decking
fe0a378499
Refactor avx512bw: mask operations
2024-07-06 12:07:29 +02:00
Tobias Decking
198a91e5db
Refactor avx512bw: integer comparison
2024-07-06 12:07:29 +02:00
Tobias Decking
f1a1ec2921
Refactor avx512bw: max/min
2024-07-06 12:07:29 +02:00
Tobias Decking
9ad2a62245
Refactor avx512bw: saturating arithmetic
2024-07-06 12:07:29 +02:00
Tobias Decking
13063410dd
Refactor avx512bw: avg + mulhi + abs
2024-07-06 12:07:29 +02:00
sayantn
268ac7fe92
Add detection for SHA512, SM3 and SM4
...
Cannot cross-verify with `cupid` because they do not have these features yet.
2024-07-06 11:29:28 +02:00
sayantn
c862e4e487
Added a bf16 type
2024-07-06 11:00:34 +02:00
sayantn
70fbc2e97c
Implemented some missing functions
...
These cannot be linked with LLVM because Rust lacks `bfloat16` and `i1` types, so inline asm was the only way.
2024-07-06 11:00:34 +02:00
sayantn
3de8e86491
Implemented the missing AVX512BF16 intrinsics
2024-07-06 11:00:34 +02:00
sayantn
f22fab559e
Implemented VEX versions
...
Modified stdarch-test to accept VEX versions
2024-07-06 11:00:34 +02:00
sayantn
775dcaabde
Implemented missing gather-scatters
2024-07-06 11:00:34 +02:00
sayantn
1c3b3b80c0
Fix the stream intrinsics
...
They should use platform-specific address management.
2024-07-06 11:00:34 +02:00
Tobias Decking
1f3264848f
Fix incorrect reduction operations in avx512f
2024-07-02 12:19:20 +02:00
sayantn
ed1df99f03
Added support for AMD verification
...
Added a custom cpuid file for sde, which enables SSE4a, XOP, TBM and VP2INTERSECT. Fixed `xsave` tests
2024-06-30 21:45:56 +02:00
sayantn
fd948ee99d
Updates SDE
...
Updated SDE to v9.33.0
Disabled `assert-instr` in emulated run
2024-06-30 21:45:56 +02:00
Tobias Decking
fcee4d8b16
Define remaining IFMA intrinsics
2024-06-30 15:47:18 +02:00
Tobias Decking
a56cc86a23
Use generic simd for avx512 leading zeros
2024-06-30 15:17:50 +02:00
Tobias Decking
d1004e0abd
Refactor avx512f: mask operations
2024-06-30 14:55:25 +02:00