Commit Graph

105 Commits

Author SHA1 Message Date
gnzlbg
7a6a23d818 fix tidy 2018-05-24 16:04:39 +02:00
gnzlbg
f8f204c0bf add simd float intrinsics and gather/scatter 2018-05-24 16:04:39 +02:00
Simonas Kazlauskas
6d5bf8b23f Remove the intrinsic for align_offset
Keep only the language item. This removes some indirection and makes
codegen worse for debug builds, but simplifies code significantly, which
is a good tradeoff to make, in my opinion.

Besides, the codegen can be improved even further with some constant
evaluation improvements that we expect to happen in the future.
2018-05-17 23:13:42 +03:00
Simonas Kazlauskas
d45378216b Change align_offset to support different strides
This is necessary if we want to implement `[T]::align_to` and is more
useful in general.

This implementation effort has begun during the All Hands and represents
a month of my futile efforts to do any sort of maths. Luckily, I
found the very very nice Chris McDonald (cjm) on IRC who figured out the
core formulas for me! All the thanks for existence of this PR go to
them!

Anyway… Those formulas were mangled by yours truly into the arcane forms
you see here to squeeze out the best assembly possible on most of the
modern architectures (x86 and ARM were evaluated in practice). I mean,
just look at it: *one actual* modulo operation and everything else is
just the cheap single cycle ops! Admitedly, the naive solution might be
faster in some common scenarios, but this code absolutely butchers the
naive solution on the worst case scenario.

Alas, the result of this arcane magic also means that the code pretty
heavily relies on the preconditions holding true and breaking those
preconditions will unleash the UB-est of all UBs! So don’t.
2018-05-17 22:46:02 +03:00
Irina Popa
b63d7e2b1c Rename trans to codegen everywhere. 2018-05-17 15:08:30 +03:00