Improve siphash performance for longer data
Use `ptr::copy_nonoverlapping` (aka memcpy) to load an u64 from the
byte stream. This is correct for any alignment, and the compiler will
use the appropriate instruction to load the data.
Also contains small tweaks that should benefit hashing short data too,
both the commit that removes a variable and the autovectorization of
the hash state initialization (in SipHash::reset).
Benchmarks show that hashing longer data benefits for the improved word loading.
Before (using benchmarks from the first commit in the PR):
The before benchmark is a bit noisy.
```
test hash::sip::bench_bytes_4 ... bench: 41 ns/iter (+/- 0) = 97 MB/s
test hash::sip::bench_bytes_7 ... bench: 49 ns/iter (+/- 2) = 142 MB/s
test hash::sip::bench_bytes_8 ... bench: 42 ns/iter (+/- 4) = 190 MB/s
test hash::sip::bench_bytes_a_16 ... bench: 57 ns/iter (+/- 14) = 280 MB/s
test hash::sip::bench_bytes_b_32 ... bench: 85 ns/iter (+/- 74) = 376 MB/s
test hash::sip::bench_bytes_c_128 ... bench: 278 ns/iter (+/- 33) = 460 MB/s
test hash::sip::bench_long_str ... bench: 825 ns/iter (+/- 103)
test hash::sip::bench_str_of_8_bytes ... bench: 151 ns/iter (+/- 66)
test hash::sip::bench_str_over_8_bytes ... bench: 59 ns/iter (+/- 3)
test hash::sip::bench_str_under_8_bytes ... bench: 47 ns/iter (+/- 56)
test hash::sip::bench_u32 ... bench: 39 ns/iter (+/- 93) = 205 MB/s
test hash::sip::bench_u32_keyed ... bench: 40 ns/iter (+/- 88) = 200 MB/s
test hash::sip::bench_u64 ... bench: 54 ns/iter (+/- 96) = 148 MB/s
```
After:
```
test hash::sip::bench_bytes_4 ... bench: 41 ns/iter (+/- 3) = 97 MB/s
test hash::sip::bench_bytes_7 ... bench: 48 ns/iter (+/- 0) = 145 MB/s
test hash::sip::bench_bytes_8 ... bench: 35 ns/iter (+/- 1) = 228 MB/s
test hash::sip::bench_bytes_a_16 ... bench: 45 ns/iter (+/- 1) = 355 MB/s
test hash::sip::bench_bytes_b_32 ... bench: 60 ns/iter (+/- 0) = 533 MB/s
test hash::sip::bench_bytes_c_128 ... bench: 161 ns/iter (+/- 5) = 795 MB/s
test hash::sip::bench_long_str ... bench: 514 ns/iter (+/- 5)
test hash::sip::bench_str_of_8_bytes ... bench: 44 ns/iter (+/- 0)
test hash::sip::bench_str_over_8_bytes ... bench: 51 ns/iter (+/- 0)
test hash::sip::bench_str_under_8_bytes ... bench: 52 ns/iter (+/- 6)
test hash::sip::bench_u32 ... bench: 40 ns/iter (+/- 2) = 200 MB/s
test hash::sip::bench_u32_keyed ... bench: 39 ns/iter (+/- 1) = 205 MB/s
test hash::sip::bench_u64 ... bench: 36 ns/iter (+/- 1) = 222 MB/s
```
Many of these have long since reached their stage of being obsolete, so this
commit starts the removal process for all of them. The unstable features that
were deprecated are:
* box_heap
* cmp_partial
* fs_time
* hash_default
* int_slice
* iter_min_max
* iter_reset_fuse
* iter_to_vec
* map_in_place
* move_from
* owned_ascii_ext
* page_size
* read_and_zero
* scan_state
* slice_chars
* slice_position_elem
* subslice_offset
Many of these have long since reached their stage of being obsolete, so this
commit starts the removal process for all of them. The unstable features that
were deprecated are:
* cmp_partial
* fs_time
* hash_default
* int_slice
* iter_min_max
* iter_reset_fuse
* iter_to_vec
* map_in_place
* move_from
* owned_ascii_ext
* page_size
* read_and_zero
* scan_state
* slice_chars
* slice_position_elem
* subslice_offset
If they are ordered v0, v2, v1, v3, the compiler can find just a few
simd optimizations itself.
The new optimization I could observe on x86-64 was using 128 bit
registers for the v = key ^ constant operations in new / reset.
Use `ptr::copy_nonoverlapping` (aka memcpy) to load an u64 from the
byte stream. This is correct for any alignment, and the compiler will
use the appropriate instruction to load the data.
Use unchecked indexing.
This results in a large improvement of throughput (hashed bytes
/ second) for long data. Maximum improvement benches at a 70% increase
in throughput for large values (> 256 bytes) but already values of 16
bytes or larger improve.
Introducing unchecked indexing is motivated to reach as good throughput
as possible. Using ptr::copy_nonoverlapping without unchecked indexing
would land the improvement some 20-30 pct units lower.
We use a debug assertion so that the test suite checks our use of
unchecked indexing.
Macro desugaring of `in PLACE { BLOCK }` into "simpler" expressions following the in-development "Placer" protocol.
Includes Placer API that one can override to integrate support for `in` into one's own type. (See [RFC 809].)
[RFC 809]: https://github.com/rust-lang/rfcs/blob/master/text/0809-box-and-in-for-stdlib.md
Part of #22181
Replaced PR #26180.
Turns on the `in PLACE { BLOCK }` syntax, while leaving in support for the old `box (PLACE) EXPR` syntax (since we need to support that at least until we have a snapshot with support for `in PLACE { BLOCK }`.
(Note that we are not 100% committed to the `in PLACE { BLOCK }` syntax. In particular I still want to play around with some other alternatives. Still, I want to get the fundamental framework for the protocol landed so we can play with implementing it for non `Box` types.)
----
Also, this PR leaves out support for desugaring-based `box EXPR`. We will hopefully land that in the future, but for the short term there are type-inference issues injected by that change that we want to resolve separately.
eq_slice_() used to be a common implementation for two function that
both called it, but of those only eq_slice() is left, so we can as well
directly inline the code.
This commit moves the IR files in the distribution, rust_try.ll,
rust_try_msvc_64.ll, and rust_try_msvc_32.ll into the compiler from the main
distribution. There's a few reasons for this change:
* LLVM changes its IR syntax from time to time, so it's very difficult to
have these files build across many LLVM versions simultaneously. We'll likely
want to retain this ability for quite some time into the future.
* The implementation of these files is closely tied to the compiler and runtime
itself, so it makes sense to fold it into a location which can do more
platform-specific checks for various implementation details (such as MSVC 32
vs 64-bit).
* This removes LLVM as a build-time dependency of the standard library. This may
end up becoming very useful if we move towards building the standard library
with Cargo.
In the immediate future, however, this commit should restore compatibility with
LLVM 3.5 and 3.6.
This commit moves the IR files in the distribution, rust_try.ll,
rust_try_msvc_64.ll, and rust_try_msvc_32.ll into the compiler from the main
distribution. There's a few reasons for this change:
* LLVM changes its IR syntax from time to time, so it's very difficult to
have these files build across many LLVM versions simultaneously. We'll likely
want to retain this ability for quite some time into the future.
* The implementation of these files is closely tied to the compiler and runtime
itself, so it makes sense to fold it into a location which can do more
platform-specific checks for various implementation details (such as MSVC 32
vs 64-bit).
* This removes LLVM as a build-time dependency of the standard library. This may
end up becoming very useful if we move towards building the standard library
with Cargo.
In the immediate future, however, this commit should restore compatibility with
LLVM 3.5 and 3.6.
eq_slice_() used to be a common implementation for two function that
both called it, but of those only eq_slice() is left, so we can as well
directly inline the code.
This makes the primitive descriptions on the front page read properly
as descriptions of types and not of the associated modules.
Having the primitive and module docs derived from the same source
causes problems, primarily that they can't contain hyperlinks
cross-referencing each other.
This crates dedicated private modules in `std` to document the
primitive types, then for all primitives that have a corresponding
module, puts hyperlinks in moth the primitive docs and the module docs
cross-linking each other.
This should help clear up confusion when readers find themselves on
the wrong page.
This also removes all the duplicate `#[doc(primitive)]` tags in various places (especially core), so the core docs will no longer attempt to document the primitives for now. Seems like an acceptable tradeoff to get some cleanup for std.
I'm being constantly bitten by the lack of this implementation.
I'm unsure if there's a reason to avoid these implementations though.
Since we have a "lossy" implementation for both Mutex and RWLock (RWLock {{ locked }}) I don't think there's a big reason for not having a Debug implementation for the atomic types, even if the user can't specify the ordering.
Fixes#23812 by stripping the decoration when desugaring macro doc comments into #[doc] attributes, and detects whether the attribute should be inner or outer style and outputs the appropriate token tree.
Having the primitive and module docs derived from the same source
causes problems, primarily that they can't contain hyperlinks
cross-referencing each other.
This crates dedicated private modules in `std` to document the
primitive types, then for all primitives that have a corresponding
module, puts hyperlinks in moth the primitive docs and the module docs
cross-linking each other.
This should help clear up confusion when readers find themselves on
the wrong page.
And some other outdated language. @echochamber came asking about these docs
on IRC today, and they're a bit weird. I've updated them to be less ambiguous
and use contemporary terminology.
... matching the existing Index impls.
There is no reason not to if String implement DerefMut.
The code removed in `src/librustc/middle/effect.rs` was added in #9750
to prevent things like `s[0] = 0x80` where `s: String`,
but I belive became unnecessary when the Index(Mut) traits were introduced.
This resolves#26845.
I'm not entirely satisfied with the placement of the rounding discussion in the docs for the `Div` and `Rem` traits, but I couldn't come up with anywhere better to put it. Suggestions are welcome.
I didn't add any discussion of rounding to the `checked_div` (or rem) or `wrapping_div` documentation because those seem to make it pretty clear that they do the same thing as `Div`.