Add a dedicated length-prefixing method to `Hasher`
This accomplishes two main goals:
- Make it clear who is responsible for prefix-freedom, including how they should do it
- Make it feasible for a `Hasher` that *doesn't* care about Hash-DoS resistance to get better performance by not hashing lengths
This does not change rustc-hash, since that's in an external crate, but that could potentially use it in future.
Fixes #94026
r? rust-lang/libs
---
The core of this change is the following two new methods on `Hasher`:
```rust
pub trait Hasher {
    /// Writes a length prefix into this hasher, as part of being prefix-free.
    ///
    /// If you're implementing [`Hash`] for a custom collection, call this before
    /// writing its contents to this `Hasher`. That way
    /// `(collection![1, 2, 3], collection![4, 5])` and
    /// `(collection![1, 2], collection![3, 4, 5])` will provide different
    /// sequences of values to the `Hasher`.
    ///
    /// The `impl<T> Hash for [T]` includes a call to this method, so if you're
    /// hashing a slice (or array or vector) via its `Hash::hash` method,
    /// you should **not** call this yourself.
    ///
    /// This method is only for providing domain separation. If you want to
    /// hash a `usize` that represents part of the *data*, then it's important
    /// that you pass it to [`Hasher::write_usize`] instead of to this method.
    ///
    /// # Examples
    ///
    /// ```
    /// #![feature(hasher_prefixfree_extras)]
    /// # // Stubs to make the `impl` below pass the compiler
    /// # struct MyCollection<T>(Option<T>);
    /// # impl<T> MyCollection<T> {
    /// #     fn len(&self) -> usize { todo!() }
    /// # }
    /// # impl<'a, T> IntoIterator for &'a MyCollection<T> {
    /// #     type Item = T;
    /// #     type IntoIter = std::iter::Empty<T>;
    /// #     fn into_iter(self) -> Self::IntoIter { todo!() }
    /// # }
    ///
    /// use std::hash::{Hash, Hasher};
    /// impl<T: Hash> Hash for MyCollection<T> {
    ///     fn hash<H: Hasher>(&self, state: &mut H) {
    ///         state.write_length_prefix(self.len());
    ///         for elt in self {
    ///             elt.hash(state);
    ///         }
    ///     }
    /// }
    /// ```
    ///
    /// # Note to Implementers
    ///
    /// If you've decided that your `Hasher` is willing to be susceptible to
    /// Hash-DoS attacks, then you might consider skipping hashing some or all
    /// of the `len` provided in the name of increased performance.
    #[inline]
    #[unstable(feature = "hasher_prefixfree_extras", issue = "88888888")]
    fn write_length_prefix(&mut self, len: usize) {
        self.write_usize(len);
    }

    /// Writes a single `str` into this hasher.
    ///
    /// If you're implementing [`Hash`], you generally do not need to call this,
    /// as the `impl Hash for str` does, so you can just use that.
    ///
    /// This includes the domain separator for prefix-freedom, so you should
    /// **not** call `Self::write_length_prefix` before calling this.
    ///
    /// # Note to Implementers
    ///
    /// The default implementation of this method includes a call to
    /// [`Self::write_length_prefix`], so if your implementation of `Hasher`
    /// doesn't care about prefix-freedom and you've thus overridden
    /// that method to do nothing, there's no need to override this one.
    ///
    /// This method is available to be overridden separately from the others
    /// as `str` being UTF-8 means that it never contains `0xFF` bytes, which
    /// can be used to provide prefix-freedom cheaper than hashing a length.
    ///
    /// For example, if your `Hasher` works byte-by-byte (perhaps by accumulating
    /// them into a buffer), then you can hash the bytes of the `str` followed
    /// by a single `0xFF` byte.
    ///
    /// If your `Hasher` works in chunks, you can also do this by being careful
    /// about how you pad partial chunks. If the chunks are padded with `0x00`
    /// bytes then just hashing an extra `0xFF` byte doesn't necessarily
    /// provide prefix-freedom, as `"ab"` and `"ab\u{0}"` would likely hash
    /// the same sequence of chunks. But if you pad with `0xFF` bytes instead,
    /// ensuring at least one padding byte, then it can often provide
    /// prefix-freedom cheaper than hashing the length would.
    #[inline]
    #[unstable(feature = "hasher_prefixfree_extras", issue = "88888888")]
    fn write_str(&mut self, s: &str) {
        self.write_length_prefix(s.len());
        self.write(s.as_bytes());
    }
}
```
The `Hash` implementations for slices and containers are updated to call `write_length_prefix` instead of `write_usize`.
`write_str` defaults to using `write_length_prefix` since, as was pointed out in the issue, the `write_u8(0xFF)` approach is insufficient for hashers that work in chunks, as those would hash `"a\u{0}"` and `"a"` to the same thing. But since `SipHash` works byte-wise (there's an internal buffer to accumulate bytes until a full chunk is available), it overrides `write_str` to continue using the add-a-non-UTF-8-byte approach.
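To make the "Note to Implementers" concrete, here's a minimal sketch (the `NoLenHasher` name and the wrapping of `DefaultHasher` are purely illustrative) of a hasher that opts out of length prefixes for speed; since `write_str`'s default goes through `write_length_prefix`, overriding that one method covers both:
```rust
#![feature(hasher_prefixfree_extras)]
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

// Illustrative only: wraps another hasher and skips length prefixes entirely,
// trading Hash-DoS resistance for fewer `write_usize` calls.
struct NoLenHasher(DefaultHasher);

impl Hasher for NoLenHasher {
    fn write(&mut self, bytes: &[u8]) {
        self.0.write(bytes);
    }

    fn finish(&self) -> u64 {
        self.0.finish()
    }

    // Deliberately drop the length prefix; `write_str`'s default implementation
    // also routes through this, so it is covered as well.
    fn write_length_prefix(&mut self, _len: usize) {}
}
```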
---
Compatibility:
Because the default implementation of `write_length_prefix` calls `write_usize`, the changed hash implementation for slices will do the same thing the old one did on existing `Hasher`s.
We might want to change the default before stabilizing (or maybe even after), but for getting in the new unstable methods, leave it as-is for now. That way it won't break cargo and such.
Fix typo in `offset_from` documentation
Small fix for what I think is a typo in the `offset_from` documentation.
A reader might take this to mean that the distance in bytes is obtained by dividing the distance by `mem::size_of::<T>()`, whereas the intent is simply to define "units of T" in terms of bytes (i.e., units of T == bytes / `mem::size_of::<T>()`).
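For illustration, the intended reading is that the returned value is already in units of `T`, i.e. the byte distance divided by `mem::size_of::<T>()` (the array here is just an example):
```rust
fn main() {
    let a: [u32; 4] = [1, 2, 3, 4];
    let first = &a[0] as *const u32;
    let last = &a[3] as *const u32;
    // The pointers are 12 bytes apart and size_of::<u32>() == 4,
    // so offset_from returns 3 units of T.
    assert_eq!(unsafe { last.offset_from(first) }, 3);
}
```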
Update `int_roundings` methods from feedback
This updates `#![feature(int_roundings)]` (#88581) from feedback. All methods now take `NonZeroX`. The documentation makes clear that they panic in debug mode and wrap in release mode.
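To show the semantics involved, here is a standalone model (this is not the std API; it just mirrors the idea of taking a `NonZero` divisor as described above):
```rust
use std::num::NonZeroU32;

// Standalone model of the rounding divisions; the real methods live on the
// integer types behind #![feature(int_roundings)].
fn div_floor(n: u32, d: NonZeroU32) -> u32 {
    n / d.get()
}

fn div_ceil(n: u32, d: NonZeroU32) -> u32 {
    let d = d.get();
    n / d + u32::from(n % d != 0)
}

fn main() {
    let d = NonZeroU32::new(3).unwrap();
    assert_eq!(div_floor(10, d), 3);
    assert_eq!(div_ceil(10, d), 4);
}
```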
r? `@joshtriplett`
`@rustbot` label +T-libs +T-libs-api +S-waiting-on-review
Include nonexported macro_rules! macros in the doctest target
Fixes #88038
This PR aims to include nonexported `macro_rules!` macros in the doctest target. For more details, please see the above issue.
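For illustration, this is the kind of item affected (the macro and its doc test are made up): a `macro_rules!` macro without `#[macro_export]`, whose doc test is now collected into the doctest target:
````rust
/// Doubles an expression.
///
/// The doc test below used to be skipped because the macro isn't exported;
/// with this change it runs as part of the doctest target.
///
/// ```
/// macro_rules! double { ($x:expr) => { $x * 2 } }
/// assert_eq!(double!(21), 42);
/// ```
macro_rules! double {
    ($x:expr) => {
        $x * 2
    };
}

fn main() {
    assert_eq!(double!(21), 42);
}
````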
library/core: Fix the implementation of c_uint, c_long, c_ulong
Fixes: aa67016624 ("make memcmp return a value of c_int_width instead of i32")
Introduce c_num_definition to make the cfg_if logic easier to maintain
Add newlines for easier code reading
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
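As a rough sketch of the idea (the macro name is taken from the commit message; the real definitions in library/core use cfg_if and cover more targets):
```rust
// Sketch only: one macro defines the C integer aliases, so the per-target
// choice of `long` width lives in a single place.
macro_rules! c_num_definition {
    ($c_long:ty, $c_ulong:ty) => {
        #[allow(non_camel_case_types)]
        pub type c_int = i32;
        #[allow(non_camel_case_types)]
        pub type c_uint = u32;
        #[allow(non_camel_case_types)]
        pub type c_long = $c_long;
        #[allow(non_camel_case_types)]
        pub type c_ulong = $c_ulong;
    };
}

// `long` is 64-bit on most 64-bit targets but 32-bit on Windows and on 32-bit targets.
#[cfg(all(target_pointer_width = "64", not(windows)))]
c_num_definition!(i64, u64);
#[cfg(any(not(target_pointer_width = "64"), windows))]
c_num_definition!(i32, u32);

fn main() {
    // Functions like memcmp should be declared as returning `c_int`, not a hard-coded `i32`.
    println!("c_long is {} bytes", std::mem::size_of::<c_long>());
}
```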
It improves the performance of iterators using unchecked access when building in incremental mode
(due to the larger CGU count?). It might negatively affect incremental compile times in exchange for
better runtime results, but considering that the equivalent `next()` implementations are also
`#[inline]` and usually more complex, this should be ok.
```
./x.py bench library/core -i --stage 0 --test-args bench_trusted_random_access
OLD: 119,172 ns/iter
NEW: 17,714 ns/iter
```
rustdoc: Resolve doc links referring to `macro_rules` items
cc https://github.com/rust-lang/rust/issues/81633
Update: the fallback to considering *all* `macro_rules` in the crate for unresolved names is not removed in this PR; it will be removed separately and run through crater.
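For context, this is the kind of link that now resolves (the items are made up):
```rust
// A macro_rules! macro defined in the crate...
/// Writes a line to standard output.
macro_rules! log_line {
    ($msg:expr) => {
        println!("{}", $msg)
    };
}

// ...and a doc comment elsewhere in the crate that links to it by name.
/// Convenience wrapper; see [`log_line`] for the underlying macro.
pub fn log_hello() {
    log_line!("hello");
}

fn main() {
    log_hello();
}
```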
This uses an obviously placeholder syntax. An RFC would still be needed before this could have any chance at stabilization, and it might be removed at any point.
But I'd really like to have it in nightly at least to ensure it works well with try_trait_v2, especially as we refactor the traits.
Fix some confusing wording and improve slice-search-related docs
This adds more links between `contains` and `binary_search` because I do think they have some relevant connections. If your (big) slice happens to be sorted and you know it, surely you should be using `[3; 100].binary_search(&5).is_ok()` over `[3; 100].contains(&5)`?
This also fixes the confusing "searches this sorted X" wording, which sounds odd because the function can't know whether the slice is actually sorted; it should be, but it may not be. The new wording makes it clearer that you will probably want to sort the slice first, and in the same sentence it also mentions the related function `contains`.
Similarly, this mentions `binary_search` on `contains`' docs.
This also fixes some other minor stuff and inconsistencies.
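A small example of the relationship the docs now point out:
```rust
fn main() {
    let sorted = [1, 3, 5, 7, 9];
    // `contains` is a linear scan and works on any slice, sorted or not.
    assert!(sorted.contains(&7));
    // If the slice is known to be sorted, `binary_search` answers the same
    // membership question in O(log n).
    assert!(sorted.binary_search(&7).is_ok());
    assert!(sorted.binary_search(&4).is_err());
}
```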
Unstably constify `impl<I: Iterator> IntoIterator for I`
This constifies the default `IntoIterator` implementation under the `const_intoiterator_identity` feature.
Tracking Issue: #90603
No "weird" floats in const fn {from,to}_bits
I suspect this code is subtly incorrect and that we don't even use, e.g., x87-style floats in CTFE, so I don't have to guard against that case. A future PR will hopefully remove them from concern entirely anyway. But at the moment I wanted to get this rolling because small questions like that one seem best answered by review.
r? `@oli-obk`
cc `@eddyb` `@thomcc`
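For reference, the functions in question expose and reinterpret a float's raw bit pattern; a plain runtime example (the PR itself is about their behavior as `const fn`):
```rust
fn main() {
    // to_bits exposes the IEEE 754 representation; from_bits reinterprets it.
    let bits = 1.0f32.to_bits();
    assert_eq!(bits, 0x3f80_0000);
    assert_eq!(f32::from_bits(bits), 1.0);

    // NaNs are the "weird" cases the PR worries about in CTFE.
    let nan = f32::from_bits(0x7fc0_1234);
    assert!(nan.is_nan());
}
```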
Add slice::remainder
This adds a `remainder` function to the slice iterator, so that a caller can access unused elements if iteration stops.
Addresses #91733
Make some `usize`-typed masks definitions agnostic to the size of `usize`
Some masks were defined as
```rust
const NONASCII_MASK: usize = 0x80808080_80808080u64 as usize;
```
where it was assumed that `usize` is never wider than 64 bits, which is currently true.
To make these constants valid on a hypothetical 128-bit target, they have been redefined in a `usize`-width-agnostic way:
```rust
const NONASCII_MASK: usize = usize::from_ne_bytes([0x80; size_of::<usize>()]);
```
There are already some cases where Rust anticipates the possibility of supporting 128-bit targets, such as not implementing `From<usize>` for `u64`.
docs: add link from zip to unzip
The docs for `Iterator::unzip` explain that it is kind of an inverse operation to `Iterator::zip` and guide the reader to the `zip` docs, but the `zip` docs don't let the user know that they can undo the `zip` operation with `unzip`. This change modifies the docs to help the user find `unzip`.
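For example, the round trip the docs now point out in both directions:
```rust
fn main() {
    let numbers = [1, 2, 3];
    let letters = ['a', 'b', 'c'];

    // zip pairs the two iterators up...
    let zipped: Vec<(i32, char)> = numbers.iter().copied().zip(letters.iter().copied()).collect();
    assert_eq!(zipped, [(1, 'a'), (2, 'b'), (3, 'c')]);

    // ...and unzip splits the pairs back into two collections.
    let (ns, ls): (Vec<i32>, Vec<char>) = zipped.into_iter().unzip();
    assert_eq!(ns, numbers);
    assert_eq!(ls, letters);
}
```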
Create (unstable) 2024 edition
[On Zulip](https://rust-lang.zulipchat.com/#narrow/stream/213817-t-lang/topic/Deprecating.20macro.20scoping.20shenanigans/near/272860652), there was a small aside regarding creating the 2024 edition now as opposed to later. There was a reasonable amount of support and no stated opposition.
This change creates the 2024 edition in the compiler and creates a prelude for the 2024 edition. There is no current difference between the 2021 and 2024 editions. Cargo and other tools will need to be updated separately, as it's not in the same repository. This change permits the vast majority of work towards the next edition to proceed _now_ instead of waiting until 2024.
For sanity purposes, I've merged the "hello" UI tests into a single file with multiple revisions. Otherwise we'd end up with a file per edition, despite them being essentially identical.
`@rustbot` label +T-lang +S-waiting-on-review
Not sure on the relevant team, to be honest.
Stabilize `derive_default_enum`
This stabilizes `#![feature(derive_default_enum)]`, as proposed in [RFC 3107](https://github.com/rust-lang/rfcs/pull/3107) and tracked in #87517. In short, it permits you to `#[derive(Default)]` on `enum`s, indicating what the default should be by placing a `#[default]` attribute on the desired variant (which must be a unit variant in the interest of forward compatibility).
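For reference, this is what the derive now permits (the `Mode` enum is just an example):
```rust
// #[derive(Default)] on an enum, with a #[default] attribute on a unit variant.
#[derive(Default, Debug, PartialEq)]
enum Mode {
    #[default]
    Idle,
    Running,
}

fn main() {
    assert_eq!(Mode::default(), Mode::Idle);
}
```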
`@rustbot` label +S-waiting-on-review +T-lang