Previously, all words in the (deduplicated) bitset would be stored raw -- a full
64 bits (8 bytes). Now, those words that are equivalent to others through a
specific mapping are stored separately and "mapped" to the original when
loading; this shrinks the table sizes significantly, as each mapped word is
stored in 2 bytes (a 4x decrease from the previous).
The new encoding is also potentially non-optimal: the "mapped" byte is
frequently repeated, as in practice many mapped words use the same base word.
Currently we only support two forms of mapping: rotation and inversion. Note
that these are both guaranteed to map transitively if at all, and supporting
mappings for which this is not true may require a more interesting algorithm for
choosing the optimal pairing.
Updated sizes:
Alphabetic : 2622 bytes (- 414 bytes)
Case_Ignorable : 1803 bytes (- 330 bytes)
Cased : 808 bytes (- 126 bytes)
Cc : 32 bytes
Grapheme_Extend: 1508 bytes (- 252 bytes)
Lowercase : 901 bytes (- 84 bytes)
N : 1064 bytes (- 156 bytes)
Uppercase : 838 bytes (- 96 bytes)
White_Space : 91 bytes (- 6 bytes)
Total table sizes: 9667 bytes (-1,464 bytes)
Rollup of 6 pull requests
Successful merges:
- #69497 (Don't unwind when hitting the macro expansion recursion limit)
- #69901 (add #[rustc_layout(debug)])
- #69910 (Avoid query type in generics)
- #69955 (Fix abort-on-eprintln during process shutdown)
- #70032 (put type params in front of const params in generics_of)
- #70119 (rustc: use LocalDefId instead of DefId in TypeckTables.)
Failed merges:
r? @ghost
rustc: use LocalDefId instead of DefId in TypeckTables.
The logic in `TypeckTables`' implementation of `HashStable`, which created `DefId`s by combining a `CrateNum` from a `DefId` and a `DefIndex` from a `LocalDefId`, bothered me a bit.
I don't know how much this matters, but it works so might as well submit it.
Fix abort-on-eprintln during process shutdown
This commit fixes an issue where if `eprintln!` is used in a TLS
destructor it can accidentally cause the process to abort. TLS
destructors are executed after `main` returns on the main thread, and at
this point we've also deinitialized global `Lazy` values like those
which store the `Stderr` and `Stdout` internals. This means that despite
handling TLS not being accessible in `eprintln!`, we will fail due to
not being able to call `stderr()`. This means that we'll double-panic
quickly because panicking also attempt to write to stderr.
The fix here is to reimplement the global stderr handle to avoid the
need for destruction. This avoids the need for `Lazy` as well as the
hidden panic inside of the `stderr` function.
Overall this should improve the robustness of printing errors and/or
panics in weird situations, since the `stderr` accessor should be
infallible in more situations.
Avoid query type in generics
There are at the moment roughly 170 queries in librustc.
The way ty::query is structured, a lot of code is duplicated for each query.
I suspect this to be responsible for a part of librustc'c compile time.
This PR reduces the amount of code generic on the query,
replacing it by code generic on the key-value types.
This is split out of #69808,
and should not contain the perf regression.
cc #65031
add #[rustc_layout(debug)]
@eddyb recently told me about the `#[rustc_layout]` attribute, and I think it would be very useful if it could be used to print all the layout information Rust has about a type. When working with layouts (e.g. in Miri), it is often not clear how certain surface language features get represented internally. I have some awful hacks locally to be able to dump this debug information; with this attribute I could get it on the playground which is so much better. :)