Introduce ByteSymbol.

It's like `Symbol` but for byte strings. The interner is now used for
both `Symbol` and `ByteSymbol`. E.g. if you intern `"dog"` and `b"dog"`
you'll get a `Symbol` and a `ByteSymbol` with the same index and the
characters will only be stored once.
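
As a rough illustration of the sharing described above, here is a minimal sketch of a single interner backing two index types. It is not the rustc implementation: the `Interner` struct, its fields, and the `intern_str`/`intern_byte_str` method names are hypothetical, and `Rc<[u8]>` stands in for whatever storage the real interner uses.

```rust
use std::collections::HashMap;
use std::rc::Rc;

// Both symbol kinds are thin wrappers around an index into the same table.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct Symbol(u32);

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct ByteSymbol(u32);

// Hypothetical interner: one table of byte sequences shared by both kinds.
#[derive(Default)]
struct Interner {
    indices: HashMap<Rc<[u8]>, u32>,
    entries: Vec<Rc<[u8]>>,
}

impl Interner {
    // Look up or insert the raw bytes; the bytes are allocated once and
    // shared (via `Rc`) between the lookup map and the index table.
    fn intern_bytes(&mut self, bytes: &[u8]) -> u32 {
        if let Some(&idx) = self.indices.get(bytes) {
            return idx;
        }
        let idx = self.entries.len() as u32;
        let stored: Rc<[u8]> = Rc::from(bytes);
        self.entries.push(Rc::clone(&stored));
        self.indices.insert(stored, idx);
        idx
    }

    fn intern_str(&mut self, s: &str) -> Symbol {
        Symbol(self.intern_bytes(s.as_bytes()))
    }

    fn intern_byte_str(&mut self, b: &[u8]) -> ByteSymbol {
        ByteSymbol(self.intern_bytes(b))
    }

    // A `Symbol` is only ever created from a `&str`, so its bytes are UTF-8.
    fn as_str(&self, sym: Symbol) -> &str {
        std::str::from_utf8(&self.entries[sym.0 as usize]).unwrap()
    }

    fn as_byte_str(&self, sym: ByteSymbol) -> &[u8] {
        &self.entries[sym.0 as usize]
    }
}
```

In this sketch, interning `"dog"` and then `b"dog"` yields a `Symbol` and a `ByteSymbol` wrapping the same index, and the table holds a single entry.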

The motivation for this is to eliminate the `Arc`s in `ast::LitKind`, to
make `ast::LitKind` impl `Copy`, and to avoid the need to arena-allocate
`ast::LitKind` in HIR. The last of these reduces peak memory by a
non-trivial amount on literal-heavy benchmarks such as `deep-vector` and
`tuple-stress`.

`Encoder`, `Decoder`, `SpanEncoder`, and `SpanDecoder` all get some
changes so that they can handle normal strings and byte strings.

This change does slow down compilation of programs that use
`include_bytes!` on large files, because the contents of those files are
now interned (hashed). This makes `include_bytes!` more similar to
`include_str!`, though `include_bytes!` contents still aren't escaped,
and hashing is still much cheaper than escaping.
Nicholas Nethercote
2025-06-02 08:59:29 +10:00
parent ed2d759783
commit 478f8287c0
46 changed files with 449 additions and 269 deletions

@@ -469,8 +469,12 @@ impl<'a> State<'a> {
             ast::ExprKind::Lit(token_lit) => {
                 self.print_token_literal(*token_lit, expr.span);
             }
-            ast::ExprKind::IncludedBytes(bytes) => {
-                let lit = token::Lit::new(token::ByteStr, escape_byte_str_symbol(bytes), None);
+            ast::ExprKind::IncludedBytes(byte_sym) => {
+                let lit = token::Lit::new(
+                    token::ByteStr,
+                    escape_byte_str_symbol(byte_sym.as_byte_str()),
+                    None,
+                );
                 self.print_token_literal(lit, expr.span)
             }
             ast::ExprKind::Cast(expr, ty) => {