MIR-build: No longer emit assumes in enum-as casting
This just uses the `valid_range` from the backend, so it's duplicating the range metadata that now we include on parameters and loads, and thus no longer seems to be useful -- notably there's no codegen test failures from removing it.
(Because it's using data from the same source as the backend annotations, it doesn't do anything to mitigate things like rust-lang/rust#144388 where the range in the layout is more permissive than the actual possible discriminants. A variant of this that actually checked the discriminants more specifically might be useful, so could potentially be added in future, but I don't think the *current* checks are actually providing value.)
r? mir
Randomly turns out that this
Fixes https://github.com/rust-lang/rust/issues/121097
`rustc_pattern_analysis`: always check that deref patterns don't match on the same place as normal constructors
In rust-lang/rust#140106, deref pattern validation was tied to the `deref_patterns` feature to temporarily avoid affecting perf. However:
- As of rust-lang/rust#143414, box patterns are represented as deref patterns in `rustc_pattern_analysis`. Since they can be used by enabling `box_patterns` instead of `deref_patterns`, it was possible for them to skip validation, resulting in an ICE. This fixes that and adds a regression test.
- External tooling (e.g. rust-analyzer) will also need to validate matches containing deref patterns, which was not possible. This fixes that by making `compute_match_usefulness` validate deref patterns by default.
In order to avoid doing an extra pass for anything with patterns, the second commit makes `RustcPatCtxt` keep track of whether it encounters a deref pattern, so that it only does the check if so. This is purely for performance. If the perf impact of the first commit is negligible and the complexity cost introduced by the second commit is significant, it may be worth dropping the latter.
r? `@Nadrieril`
Introduce `ByteSymbol`
It's like `Symbol` but for byte strings. The interner is now used for both `Symbol` and `ByteSymbol`. E.g. if you intern `"dog"` and `b"dog"` you'll get a `Symbol` and a `ByteSymbol` with the same index and the characters will only be stored once.
The motivation for this is to eliminate the `Arc`s in `ast::LitKind`, to make `ast::LitKind` impl `Copy`, and to avoid the need to arena-allocate `ast::LitKind` in HIR. The latter change reduces peak memory by a non-trivial amount on literal-heavy benchmarks such as `deep-vector` and `tuple-stress`.
`Encoder`, `Decoder`, `SpanEncoder`, and `SpanDecoder` all get some changes so that they can handle normal strings and byte strings.
It's like `Symbol` but for byte strings. The interner is now used for
both `Symbol` and `ByteSymbol`. E.g. if you intern `"dog"` and `b"dog"`
you'll get a `Symbol` and a `ByteSymbol` with the same index and the
characters will only be stored once.
The motivation for this is to eliminate the `Arc`s in `ast::LitKind`, to
make `ast::LitKind` impl `Copy`, and to avoid the need to arena-allocate
`ast::LitKind` in HIR. The latter change reduces peak memory by a
non-trivial amount on literal-heavy benchmarks such as `deep-vector` and
`tuple-stress`.
`Encoder`, `Decoder`, `SpanEncoder`, and `SpanDecoder` all get some
changes so that they can handle normal strings and byte strings.
This change does slow down compilation of programs that use
`include_bytes!` on large files, because the contents of those files are
now interned (hashed). This makes `include_bytes!` more similar to
`include_str!`, though `include_bytes!` contents still aren't escaped,
and hashing is still much cheaper than escaping.
Specifically `TyAlias`, `Enum`, `Struct`, `Union`. So the fields match
the textual order in the source code.
The interesting part of the change is in
`compiler/rustc_hir/src/hir.rs`. The rest is extremely mechanical
refactoring.
This pattern of iterating over scopes and drops occurs multiple times in
this file, with slight variations. All of them use `for` loops except
this one. This commits changes it for consistency.
It's not needed, because `next` and `local` fields uniquely identify the
drop. This is a ~2% speed win on the very large program in #134404, and
it's also a tiny bit simpler.
interpret: add allocation parameters to `AllocBytes`
Necessary for a better implementation of [rust-lang/miri#4343](https://github.com/rust-lang/miri/pull/4343). Also included here is the code from that PR, adapted to this new interface for the sake of example and so that CI can run on them; the Miri changes can be reverted and merged separately, though.
r? `@RalfJung`
`unpretty=thir-tree`: don't require the final expr to be the body's value
Two motivations for this:
- I couldn't find a comment motivating this hard-coding. I can imagine it might be easier to read `unpretty=thir-flat` output if the final expression in the THIR is always the body's value, but if that's the reason, that should be the justification in the source. I can also imagine it's meant to check that all expressions will be visited by the pretty-printer, but the existing check doesn't quite do that either.
- Guard patterns (#129967) contain expressions, so lowering params containing guard patterns may add more expressions to the THIR. Currently a body's params are lowered after its expression, so guard expressions in params would end up last, breaking this. As an alternative, the params could be lowered first (#141356).
This taints the typeck results with errors if a `continue` is found not
pointing to a loop, which fixes an ICE.
A few things were going wrong here. First, since this wasn't caught in
typeck, we'd end up building the THIR and then running liveness lints on
ill-formed HIR. Since liveness assumes all `continue`s point to loops,
it wasn't setting a live node for the `continue`'s destination. However,
the fallback for this was faulty; it would create a new live node to
represent the erroneous state after the analysis's RWU table had already
been built. This would ICE if the new live node was used in operations,
such as merging results from the arms of a match. I've removed this
error-recovery since it was buggy, and we should really catch bad labels
before liveness.
I've also replaced an outdated comment about when liveness lints are
run. At this point, I think the call to `check_liveness` could be moved
elsewhere, but if it can be run when the typeck results are tainted by
errors, it'll need some slight refactoring so it can bail out in that
case. In lieu of that, I've added an assertion.
allow deref patterns to move out of boxes
This adds a case to lower deref patterns on boxes using a built-in deref instead of a `Deref::deref` or `DerefMut::deref_mut` call: if `deref!(inner): Box<T>` is matching on place `place`, the inner pattern `inner` now matches on `*place` rather than a temporary. No longer needing to call a method also means it won't borrow the scrutinee in match arms. This allows for bindings in `inner` to move out of `*place`.
For comparison with box patterns, this uses the same MIR lowering but different THIR. Consequently, deref patterns on boxes are treated the same as any other deref patterns in match exhaustiveness analysis. Box patterns can't quite be implemented in terms of deref patterns until exhaustiveness checking for deref patterns is implemented (I'll open a PR for exhaustiveness soon!).
Tracking issue: #87121
r? ``@Nadrieril``
This allows deref patterns to move out of boxes.
Implementation-wise, I've opted to put the information of whether a
deref pattern uses a built-in deref or a method call in the THIR. It'd
be a bit less code to check `.is_box()` everywhere, but I think this way
feels more robust (and we don't have a `mutability` field in the THIR
that we ignore when the smart pointer's a box). I'm not sure about the
naming (or using `ByRef`), though.
`deref_patterns`: support string and byte string literals in explicit `deref!("...")` patterns
When `deref_patterns` is enabled, this allows string literal patterns to be used where `str` is expected and byte string literal patterns to be used where `[u8]` or `[u8; N]` is expected. This lets them be used in explicit `deref!("...")` patterns to match on `String`, `Box<str>`, `Vec<u8>`, `Box<[u8;N]>`, etc. (as well as to match on slices and arrays obtained through other means). Implementation-wise, this follows up on #138992: similar to how byte string literals matching on `&[u8]` is implemented, this changes the type of the patterns as determined by HIR typeck, which informs const-to-pat on how to translate them to THIR (though strings needed a bit of extra work since we need references to call `<str as PartialEq>::eq` in the MIR lowering for string equality tests).
This PR does not add support for implicit deref pattern syntax (e.g. `"..."` matching on `String`, as `string_deref_patterns` allows). I have that implemented locally, but I'm saving it for a follow-up PR[^1].
This also does not add support for using named or associated constants of type `&str` where `str` is expected (nor likewise with named byte string constants). It'd be possible to add that if there's an appetite for it, but I figure it's simplest to start with literals.
This is gated by the `deref_patterns` feature since it's motivated by deref patterns. That said, its impact reaches outside of deref patterns; it may warrant a separate experiment and feature gate, particularly factoring in the follow-up[^1]. Even without deref patterns, I think there's probably motivation for these changes.
The update to the unstable book added by this will conflict with #140022, so they shouldn't be merged at the same time.
Tracking issue for deref patterns: #87121
r? ``@oli-obk``
cc ``@Nadrieril``
[^1]: The piece missing from this PR to support implicit deref pattern syntax is to allow string literal patterns to implicitly dereference their scrutinees before matching (see #44849). As a consequence, it also makes examples like the one in that issue work (though it's still gated by `deref_patterns`). I can provide more information on how I've implemented it or open a draft if it'd help in reviewing this PR.
I'm removing empty identifiers everywhere, because in practice they
always mean "no identifier" rather than "empty identifier". (An empty
identifier is impossible.) It's better to use `Option` to mean "no
identifier" because you then can't forget about the "no identifier"
possibility.
Some specifics:
- When testing an attribute for a single name, the commit uses the
`has_name` method.
- When testing an attribute for multiple names, the commit uses the new
`has_any_name` method.
- When using `match` on an attribute, the match arms now have `Some` on
them.
In the tests, we now avoid printing empty identifiers by not printing
the identifier in the `error:` line at all, instead letting the carets
point out the problem.
Overhaul `AssocItem`
`AssocItem` has multiple fields that only make sense some of the time. E.g. the `name` can be empty if it's an RPITIT associated type. It's clearer and less error prone if these fields are moved to the relevant `kind` variants.
r? ``@fee1-dead``
Allow const patterns of matches to contain pattern types
Trying to pattern match on a type containing a pattern type will currently fail with an ICE
```rust
error: internal compiler error: compiler/rustc_mir_build/src/builder/matches/test.rs:459:18: invalid type for non-scalar compare: (u32) is 1..
--> src/main.rs:22:5
|
22 | TWO => {}
| ^^^
```
because the compiler tries to generate a MIR `BinOp(Eq)` operation on a pattern type, which is not supported. While we could support that, there are side effects of allowing this (none that would compile, but the compiler would simultaneously think it could `==` pattern types and that it could not because `PartialEq` is not implemented. So instead I change the logic for pattern matching to transmute pattern types to their base type before comparing.
r? ```@BoxyUwU```
cc #123646 ```@scottmcm``` ```@joshtriplett```
`hir::AssocItem` currently has a boolean `fn_has_self_parameter` field,
which is misplaced, because it's only relevant for associated fns, not
for associated consts or types. This commit moves it (and renames it) to
the `AssocKind::Fn` variant, where it belongs.
This requires introducing a new C-style enum, `AssocTag`, which is like
`AssocKind` but without the fields. This is because `AssocKind` values
are passed to various functions like `find_by_ident_and_kind` to
indicate what kind of associated item should be searched for, and having
to specify `has_self` isn't relevant there.
New methods:
- Predicates `AssocItem::is_fn` and `AssocItem::is_method`.
- `AssocItem::as_tag` which converts `AssocItem::kind` to `AssocTag`.
Removed `find_by_name_and_kinds`, which is unused.
`AssocItem::descr` can now distinguish between methods and associated
functions, which slightly improves some error messages.