With the stage0 refactor the proc_macro version found in the sysroot
will no longer always match the proc_macro version that proc-macros get
compiled with by the rustc executable that uses this proc_macro. This
will cause problems as soon as the ABI of the bridge gets changed to
implement new features or change the way existing features work.
To fix this, this commit changes rustc crates to depend directly on the
local version of proc_macro which will also be used in the sysroot that
rustc will build.
By replacing them with `{Open,Close}{Param,Brace,Bracket,Invisible}`.
PR #137902 made `ast::TokenKind` more like `lexer::TokenKind` by
replacing the compound `BinOp{,Eq}(BinOpToken)` variants with fieldless
variants `Plus`, `Minus`, `Star`, etc. This commit does a similar thing
with delimiters. It also makes `ast::TokenKind` more similar to
`parser::TokenType`.
This requires a few new methods:
- `TokenKind::is_{,open_,close_}delim()` replace various kinds of
pattern matches.
- `Delimiter::as_{open,close}_token_kind` are used to convert
`Delimiter` values to `TokenKind`.
Despite these additions, it's a net reduction in lines of code. This is
because e.g. `token::OpenParen` is so much shorter than
`token::OpenDelim(Delimiter::Parenthesis)` that many multi-line forms
reduce to single line forms. And many places where the number of lines
doesn't change are still easier to read, just because the names are
shorter, e.g.:
```
- } else if self.token != token::CloseDelim(Delimiter::Brace) {
+ } else if self.token != token::CloseBrace {
```
This involves replacing `nt_pretty_printing_compatibility_hack` with
`stream_pretty_printing_compatibility_hack`.
The handling of statements in `transcribe` is slightly different to
other nonterminal kinds, due to the lack of `from_ast` implementation
for empty statements.
Notable test changes:
- `tests/ui/proc-macro/expand-to-derive.rs`: the diff looks large but
the only difference is the insertion of a single invisible-delimited
group around a metavar.
For consistency with `rustc_lexer::TokenKind::Bang`, and because other
`ast::TokenKind` variants generally have syntactic names instead of
semantic names (e.g. `Star` and `DotDot` instead of `Mul` and `Range`).
`BinOpToken` is badly named, because it only covers the assignable
binary ops and excludes comparisons and `&&`/`||`. Its use in
`ast::TokenKind` does allow a small amount of code sharing, but it's a
clumsy factoring.
This commit removes `ast::TokenKind::BinOp{,Eq}`, replacing each one
with 10 individual variants. This makes `ast::TokenKind` more similar to
`rustc_lexer::TokenKind`, which has individual variants for all
operators.
Although the number of lines of code increases, the number of chars
decreases due to the frequent use of shorter names like `token::Plus`
instead of `token::BinOp(BinOpToken::Plus)`.
The parser pushes a `TokenType` to `Parser::expected_token_types` on
every call to the various `check`/`eat` methods, and clears it on every
call to `bump`. Some of those `TokenType` values are full tokens that
require cloning and dropping. This is a *lot* of work for something
that is only used in error messages and it accounts for a significant
fraction of parsing execution time.
This commit overhauls `TokenType` so that `Parser::expected_token_types`
can be implemented as a bitset. This requires changing `TokenType` to a
C-style parameterless enum, and adding `TokenTypeSet` which uses a
`u128` for the bits. (The new `TokenType` has 105 variants.)
The new types `ExpTokenPair` and `ExpKeywordPair` are now arguments to
the `check`/`eat` methods. This is for maximum speed. The elements in
the pairs are always statically known; e.g. a
`token::BinOp(token::Star)` is always paired with a `TokenType::Star`.
So we now compute `TokenType`s in advance and pass them in to
`check`/`eat` rather than the current approach of constructing them on
insertion into `expected_token_types`.
Values of these pair types can be produced by the new `exp!` macro,
which is used at every `check`/`eat` call site. The macro is for
convenience, allowing any pair to be generated from a single identifier.
The ident/keyword filtering in `expected_one_of_not_found` is no longer
necessary. It was there to account for some sloppiness in
`TokenKind`/`TokenType` comparisons.
The existing `TokenType` is moved to a new file `token_type.rs`, and all
its new infrastructure is added to that file. There is more boilerplate
code than I would like, but I can't see how to make it shorter.
`rustc_span::symbol` defines some things that are re-exported from
`rustc_span`, such as `Symbol` and `sym`. But it doesn't re-export some
closely related things such as `Ident` and `kw`. So you can do `use
rustc_span::{Symbol, sym}` but you have to do `use
rustc_span::symbol::{Ident, kw}`, which is inconsistent for no good
reason.
This commit re-exports `Ident`, `kw`, and `MacroRulesNormalizedIdent`,
and changes many `rustc_span::symbol::` qualifiers in `compiler/` to
`rustc_span::`. This is a 200+ net line of code reduction, mostly
because many files with two `use rustc_span` items can be reduced to
one.
Because `TokenStreamIter` is a much better name for a `TokenStream`
iterator. Also rename the `TokenStream::trees` method as
`TokenStream::iter`, and some local variables.
Currently we have an awkward mix of fallible and infallible functions:
```
new_parser_from_source_str
maybe_new_parser_from_source_str
new_parser_from_file
(maybe_new_parser_from_file) // missing
(new_parser_from_source_file) // missing
maybe_new_parser_from_source_file
source_str_to_stream
maybe_source_file_to_stream
```
We could add the two missing functions, but instead this commit removes
of all the infallible ones and renames the fallible ones leaving us with
these which are all fallible:
```
new_parser_from_source_str
new_parser_from_file
new_parser_from_source_file
source_str_to_stream
source_file_to_stream
```
This requires making `unwrap_or_emit_fatal` public so callers of
formerly infallible functions can still work.
This does make some of the call sites slightly more verbose, but I think
it's worth it for the simpler API. Also, there are two `catch_unwind`
calls and one `catch_fatal_errors` call in this diff that become
removable thanks this change. (I will do that in a follow-up PR.)
Lexing converts source text into a token stream. Parsing converts a
token stream into AST fragments. This commit renames several lexing
operations that have "parse" in the name. I think these names have been
subtly confusing me for years.
This is just a `s/parse/lex/` on function names, with one exception:
`parse_stream_from_source_str` becomes `source_str_to_stream`, to make
it consistent with the existing `source_file_to_stream`. The commit also
moves that function's location in the file to be just above
`source_file_to_stream`.
The commit also cleans up a few comments along the way.
We still check for the `rental`/`allsorts-rental` crates. But now if
they are detected we just emit a fatal error, instead of emitting a
warning and providing alternative behaviour.
The original "hack" implementing alternative behaviour was added
in #73345.
The lint was added in #83127.
The tracking issue is #83125.
The direct motivation for the change is that providing the alternative
behaviour is interfering with #125174 and follow-on work.
The extra span is now recorded in the new `TokenKind::NtIdent` and
`TokenKind::NtLifetime`. These both consist of a single token, and so
there's no operator precedence problems with inserting them directly
into the token stream.
The other way to do this would be to wrap the ident/lifetime in invisible
delimiters, but there's a lot of code that assumes an interpolated
ident/lifetime fits in a single token, and changing all that code to work with
invisible delimiters would have been a pain. (Maybe it could be done in a
follow-up.)
This change might not seem like much of a win, but it's a first step toward the
much bigger and long-desired removal of `Nonterminal` and
`TokenKind::Interpolated`. That change is big and complex enough that it's
worth doing this piece separately. (Indeed, this commit is based on part of a
late commit in #114647, a prior attempt at that big and complex change.)
This span records the declaration of the metavariable in the LHS of the macro.
It's used in a couple of error messages. Unfortunately, it gets in the way of
the long-term goal of removing `TokenKind::Interpolated`. So this commit
removes it, which degrades a couple of (obscure) error messages but makes
things simpler and enables the next commit.
Currently it only checks calls to functions marked with
`#[rustc_lint_diagnostics]`. This commit changes it to check calls to
any function with an `impl Into<{D,Subd}iagMessage>` parameter. This
greatly improves its coverage and doesn't rely on people remembering to
add `#[rustc_lint_diagnostics]`.
The commit also adds `#[allow(rustc::untranslatable_diagnostic)`]
attributes to places that need it that are caught by the improved lint.
These places that might be easy to convert to translatable diagnostics.
Finally, it also:
- Expands and corrects some comments.
- Does some minor formatting improvements.
- Adds missing `DecorateLint` cases to
`tests/ui-fulldeps/internal-lints/diagnostics.rs`.
Existing names for values of this type are `sess`, `parse_sess`,
`parse_session`, and `ps`. `sess` is particularly annoying because
that's also used for `Session` values, which are often co-located, and
it can be difficult to know which type a value named `sess` refers to.
(That annoyance is the main motivation for this change.) `psess` is nice
and short, which is good for a name used this much.
The commit also renames some `parse_sess_created` values as
`psess_created`.
Add newtypes for bool fields/params/return types
Fixed all the cases of this found with some simple searches for `*/ bool` and `bool /*`; probably many more
`Rustc::emit_diagnostic` reconstructs a diagnostic passed in from the
macro machinery. Currently it uses the type `DiagnosticBuilder<'_,
ErrorGuaranteed>`, which is incorrect, because the diagnostic might be a
warning. And if it is a warning, because of the `ErrorGuaranteed` we end
up calling into `emit_producing_error_guaranteed` and the assertion
within that function (correctly) fails because the level is not an error
level.
The fix is simple: change the type to `DiagnosticBuilder<'_, ()>`. Using
`()` works no matter what the diagnostic level is, and we don't need an
`ErrorGuaranteed` here.
The panic was reported in #120576.
Currently many diagnostic modifier methods are available on both
`Diagnostic` and `DiagnosticBuilder`. This commit removes most of them
from `Diagnostic`. To minimize the diff size, it keeps them within
`diagnostic.rs` but changes the surrounding `impl Diagnostic` block to
`impl DiagnosticBuilder`. (I intend to move things around later, to give
a more sensible code layout.)
`Diagnostic` keeps a few methods that it still needs, like `sub`,
`arg`, and `replace_args`.
The `forward!` macro, which defined two additional methods per call
(e.g. `note` and `with_note`), is replaced by the `with_fn!` macro,
which defines one additional method per call (e.g. `with_note`). It's
now also only used when necessary -- not all modifier methods currently
need a `with_*` form. (New ones can be easily added as necessary.)
All this also requires changing `trait AddToDiagnostic` so its methods
take `DiagnosticBuilder` instead of `Diagnostic`, which leads to many
mechanical changes. `SubdiagnosticMessageOp` gains a type parameter `G`.
There are three subdiagnostics -- `DelayedAtWithoutNewline`,
`DelayedAtWithNewline`, and `InvalidFlushedDelayedDiagnosticLevel` --
that are created within the diagnostics machinery and appended to
external diagnostics. These are handled at the `Diagnostic` level, which
means it's now hard to construct them via `derive(Diagnostic)`, so
instead we construct them by hand. This has no effect on what they look
like when printed.
There are lots of new `allow` markers for `untranslatable_diagnostics`
and `diagnostics_outside_of_impl`. This is because
`#[rustc_lint_diagnostics]` annotations were present on the `Diagnostic`
modifier methods, but missing from the `DiagnosticBuilder` modifier
methods. They're now present.
This mostly works well, and eliminates a couple of delayed bugs.
One annoying thing is that we should really also add an
`ErrorGuaranteed` to `proc_macro::bridge::LitKind::Err`. But that's
difficult because `proc_macro` doesn't have access to `ErrorGuaranteed`,
so we have to fake it.
`is_force_warn` is only possible for diagnostics with `Level::Warning`,
but it is currently stored in `Diagnostic::code`, which every diagnostic
has.
This commit:
- removes the boolean `DiagnosticId::Lint::is_force_warn` field;
- adds a `ForceWarning` variant to `Level`.
Benefits:
- The common `Level::Warning` case now has no arguments, replacing
lots of `Warning(None)` occurrences.
- `rustc_session::lint::Level` and `rustc_errors::Level` are more
similar, both having `ForceWarning` and `Warning`.
This works for most of its call sites. This is nice, because `emit` very
much makes sense as a consuming operation -- indeed,
`DiagnosticBuilderState` exists to ensure no diagnostic is emitted
twice, but it uses runtime checks.
For the small number of call sites where a consuming emit doesn't work,
the commit adds `DiagnosticBuilder::emit_without_consuming`. (This will
be removed in subsequent commits.)
Likewise, `emit_unless` becomes consuming. And `delay_as_bug` becomes
consuming, while `delay_as_bug_without_consuming` is added (which will
also be removed in subsequent commits.)
All this requires significant changes to `DiagnosticBuilder`'s chaining
methods. Currently `DiagnosticBuilder` method chaining uses a
non-consuming `&mut self -> &mut Self` style, which allows chaining to
be used when the chain ends in `emit()`, like so:
```
struct_err(msg).span(span).emit();
```
But it doesn't work when producing a `DiagnosticBuilder` value,
requiring this:
```
let mut err = self.struct_err(msg);
err.span(span);
err
```
This style of chaining won't work with consuming `emit` though. For
that, we need to use to a `self -> Self` style. That also would allow
`DiagnosticBuilder` production to be chained, e.g.:
```
self.struct_err(msg).span(span)
```
However, removing the `&mut self -> &mut Self` style would require that
individual modifications of a `DiagnosticBuilder` go from this:
```
err.span(span);
```
to this:
```
err = err.span(span);
```
There are *many* such places. I have a high tolerance for tedious
refactorings, but even I gave up after a long time trying to convert
them all.
Instead, this commit has it both ways: the existing `&mut self -> Self`
chaining methods are kept, and new `self -> Self` chaining methods are
added, all of which have a `_mv` suffix (short for "move"). Changes to
the existing `forward!` macro lets this happen with very little
additional boilerplate code. I chose to add the suffix to the new
chaining methods rather than the existing ones, because the number of
changes required is much smaller that way.
This doubled chainging is a bit clumsy, but I think it is worthwhile
because it allows a *lot* of good things to subsequently happen. In this
commit, there are many `mut` qualifiers removed in places where
diagnostics are emitted without being modified. In subsequent commits:
- chaining can be used more, making the code more concise;
- more use of chaining also permits the removal of redundant diagnostic
APIs like `struct_err_with_code`, which can be replaced easily with
`struct_err` + `code_mv`;
- `emit_without_diagnostic` can be removed, which simplifies a lot of
machinery, removing the need for `DiagnosticBuilderState`.
Because it's redundant w.r.t. `Diagnostic::is_lint`, which is present
for every diagnostic level.
`struct_lint_level_impl` was the only place that set the `Error` field
to `true`, and it's also the only place that calls
`Diagnostic::is_lint()` to set the `is_lint` field.
`Diagnostic` has 40 methods that return `&mut Self` and could be
considered setters. Four of them have a `set_` prefix. This doesn't seem
necessary for a type that implements the builder pattern. This commit
removes the `set_` prefixes on those four methods.
Currently, `emit_diagnostic` takes `&mut self`.
This commit changes it so `emit_diagnostic` takes `self` and the new
`emit_diagnostic_without_consuming` function takes `&mut self`.
I find the distinction useful. The former case is much more common, and
avoids a bunch of `mut` and `&mut` occurrences. We can also restrict the
latter with `pub(crate)` which is nice.