`BinOpToken` is badly named, because it only covers the assignable
binary ops and excludes comparisons and `&&`/`||`. Its use in
`ast::TokenKind` does allow a small amount of code sharing, but it's a
clumsy factoring.
This commit removes `ast::TokenKind::BinOp{,Eq}`, replacing each one
with 10 individual variants. This makes `ast::TokenKind` more similar to
`rustc_lexer::TokenKind`, which has individual variants for all
operators.
Although the number of lines of code increases, the number of chars
decreases due to the frequent use of shorter names like `token::Plus`
instead of `token::BinOp(BinOpToken::Plus)`.
It mirrors `ExprKind::Binary`, and contains a `BinOpKind`. This makes
`AssocOp` more like `ExprKind`. Note that the variants removed from
`AssocOp` are all named differently to `BinOpToken`, e.g. `Multiply`
instead of `Mul`, so that's an inconsistency removed.
The commit adds `precedence` and `fixity` methods to `BinOpKind`, and
calls them from the corresponding methods in `AssocOp`. This avoids the
need to create an `AssocOp` from a `BinOpKind` in a bunch of places, and
`AssocOp::from_ast_binop` is removed.
`AssocOp::to_ast_binop` is also no longer needed.
Overall things are shorter and nicer.
`AssocOp::AssignOp` contains a `BinOpToken`. `ExprKind::AssignOp`
contains a `BinOpKind`. Given that `AssocOp` is basically a cut-down
version of `ExprKind`, it makes sense to make `AssocOp` more like
`ExprKind`. Especially given that `AssocOp` and `BinOpKind` use semantic
operation names (e.g. `Mul`, `Div`), but `BinOpToken` uses syntactic
names (e.g. `Star`, `Slash`).
This results in more concise code, and removes the need for various
conversions. (Note that the removed functions `hirbinop2assignop` and
`astbinop2assignop` are semantically identical, because `hir::BinOp` is
just a synonum for `ast::BinOp`!)
The only downside to this is that it allows the possibility of some
nonsensical combinations, such as `AssocOp::AssignOp(BinOpKind::Lt)`.
But `ExprKind::AssignOp` already has that problem. The problem can be
fixed for both types in the future with some effort, by introducing an
`AssignOpKind` type.
Precedence improvements: closures and jumps
This PR fixes some cases where rustc's pretty printers would redundantly parenthesize expressions that didn't need it.
<table>
<tr><th>Before</th><th>After</th></tr>
<tr><td><code>return (|x: i32| x)</code></td><td><code>return |x: i32| x</code></td></tr>
<tr><td><code>(|| -> &mut () { std::process::abort() }).clone()</code></td><td><code>|| -> &mut () { std::process::abort() }.clone()</code></td></tr>
<tr><td><code>(continue) + 1</code></td><td><code>continue + 1</code></td></tr>
</table>
Tested by `echo "fn main() { let _ = $AFTER; }" | rustc -Zunpretty=expanded /dev/stdin`.
The pretty-printer aims to render the syntax tree as it actually exists in rustc, as faithfully as possible, in Rust syntax. It can insert parentheses where forced by Rust's grammar in order to preserve the meaning of a macro-generated syntax tree, for example in the case of `a * $rhs` where $rhs is `b + c`. But for any expression parsed from source code, without a macro involved, there should never be a reason for inserting additional parentheses not present in the original.
For closures and jumps (return, break, continue, yield, do yeet, become) the unneeded parentheses came from the precedence of some of these expressions being misidentified. In the same order as the table above:
- Jumps and closures are supposed to have equal precedence. The [Rust Reference](https://doc.rust-lang.org/1.83.0/reference/expressions.html#expression-precedence) says so, and in Syn they do. There is no Rust syntax that would require making a precedence distinction between jumps and closures. But in rustc these were previously 2 distinct levels with the closure being lower, hence the parentheses around a closure inside a jump (but not a jump inside a closure).
- When a closure is written with an explicit return type, the grammar [requires](https://doc.rust-lang.org/1.83.0/reference/expressions/closure-expr.html) that the closure body consists of exactly one block expression, not any other arbitrary expression as usual for closures. Parsing of the closure body does not continue after the block expression. So in `|| { 0 }.clone()` the clone is inside the closure body and applies to `{ 0 }`, whereas in `|| -> _ { 0 }.clone()` the clone is outside and applies to the closure as a whole.
- Continue never needs parentheses. It was previously marked as having the lowest possible precedence but it should have been the highest, next to paths and loops and function calls, not next to jumps.
`rustc_span::symbol` defines some things that are re-exported from
`rustc_span`, such as `Symbol` and `sym`. But it doesn't re-export some
closely related things such as `Ident` and `kw`. So you can do `use
rustc_span::{Symbol, sym}` but you have to do `use
rustc_span::symbol::{Ident, kw}`, which is inconsistent for no good
reason.
This commit re-exports `Ident`, `kw`, and `MacroRulesNormalizedIdent`,
and changes many `rustc_span::symbol::` qualifiers in `compiler/` to
`rustc_span::`. This is a 200+ net line of code reduction, mostly
because many files with two `use rustc_span` items can be reduced to
one.
Rollup of 9 pull requests
Successful merges:
- #122670 (Fix bug where `option_env!` would return `None` when env var is present but not valid Unicode)
- #131095 (Use environment variables instead of command line arguments for merged doctests)
- #131339 (Expand set_ptr_value / with_metadata_of docs)
- #131652 (Move polarity into `PolyTraitRef` rather than storing it on the side)
- #131675 (Update lint message for ABI not supported)
- #131681 (Fix up-to-date checking for run-make tests)
- #131702 (Suppress import errors for traits that couldve applied for method lookup error)
- #131703 (Resolved python deprecation warning in publish_toolstate.py)
- #131710 (Remove `'apostrophes'` from `rustc_parse_format`)
r? `@ghost`
`@rustbot` modify labels: rollup
Add `&pin (mut|const) T` type position sugar
This adds parser support for `&pin mut T` and `&pin const T` references. These are desugared to `Pin<&mut T>` and `Pin<&T>` in the AST lowering phases.
This PR currently includes #130526 since that one is in the commit queue. Only the most recent commits (bd450027eb4a94b814a7dd9c0fa29102e6361149 and following) are new.
Tracking:
- #130494
r? `@compiler-errors`
Parenthesize break values containing leading label
The AST pretty printer previously produced invalid syntax in the case of `break` expressions with a value that begins with a loop or block label.
```rust
macro_rules! expr {
($e:expr) => {
$e
};
}
fn main() {
loop {
break expr!('a: loop { break 'a 1; } + 1);
};
}
```
`rustc -Zunpretty=expanded main.rs `:
```console
#![feature(prelude_import)]
#![no_std]
#[prelude_import]
use ::std::prelude::rust_2015::*;
#[macro_use]
extern crate std;
macro_rules! expr { ($e:expr) => { $e }; }
fn main() { loop { break 'a: loop { break 'a 1; } + 1; }; }
```
The expanded code is not valid Rust syntax. Printing invalid syntax is bad because it blocks `cargo expand` from being able to format the output as Rust syntax using rustfmt.
```console
error: parentheses are required around this expression to avoid confusion with a labeled break expression
--> <anon>:9:26
|
9 | fn main() { loop { break 'a: loop { break 'a 1; } + 1; }; }
| ^^^^^^^^^^^^^^^^^^^^^^^^
|
help: wrap the expression in parentheses
|
9 | fn main() { loop { break ('a: loop { break 'a 1; }) + 1; }; }
| + +
```
This PR updates the AST pretty-printer to insert parentheses around the value of a `break` expression as required to avoid this edge case.
This commit by itself is supposed to have no effect on behavior. All of
the call sites are updated to preserve their previous behavior.
The behavior changes are in the commits that follow.
For each of these, we need to decide whether they need to be using
`expr_requires_semi_to_be_stmt`, or `expr_requires_comma_to_be_match_arm`,
which are supposed to be 2 different behaviors. Previously they were
conflated into one, causing either too much or too little
parenthesization.
This mostly works well, and eliminates a couple of delayed bugs.
One annoying thing is that we should really also add an
`ErrorGuaranteed` to `proc_macro::bridge::LitKind::Err`. But that's
difficult because `proc_macro` doesn't have access to `ErrorGuaranteed`,
so we have to fake it.
`cook_lexer_literal` can emit an error about an invalid int literal but
then return a non-`Err` token. And then `integer_lit` has to account for
this to avoid printing a redundant error message.
This commit changes `cook_lexer_literal` to return `Err` in that case.
Then `integer_lit` doesn't need the special case, and
`LitError::LexerError` can be removed.
`unescape_literal` becomes `unescape_unicode`, and `unescape_c_string`
becomes `unescape_mixed`. Because rfc3349 will mean that C string
literals will no longer be the only mixed utf8 literals.
- Rename it as `MixedUnit`, because it will soon be used in more than
just C string literals.
- Change the `Byte` variant to `HighByte` and use it only for
`\x80`..`\xff` cases. This fixes the old inexactness where ASCII chars
could be encoded with either `Byte` or `Char`.
- Add useful comments.
- Remove `is_ascii`, in favour of `u8::is_ascii`.