When collecting tokens there are two kinds of range:
- a range relative to the parser's full token stream (which we get when
we are parsing);
- a range relative to a single AST node's token stream (which we use
within `LazyAttrTokenStreamImpl` when replacing tokens).
These are currently both represented with `Range<u32>` and it's easy to
mix them up -- until now I hadn't properly understood the difference.
This commit introduces `ParserRange` and `NodeRange` to distinguish
them. This also requires splitting `ReplaceRange` in two, giving the new
types `ParserReplacement` and `NodeReplacement`. (These latter two names
reduce the overloading of the word "range".)
The commit also rewrites some comments to be clearer.
The end result is a little more verbose, but much clearer.
Remove unnecessary range replacements
This PR removes an unnecessary range replacement in `collect_tokens_trailing_token`, and does a couple of other small cleanups.
r? ````@petrochenkov````
A fully imperative style is easier to read than a half-iterator,
half-imperative style. Also, rename `inner_attr` as `attr` because it
might be an outer attribute.
Imagine you have replace ranges (2..20,X) and (5..15,Y), and these tokens:
```
a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x
```
If we replace (5..15,Y) first, then (2..20,X) we get this sequence
```
a,b,c,d,e,Y,_,_,_,_,_,_,_,_,_,p,q,r,s,t,u,v,w,x
a,b,X,_,_,_,_,_,_,_,_,_,_,_,_,_,_,_,_,_,u,v,w,x
```
which is what we want.
If we do it in the other order, we get this:
```
a,b,X,_,_,_,_,_,_,_,_,_,_,_,_,p,q,r,s,t,u,v,w,x
a,b,X,_,_,Y,_,_,_,_,_,_,_,_,_,_,_,_,_,_,u,v,w,x
```
which is wrong. So it's true that we need the `.rev()` but the comment
is wrong about why.
The current code is this:
```
self.capture_state.replace_ranges.push((start_pos..end_pos, Some(target)));
self.capture_state.replace_ranges.extend(inner_attr_replace_ranges);
```
What's not obvious is that every range in `inner_attr_replace_ranges`
must be a strict sub-range of `start_pos..end_pos`. Which means, in
`LazyAttrTokenStreamImpl::to_attr_token_stream`, they will be done
first, and then the `start_pos..end_pos` replacement will just overwrite
them. So they aren't needed.
This has been bugging me for a while. I find complex "if any of these
are true" conditions easier to think about than complex "if all of these
are true" conditions, because you can stop as soon as one is true.
Adding details, clarifying lots of little things, etc. In particular,
the commit adds details of an example. I find this very helpful, because
it's taken me a long time to understand how this code works.
The `Option`s within the `ReplaceRange`s within the hashmap are always
`None`. This PR omits them and inserts them when they are extracted from
the hashmap.
It's used in `Parser::collect_tokens_trailing_token` to decide whether
to capture a trailing token. But the callers actually know whether to
capture a trailing token, so it's simpler for them to just pass in a
bool.
Also, the `TrailingToken::Gt` case was weird, because it didn't result
in a trailing token being captured. It could have been subsumed by the
`TrailingToken::MaybeComma` case, and it effectively is in the new code.
The new condition is equivalent in practice, but it's much more obvious
that it would result in an empty range, because the condition lines up
with the contents of the iterator.
There's a comment saying we don't do it for performance reasons, but it
doesn't actually affect performance.
The commit also tweaks the control flow, to make clearer that two code
paths are mutually exclusive.
Currently the second element is a `Vec<(FlatToken, Spacing)>`. But the
vector always has zero or one elements, and the `FlatToken` is always
`FlatToken::AttrTarget` (which contains an `AttributesData`), and the
spacing is always `Alone`. So we can simplify it to
`Option<AttributesData>`.
An assertion in `to_attr_token_stream` can can also be removed, because
`new_tokens.len()` was always 0 or 1, which means than `range.len()`
is always greater than or equal to it, because `range.is_empty()` is
always false (as per the earlier assertion).
And update the comment. Clearly the return type of this function was
changed at some point in the past, but its name and comment weren't
updated to match.
The number of source code bytes can't exceed a `u32`'s range, so a token
position also can't. This reduces the size of `Parser` and
`LazyAttrTokenStreamImpl` by eight bytes each.
Fix duplicated attributes on nonterminal expressions
This PR fixes a long-standing bug (#86055) whereby expression attributes can be duplicated when expanded through declarative macros.
First, consider how items are parsed in declarative macros:
```
Items:
- parse_nonterminal
- parse_item(ForceCollect::Yes)
- parse_item_
- attrs = parse_outer_attributes
- parse_item_common(attrs)
- maybe_whole!
- collect_tokens_trailing_token
```
The important thing is that the parsing of outer attributes is outside token collection, so the item's tokens don't include the attributes. This is how it's supposed to be.
Now consider how expression are parsed in declarative macros:
```
Exprs:
- parse_nonterminal
- parse_expr_force_collect
- collect_tokens_no_attrs
- collect_tokens_trailing_token
- parse_expr
- parse_expr_res(None)
- parse_expr_assoc_with
- parse_expr_prefix
- parse_or_use_outer_attributes
- parse_expr_dot_or_call
```
The important thing is that the parsing of outer attributes is inside token collection, so the the expr's tokens do include the attributes, i.e. in `AttributesData::tokens`.
This PR fixes the bug by rearranging expression parsing to that outer attribute parsing happens outside of token collection. This requires a number of small refactorings because expression parsing is somewhat complicated. While doing so the PR makes the code a bit cleaner and simpler, by eliminating `parse_or_use_outer_attributes` and `Option<AttrWrapper>` arguments (in favour of the simpler `parse_outer_attributes` and `AttrWrapper` arguments), and simplifying `LhsExpr`.
r? `@petrochenkov`
Existing names for values of this type are `sess`, `parse_sess`,
`parse_session`, and `ps`. `sess` is particularly annoying because
that's also used for `Session` values, which are often co-located, and
it can be difficult to know which type a value named `sess` refers to.
(That annoyance is the main motivation for this change.) `psess` is nice
and short, which is good for a name used this much.
The commit also renames some `parse_sess_created` values as
`psess_created`.
This is an extension of the previous commit. It means the output of
something like this:
```
stringify!(let a: Vec<u32> = vec![];)
```
goes from this:
```
let a: Vec<u32> = vec![] ;
```
With this PR, it now produces this string:
```
let a: Vec<u32> = vec![];
```
It uses `once` chained with `(0..self.num_calls).map(...)` followed by
`.take(self.num_calls`. I found this hard to read. It's simpler to just
use `repeat_with`.