Micro-optimization attempt in coroutine layout computation
In `compute_layout`, there were a bunch of collections (`IndexVec`s) that were being created by `push`ing in a loop, instead of a, hopefully, more performant usage of iterator combinators. [Second commit](6f682c2774) is just a small cleanup.
I'd love a perf run to see if this shows up in benchmarks.