A successful advance is now signalled by returning `0` and other values now represent the remaining number
of steps that couldn't be advanced as opposed to the amount of steps that have been advanced during a partial advance_by.
This simplifies adapters a bit, replacing some `match`/`if` with arithmetic. Whether this is beneficial overall depends
on whether `advance_by` is mostly used as a building-block for other iterator methods and adapters or whether
we also see uses by users where `Result` might be more useful.
`.into_iter()` on arrays was slower than it needed to be (especially compared to slice iterator) since it uses `Range<usize>`, which needs to handle degenerate ranges like `10..4`.
This PR adds an internal `IndexRange` type that's like `Range<usize>` but with a safety invariant that means it doesn't need to worry about those cases -- it only handles `start <= end` -- and thus can give LLVM more information to optimize better.
I added one simple demonstration of the improvement as a codegen test.