Exposes the swapping logic from PR 40454 as `pub unsafe fn ptr::swap_nonoverlapping` under feature swap_nonoverlapping
This is most helpful for compound types where LLVM didn't vectorize the loop. Highlight: bench slice::rotate_medium_by727_strings gets 38% faster.
A helpful primitive for moving chunks of data around inside a slice.
In particular, adding elements to the end of a Vec then moving them
somewhere else, as a way to do efficient multiple-insert. (There's
drain for efficient block-remove, but no easy way to block-insert.)
Talk with another example: <https://youtu.be/qH6sSOr-yk8?t=560>