This'll still go via slices eventually for large arrays, but this way slice comparisons to short arrays can use the same memcmp-avoidance tricks.
Added some tests for all the combinations to make sure I didn't accidentally infinitely-recurse something.