Mark `-1` as an available niche for file descriptors
Based on discussion from <https://internals.rust-lang.org/t/can-the-standard-library-shrink-option-file/12768>, the file descriptor `-1` is chosen based on the POSIX API designs that use it as a sentinel to report errors. A bigger niche could've been chosen, particularly on Linux, but would not necessarily be portable.
This PR also adds a test case to ensure that the -1 niche (which is kind of hacky and has no obvious test case) works correctly. It requires the "upper" bound, which is actually -1, to be expressed in two's complement.
Refactor and fix `parse_prefix` on Windows
This PR is an extension of #78692 as well as a general refactor of `parse_prefix`:
**Fixes**:
There are two errors in the current implementation of `parse_prefix`:
Firstly, in the current implementation only `\` is recognized as a separator character in device namespace prefixes. This behavior is only correct for verbatim paths; `"\\.\C:/foo"` should be parsed as `"C:"` instead of `"C:/foo"`.
Secondly, the current implementation only handles single separator characters. In non-verbatim paths a series of separator characters should be recognized as a single boundary, e.g. the UNC path `"\\localhost\\\\\\C$\foo"` should be parsed as `"\\localhost\\\\\\C$"` and then `UNC(server: "localhost", share: "C$")`, but currently it is not parsed at all, because it starts being parsed as `\\localhost\` and then has an invalid empty share location.
Paths like `"\\.\C:/foo"` and `"\\localhost\\\\\\C$\foo"` are valid on Windows, they are equivalent to just `"C:\foo"`.
**Refactoring**:
All uses of `&[u8]` within `parse_prefix` are extracted to helper functions and`&OsStr` is used instead. This reduces the number of places unsafe is used:
- `get_first_two_components` is adapted to the more general `parse_next_component` and used in more places
- code for parsing drive prefixes is extracted to `parse_drive`
Add fast futex-based thread parker for Windows.
This adds a fast futex-based thread parker for Windows. It either uses WaitOnAddress+WakeByAddressSingle or NT Keyed Events (NtWaitForKeyedEvent+NtReleaseKeyedEvent), depending on which is available. Together, this makes this thread parker work for Windows XP and up. Before this change, park()/unpark() did not work on Windows XP: it needs condition variables, which only exist since Windows Vista.
---
Unfortunately, NT Keyed Events are an undocumented Windows API. However:
- This API is relatively simple with obvious behaviour, and there are several (unofficial) articles documenting the details. [1]
- parking_lot has been using this API for years (on Windows versions before Windows 8). [2] Many big projects extensively use parking_lot, such as servo and the Rust compiler itself.
- It is the underlying API used by Windows SRW locks and Windows critical sections. [3] [4]
- The source code of the implementations of Wine, ReactOs, and Windows XP are available and match the expected behaviour.
- The main risk with an undocumented API is that it might change in the future. But since we only use it for older versions of Windows, that's not a problem.
- Even if these functions do not block or wake as we expect (which is unlikely, see all previous points), this implementation would still be memory safe. The NT Keyed Events API is only used to sleep/block in the right place.
[1]\: http://www.locklessinc.com/articles/keyed_events/
[2]\: https://github.com/Amanieu/parking_lot/commit/43abbc964e
[3]\: https://docs.microsoft.com/en-us/archive/msdn-magazine/2012/november/windows-with-c-the-evolution-of-synchronization-in-windows-and-c
[4]\: Windows Internals, Part 1, ISBN 9780735671300
---
The choice of fallback API is inspired by parking_lot(_core), but the implementation of this thread parker is different. While parking_lot has no use for a fast path (park() directly returning if unpark() was already called), this implementation has a fast path that returns without even checking which waiting/waking API to use, as the same atomic variable with compatible states is used in all cases.
Windows TLS: ManuallyDrop instead of mem::forget
The Windows TLS implementation still used `mem::forget` instead of `ManuallyDrop`, leading to the usual problem of "using" the `Box` when it should not be used any more.
Make the kernel_copy tests more robust/concurrent.
These tests write to the same filenames in /tmp and in some cases these files don't get cleaned up properly. This caused issues for us when different users run the tests on the same system, e.g.:
```
---- sys::unix::kernel_copy::tests::bench_file_to_file_copy stdout ----
thread 'sys::unix::kernel_copy::tests::bench_file_to_file_copy' panicked at 'called `Result::unwrap()` on an `Err` value: Os { code: 13, kind: PermissionDenied, message: "Permission denied" }', library/std/src/sys/unix/kernel_copy/tests.rs:71:10
---- sys::unix::kernel_copy::tests::bench_file_to_socket_copy stdout ----
thread 'sys::unix::kernel_copy::tests::bench_file_to_socket_copy' panicked at 'called `Result::unwrap()` on an `Err` value: Os { code: 13, kind: PermissionDenied, message: "Permission denied" }', library/std/src/sys/unix/kernel_copy/tests.rs💯10
```
Use `std::sys_common::io__test::tmpdir()` to solve this.
CC ``@the8472.``
Based on discussion from https://internals.rust-lang.org/t/can-the-standard-library-shrink-option-file/12768,
the file descriptor -1 is chosen based on the POSIX API designs that use it as a sentinel to report errors.
A bigger niche could've been chosen, particularly on Linux, but would not necessarily be portable.
This PR also adds a test case to ensure that the -1 niche
(which is kind of hacky and has no obvious test case) works correctly.
It requires the "upper" bound, which is actually -1, to be expressed in two's complement.
implement better availability probing for copy_file_range
Followup to https://github.com/rust-lang/rust/pull/75428#discussion_r469616547
Previously syscall detection was overly pessimistic. Any attempt to copy to an immutable file (EPERM) would disable copy_file_range support for the whole process.
The change tries to copy_file_range on invalid file descriptors which will never run into the immutable file case and thus we can clearly distinguish syscall availability.
ext/ucred: Support PID in peer creds on macOS
This is a follow-up to https://github.com/rust-lang/rust/pull/75148 (RFC: https://github.com/rust-lang/rust/issues/42839).
The original PR used `getpeereid` on macOS and the BSDs, since they don't (generally) support the `SO_PEERCRED` mechanism that Linux supplies.
This PR splits the macOS/iOS implementation of `peer_cred()` from that of the BSDs, since macOS supplies the `LOCAL_PEERPID` sockopt as a source of the missing PID. It also adds a `cfg`-gated tests that ensures that platforms with support for PIDs in `UCred` have the expected data.
These tests write to the same filenames in /tmp and in some cases these
files don't get cleaned up properly. This caused issues for us when
different users run the tests on the same system, e.g.:
```
---- sys::unix::kernel_copy::tests::bench_file_to_file_copy stdout ----
thread 'sys::unix::kernel_copy::tests::bench_file_to_file_copy' panicked at 'called `Result::unwrap()` on an `Err` value: Os { code: 13, kind: PermissionDenied, message: "Permission denied" }', library/std/src/sys/unix/kernel_copy/tests.rs:71:10
---- sys::unix::kernel_copy::tests::bench_file_to_socket_copy stdout ----
thread 'sys::unix::kernel_copy::tests::bench_file_to_socket_copy' panicked at 'called `Result::unwrap()` on an `Err` value: Os { code: 13, kind: PermissionDenied, message: "Permission denied" }', library/std/src/sys/unix/kernel_copy/tests.rs💯10
```
Use `std::sys_common::io__test::tmpdir()` to solve this.
unix: Extend UnixStream and UnixDatagram to send and receive file descriptors
Add the functions `recv_vectored_fds` and `send_vectored_fds` to `UnixDatagram` and `UnixStream`. With this functions `UnixDatagram` and `UnixStream` can send and receive file descriptors, by using `recvmsg` and `sendmsg` system call.
unix/weak: pass arguments to syscall at the given type
Given that we know the type the argument should have, it seems a bit strange not to use that information.
r? `@m-ou-se` `@cuviper`
Tighten the bounds on atomic Ordering in std::sys::unix::weak::Weak
This moves reading this from multiple SeqCst reads to Relaxed read + Acquire fence if we are actually going to use the data.
Would love to avoid the Acquire fence, but doing so would need Ordering::Consume, which neither Rust, nor LLVM supports (a shame, since this fence is hardly free on ARM, which is what I was hoping to improve).
r? ``@Amanieu`` (Sorry for always picking you, but I know a lot of people wouldn't feel comfortable reviewing atomic ordering changes)
linux: try to use libc getrandom to allow interposition
We'll try to use a weak `getrandom` symbol first, because that allows
things like `LD_PRELOAD` interposition. For example, perf measurements
might want to disable randomness to get reproducible results. If the
weak symbol is not found, we fall back to a raw `SYS_getrandom` call.
Rollup of 9 pull requests
Successful merges:
- #77939 (Ensure that the source code display is working with DOS backline)
- #78138 (Upgrade dlmalloc to version 0.2)
- #78967 (Make codegen tests compatible with extra inlining)
- #79027 (Limit storage duration of inlined always live locals)
- #79077 (document that __rust_alloc is also magic to our LLVM fork)
- #79088 (clarify `span_label` documentation)
- #79097 (Code block invalid html tag lint)
- #79105 (std: Fix test `symlink_hard_link` on Windows)
- #79107 (build-manifest: strip newline from rustc version)
Failed merges:
r? `@ghost`
`@rustbot` modify labels: rollup
Upgrade dlmalloc to version 0.2
In preparation of adding dynamic memory management support for SGXv2-enabled platforms, the dlmalloc crate has been refactored. More specifically, support has been added to implement platform specification outside of the dlmalloc crate. (see https://github.com/alexcrichton/dlmalloc-rs/pull/15)
This PR upgrades dlmalloc to version 0.2 for the `wasm` and `sgx` targets.
As the dlmalloc changes have received a positive review, but have not been merged yet, this PR contains a commit to prevent tidy from aborting CI prematurely.
cc: `@jethrogb`
Make the libstd build script smaller
Of all sysroot crates currently only compiler_builtins, miniz_oxide and std require a build script. compiler_builtins uses to conditionally enable certain features and possibly compile a C version ([source](63ccaf11f0/build.rs)), miniz_oxide only uses it to detect if liballoc is supported as the MSRV is 1.34.0 instead of the 1.36.0 which stabilized liballoc ([source](28514ec09f/miniz_oxide/build.rs)). std now only uses it to enable `freebsd12` when the `RUST_STD_FREEBSD_12_ABI` env var is set, to determine if `restricted-std` should be set, to set the `STD_ENV_ARCH` env var identical to `CARGO_CFG_TARGET_ARCH`, and to unconditionally enable `backtrace_in_libstd`.
If all build scripts were to be removed, it would be possible for rustc to completely compile it's own sysroot. It currently requires a rustc version that already has an available libstd to compile the build scripts. If rustc can completely compile it's own sysroot, rustbuild could be simplified to not forcefully use the bootstrap compiler for build scripts.
`@rustbot` modify labels: +T-compiler +libs-impl
We'll try to use a weak `getrandom` symbol first, because that allows
things like `LD_PRELOAD` interposition. For example, perf measurements
might want to disable randomness to get reproducible results. If the
weak symbol is not found, we fall back to a raw `SYS_getrandom` call.
Fix an intrinsic invocation on threaded wasm
This looks like it was forgotten to get updated in #74482 and wasm with
threads isn't built on CI so we didn't catch this by accident.
specialize io::copy to use copy_file_range, splice or sendfile
Fixes#74426.
Also covers #60689 but only as an optimization instead of an official API.
The specialization only covers std-owned structs so it should avoid the problems with #71091
Currently linux-only but it should be generalizable to other unix systems that have sendfile/sosplice and similar.
There is a bit of optimization potential around the syscall count. Right now it may end up doing more syscalls than the naive copy loop when doing short (<8KiB) copies between file descriptors.
The test case executes the following:
```
[pid 103776] statx(3, "", AT_STATX_SYNC_AS_STAT|AT_EMPTY_PATH, STATX_ALL, {stx_mask=STATX_ALL|STATX_MNT_ID, stx_attributes=0, stx_mode=S_IFREG|0644, stx_size=17, ...}) = 0
[pid 103776] write(4, "wxyz", 4) = 4
[pid 103776] write(4, "iklmn", 5) = 5
[pid 103776] copy_file_range(3, NULL, 4, NULL, 5, 0) = 5
```
0-1 `stat` calls to identify the source file type. 0 if the type can be inferred from the struct from which the FD was extracted
𝖬 `write` to drain the `BufReader`/`BufWriter` wrappers. only happen when buffers are present. 𝖬 ≾ number of wrappers present. If there is a write buffer it may absorb the read buffer contents first so only result in a single write. Vectored writes would also be an option but that would require more invasive changes to `BufWriter`.
𝖭 `copy_file_range`/`splice`/`sendfile` until file size, EOF or the byte limit from `Take` is reached. This should generally be *much* more efficient than the read-write loop and also have other benefits such as DMA offload or extent sharing.
## Benchmarks
```
OLD
test io::tests::bench_file_to_file_copy ... bench: 21,002 ns/iter (+/- 750) = 6240 MB/s [ext4]
test io::tests::bench_file_to_file_copy ... bench: 35,704 ns/iter (+/- 1,108) = 3671 MB/s [btrfs]
test io::tests::bench_file_to_socket_copy ... bench: 57,002 ns/iter (+/- 4,205) = 2299 MB/s
test io::tests::bench_socket_pipe_socket_copy ... bench: 142,640 ns/iter (+/- 77,851) = 918 MB/s
NEW
test io::tests::bench_file_to_file_copy ... bench: 14,745 ns/iter (+/- 519) = 8889 MB/s [ext4]
test io::tests::bench_file_to_file_copy ... bench: 6,128 ns/iter (+/- 227) = 21389 MB/s [btrfs]
test io::tests::bench_file_to_socket_copy ... bench: 13,767 ns/iter (+/- 3,767) = 9520 MB/s
test io::tests::bench_socket_pipe_socket_copy ... bench: 26,471 ns/iter (+/- 6,412) = 4951 MB/s
```
Previously EOVERFLOW handling was only applied for io::copy specialization
but not for fs::copy sharing the same code.
Additionally we lower the chunk size to 1GB since we have a user report
that older kernels may return EINVAL when passing 0x8000_0000
but smaller values succeed.
Android builds use feature level 14, the libc wrapper for splice is gated
on feature level 21+ so we have to invoke the syscall directly.
Additionally the emulator doesn't seem to support it so we also have to
add ENOSYS checks.