std: Get the standard library compiling for wasm64
This commit goes through and updates various `#[cfg]` as appropriate to
get the wasm64-unknown-unknown target behaving similarly to the
wasm32-unknown-unknown target. Most of this is just updating various
conditions for `target_arch = "wasm32"` to also account for `target_arch
= "wasm64"` where appropriate. This commit also lists `wasm64` as an
allow-listed architecture to not have the `restricted_std` feature
enabled, enabling experimentation with `-Z build-std` externally.
The main goal of this commit is to enable playing around with
`wasm64-unknown-unknown` externally via `-Z build-std` in a way that's
similar to the `wasm32-unknown-unknown` target. These targets are
effectively the same and only differ in their pointer size, but wasm64
is much newer and has much less ecosystem/library support so it'll still
take time to get wasm64 fully-fledged.
This makes a few changes to the weak symbol macros in `sys::unix`:
- `dlsym!` is added to keep the functionality for runtime `dlsym`
lookups, like for `__pthread_get_minstack@GLIBC_PRIVATE` that we don't
want to show up in ELF symbol tables.
- `weak!` now uses `#[linkage = "extern_weak"]` symbols, so its runtime
behavior is just a simple null check. This is also used by `syscall!`.
- On non-ELF targets (macos/ios) where that linkage is not known to
behave, `weak!` is just an alias to `dlsym!` for the old behavior.
- `raw_syscall!` is added to always call `libc::syscall` on linux and
android, for cases like `clone3` that have no known libc wrapper.
The new `weak!` linkage does mean that you'll get versioned symbols if
you build with a newer glibc, like `WEAK DEFAULT UND statx@GLIBC_2.28`.
This might seem problematic, but old non-weak symbols can tie the build
to new versions too, like `dlsym@GLIBC_2.34` from their recent library
unification. If you build with an old glibc like `dist-x86_64-linux`
does, you'll still get unversioned `WEAK DEFAULT UND statx`, which may
be resolved based on the runtime glibc.
I also found a few functions that don't need to be weak anymore:
- Android can directly use `ftruncate64`, `pread64`, and `pwrite64`, as
these were added in API 12, and our baseline is API 14.
- Linux can directly use `splice`, added way back in glibc 2.5 and
similarly old musl. Android only added it in API 21 though.
Only use `clone3` when needed for pidfd
In #89522 we learned that `clone3` is interacting poorly with Gentoo's
`sandbox` tool. We only need that for the unstable pidfd extensions, so
otherwise avoid that and use a normal `fork`.
This is a re-application of beta #89924, now that we're aware that we need
more than just a temporary release fix. I also reverted 12fbabd27f, as
that was just fallout from using `clone3` instead of `fork`.
r? `@Mark-Simulacrum`
cc `@joshtriplett`
This commit goes through and updates various `#[cfg]` as appropriate to
get the wasm64-unknown-unknown target behaving similarly to the
wasm32-unknown-unknown target. Most of this is just updating various
conditions for `target_arch = "wasm32"` to also account for `target_arch
= "wasm64"` where appropriate. This commit also lists `wasm64` as an
allow-listed architecture to not have the `restricted_std` feature
enabled, enabling experimentation with `-Z build-std` externally.
The main goal of this commit is to enable playing around with
`wasm64-unknown-unknown` externally via `-Z build-std` in a way that's
similar to the `wasm32-unknown-unknown` target. These targets are
effectively the same and only differ in their pointer size, but wasm64
is much newer and has much less ecosystem/library support so it'll still
take time to get wasm64 fully-fledged.
Make `std:🧵:available_concurrency` support process-limited number of CPUs
Use `libc::sched_getaffinity` and count the number of CPUs in the returned mask. This handles cases where the process doesn't have access to all CPUs, such as when limited via `taskset` or similar.
This also covers cgroup cpusets.
In #89522 we learned that `clone3` is interacting poorly with Gentoo's
`sandbox` tool. We only need that for the unstable pidfd extensions, so
otherwise avoid that and use a normal `fork`.
`#[thread_local]` allows us to maintain a per-thread list of destructors. This also avoids the need to synchronize global data (which is particularly tricky within the TLS callback function).
Use libc::sched_getaffinity and count the number of CPUs in the returned
mask. This handles cases where the process doesn't have access to all
CPUs, such as when limited via taskset or similar.
Automatically convert paths to verbatim for filesystem operations that support it
This allows using longer paths without the user needing to `canonicalize` or manually prefix paths. If the path is already verbatim then this has no effect.
Fixes: #32689
As a security measure, Windows 11 introduces a new temporary directory API, GetTempPath2.
When the calling process is running as SYSTEM, a separate temporary directory
will be returned inaccessible to non-SYSTEM processes. For non-SYSTEM processes
the behavior will be the same as before.
removing TLS support in x86_64-unknown-none-hermitkernel
HermitCore's kernel itself doesn't support TLS. Consequently, the entries in x86_64-unknown-none-hermitkernel should be removed. This commit should help to finalize #89062.
linux/aarch64 Now() should be actually_monotonic()
While issues have been seen on arm64 platforms the Arm architecture requires
that the counter monotonically increases and that it must provide a uniform
view of system time (e.g. it must not be possible for a core to receive a
message from another core with a time stamp and observe time going backwards
(ARM DDI 0487G.b D11.1.2). While there have been a few 64bit SoCs that have
bugs (#49281, #56940) which cause time to not monotonically increase, these have
been fixed in the Linux kernel and we shouldn't penalize all Arm SoCs for those
who refuse to update their kernels:
SUN50I_ERRATUM_UNKNOWN1 - Allwinner A64 / Pine A64 - fixed in 5.1
FSL_ERRATUM_A008585 - Freescale LS2080A/LS1043A - fixed in 4.10
HISILICON_ERRATUM_161010101 - Hisilicon 1610 - fixed in 4.11
ARM64_ERRATUM_858921 - Cortex A73 - fixed in 4.12
255a3f3e18 std: Force `Instant::now()` to be monotonic added a Mutex to work around
this problem and a small test program using glommio shows the majority of time spent
acquiring and releasing this Mutex. 3914a7b0da tries to improve this, but actually
makes it worse on big systems as for 128b atomics a ldxp/stxp pair (and successful loop)
for v8.4 systems that don't support FEAT_LSE2 is required which is expensive as a lock
and because of how the load/store-exclusives scale on large Arm systems is both unfair
to threads and tends to go backwards in performance.
A small sample program using glommio improves by 70x on a 32 core Graviton2
system with this change.
[fuchsia] Update process info struct
The fuchsia platform is in the process of softly transitioning over to
using a new value for ZX_INFO_PROCESS with a new corresponding struct.
This change migrates libstd.
See [fxrev.dev/510478](https://fxrev.dev/510478) and [fxbug.dev/30751](https://fxbug.dev/30751) for more detail.
Use BCryptGenRandom instead of RtlGenRandom on Windows.
This removes usage of RtlGenRandom on Windows, in favour of BCryptGenRandom.
BCryptGenRandom isn't available on XP, but we dropped XP support a while ago.
The fuchsia platform is in the process of softly transitioning over to
using a new value for ZX_INFO_PROCESS with a new corresponding struct.
This change migrates libstd.
See fxrev.dev/510478 and fxbug.dev/30751 for more detail.
Fix ctrl-c causing reads of stdin to return empty on Windows.
Pressing ctrl+c (or ctrl+break) on Windows caused a blocking read of stdin to unblock and return empty, unlike other platforms which continue to block.
On ctrl-c, `ReadConsoleW` will return success, but also set `LastError` to `ERROR_OPERATION_ABORTED`.
This change detects this case, and re-tries the call to `ReadConsoleW`.
Fixes#89177. See issue for further details.
Tested on Windows 7 and Windows 10 with both MSVC and GNU toolchains
Add new tier-3 target: armv7-unknown-linux-uclibceabihf
This change adds a new tier-3 target: armv7-unknown-linux-uclibceabihf
This target is primarily used in embedded linux devices where system resources are slim and glibc is deemed too heavyweight. Cross compilation C toolchains are available [here](https://toolchains.bootlin.com/) or via [buildroot](https://buildroot.org).
The change is based largely on a previous PR #79380 with a few minor modifications. The author of that PR was unable to push the PR forward, and graciously allowed me to take it over.
Per the [target tier 3 policy](https://github.com/rust-lang/rfcs/blob/master/text/2803-target-tier-policy.md), I volunteer to be the "target maintainer".
This is my first PR to Rust itself, so I apologize if I've missed things!
Rename `std:🧵:available_conccurrency` to `std:🧵:available_parallelism`
_Tracking issue: https://github.com/rust-lang/rust/issues/74479_
This PR renames `std:🧵:available_conccurrency` to `std:🧵:available_parallelism`.
## Rationale
The API was initially named `std:🧵:hardware_concurrency`, mirroring the [C++ API of the same name](https://en.cppreference.com/w/cpp/thread/thread/hardware_concurrency). We eventually decided to omit any reference to the word "hardware" after [this comment](https://github.com/rust-lang/rust/pull/74480#issuecomment-662045841). And so we ended up with `available_concurrency` instead.
---
For a talk I was preparing this week I was reading through ["Understanding and expressing scalable concurrency" (A. Turon, 2013)](http://aturon.github.io/academic/turon-thesis.pdf), and the following passage stood out to me (emphasis mine):
> __Concurrency is a system-structuring mechanism.__ An interactive system that deals with disparate asynchronous events is naturally structured by division into concurrent threads with disparate responsibilities. Doing so creates a better fit between problem and solution, and can also decrease the average latency of the system by preventing long-running computations from obstructing quicker ones.
> __Parallelism is a resource.__ A given machine provides a certain capacity for parallelism, i.e., a bound on the number of computations it can perform simultaneously. The goal is to maximize throughput by intelligently using this resource. For interactive systems, parallelism can decrease latency as well.
_Chapter 2.1: Concurrency is not Parallelism. Page 30._
---
_"Concurrency is a system-structuring mechanism. Parallelism is a resource."_ — It feels like this accurately captures the way we should be thinking about these APIs. What this API returns is not "the amount of concurrency available to the program" which is a property of the program, and thus even with just a single thread is effectively unbounded. But instead it returns "the amount of _parallelism_ available to the program", which is a resource hard-constrained by the machine's capacity (and can be further restricted by e.g. operating systems).
That's why I'd like to propose we rename this API from `available_concurrency` to `available_parallelism`. This still meets the criteria we previously established of not attempting to define what exactly we mean by "hardware", "threads", and other such words. Instead we only talk about "concurrency" as an abstract resource available to our program.
r? `@joshtriplett`