Commit Graph

33 Commits

Author SHA1 Message Date
Pavel Grigorenko
afabc583f7 impl CloneToUninit for Path and OsStr 2024-07-29 20:44:39 +03:00
Nicholas Nethercote
84ac80f192 Reformat use declarations.
The previous commit updated `rustfmt.toml` appropriately. This commit is
the outcome of running `x fmt --all` with the new formatting options.
2024-07-29 08:26:52 +10:00
Jubilee Young
64fb2366da std: Unsafe-wrap in Wtf8 impl 2024-07-14 17:44:13 -07:00
ash
e5167fe7bd set self.is_known_utf8 to false in extend_from_slice 2024-06-25 23:58:43 -06:00
ash
aa46a3368e PathBuf::as_mut_vec removed and verified for UEFI and Windows platforms #126333 2024-06-25 07:36:34 -06:00
Jubilee Young
af04418a05 Make PathBuf less Ok with adding UTF-16 then into_string 2024-06-12 01:00:21 -07:00
schvv31n
fd5777c4c5 impl OsString::leak & PathBuf::leak 2024-06-04 11:53:59 +01:00
Ralf Jung
c47978a241 PathBuf: replace transmuting by accessor functions 2024-04-26 18:09:09 +02:00
Jan Verbeek
51a7396ad3 Move OsStr::slice_encoded_bytes validation to platform modules
On Windows and UEFI this improves performance and error messaging.

On other platforms we optimize the fast path a bit more.

This also prepares for later relaxing the checks on certain platforms.
2024-01-21 19:51:49 +01:00
Ralf Jung
f887f5a9c6 std: add some missing repr(transparent) 2023-08-14 10:40:59 +02:00
Ed Page
ee604fccd9 Allow limited access to OsString bytes
This extends #109698 to allow no-cost conversion between `Vec<u8>` and `OsString`
as suggested in feedback from `os_str_bytes` crate in #111544.
2023-07-07 09:46:48 -05:00
Matthias Krüger
4efdb5c001 Rollup merge of #98202 - aticu:impl_tryfrom_osstr_for_str, r=Amanieu
Implement `TryFrom<&OsStr>` for `&str`

Recently when trying to work with `&OsStr` I was surprised to find this `impl` missing.

Since the `to_str` method already existed the actual implementation is fairly non-controversial, except for maybe the choice of the error type. I chose an opaque error here instead of something like `std::str::Utf8Error`, since that would already make a number of assumption about the underlying implementation of `OsStr`.

As this is a trait implementation, it is insta-stable, if I'm not mistaken?
Either way this will need an FCP.
I chose "1.64.0" as the version, since this is unlikely to land before the beta cut-off.

`@rustbot` modify labels: +T-libs-api

API Change Proposal: rust-lang/rust#99031 (accepted)
2023-06-14 18:10:27 +02:00
aticu
e3a1a11ed2 Implement TryFrom<&OsStr> for &str 2023-06-12 10:46:49 +02:00
Ed Page
8d2beb50c2 Allow access to OsStr bytes
`OsStr` has historically kept its implementation details private out of
concern for locking us into a specific encoding on Windows.

This is an alternative to #95290 which proposed specifying the encoding on Windows.  Instead, this
only specifies that for cross-platform code, `OsStr`'s encoding is a superset of UTF-8 and defines
rules for safely interacting with it

At minimum, this can greatly simplify the `os_str_bytes` crate and every
arg parser that interacts with `OsStr` directly (which is most of those
that support invalid UTF-8).
2023-03-27 22:29:44 -05:00
Konrad Borowski
174c0e86ca Inline AsInner implementations 2023-05-01 13:25:09 +02:00
est31
999405059c Match unmatched backticks in library/ 2023-03-03 03:03:29 +01:00
Lukas Markeffsky
76e216f29b Use associated items of char instead of freestanding items in core::char 2023-01-14 11:58:41 +01:00
bors
25ea5a36c6 Auto merge of #96869 - sunfishcode:main, r=joshtriplett
Optimize `Wtf8Buf::into_string` for the case where it contains UTF-8.

Add a `is_known_utf8` flag to `Wtf8Buf`, which tracks whether the
string is known to contain UTF-8. This is efficiently computed in many
common situations, such as when a `Wtf8Buf` is constructed from a `String`
or `&str`, or with `Wtf8Buf::from_wide` which is already doing UTF-16
decoding and already checking for surrogates.

This makes `OsString::into_string` O(1) rather than O(N) on Windows in
common cases.

And, it eliminates the need to scan through the string for surrogates in
`Args::next` and `Vars::next`, because the strings are already being
translated with `Wtf8Buf::from_wide`.

Many things on Windows construct `OsString`s with `Wtf8Buf::from_wide`,
such as `DirEntry::file_name` and `fs::read_link`, so with this patch,
users of those functions can subsequently call `.into_string()` without
paying for an extra scan through the string for surrogates.

r? `@ghost`
2022-08-24 01:17:52 +00:00
YOSHIOKA Takuma
2bb7e1e6ed Guarantee try_reserve preserves the contents on error
Update doc comments to make the guarantee explicit. However, some
implementations does not have the statement though.

* `HashMap`, `HashSet`: require guarantees on hashbrown side.
* `PathBuf`: simply redirecting to `OsString`.

Fixes #99606.
2022-08-10 01:51:38 +09:00
Dan Gohman
a7d57c61c2 Don't eagerly scan for is_known_utf8 in to_ascii_lowercase/uppercase. 2022-06-23 13:10:47 -07:00
Dan Gohman
d3a585e593 Panic safety. 2022-06-23 13:10:47 -07:00
Dan Gohman
caf8bcceff Optimize Wtf8Buf::into_string for the case where it contains UTF-8.
Add a `is_known_utf8` flag to `Wtf8Buf`, which tracks whether the
string is known to contain UTF-8. This is efficiently computed in many
common situations, such as when a `Wtf8Buf` is constructed from a `String`
or `&str`, or with `Wtf8Buf::from_wide` which is already doing UTF-16
decoding and already checking for surrogates.

This makes `OsString::into_string` O(1) rather than O(N) on Windows in
common cases.

And, it eliminates the need to scan through the string for surrogates in
`Args::next` and `Vars::next`, because the strings are already being
translated with `Wtf8Buf::from_wide`.

Many things on Windows construct `OsString`s with `Wtf8Buf::from_wide`,
such as `DirEntry::file_name` and `fs::read_link`, so with this patch,
users of those functions can subsequently call `.into_string()` without
paying for an extra scan through the string for surrogates.
2022-06-23 13:10:47 -07:00
Mara Bos
4f212f08cf Use Rust 2021 prelude in std itself. 2022-05-09 11:12:32 +02:00
Aron Parker
fc6af819c4 Make EncodeWide implement FusedIterator 2022-04-25 18:38:47 +02:00
T-O-R-U-S
72a25d05bf Use implicit capture syntax in format_args
This updates the standard library's documentation to use the new syntax. The
documentation is worthwhile to update as it should be more idiomatic
(particularly for features like this, which are nice for users to get acquainted
with). The general codebase is likely more hassle than benefit to update: it'll
hurt git blame, and generally updates can be done by folks updating the code if
(and when) that makes things more readable with the new format.

A few places in the compiler and library code are updated (mostly just due to
already having been done when this commit was first authored).
2022-03-10 10:23:40 -05:00
Xuanwo
b07ae1c4d5 Address comments
Signed-off-by: Xuanwo <github@xuanwo.io>
2021-12-29 14:02:20 +08:00
Xuanwo
27b92c9f98 Implement support in wtf8
Signed-off-by: Xuanwo <github@xuanwo.io>
2021-12-28 11:53:14 +08:00
Eduardo Sánchez Muñoz
23637e20cd libcore: assume the input of next_code_point and next_code_point_reverse is UTF-8-like
The functions are now `unsafe` and they use `Option::unwrap_unchecked` instead of `unwrap_or_0`

`unwrap_or_0` was added in 42357d772b. I guess `unwrap_unchecked` was not available back then.

Given this example:

```rust
pub fn first_char(s: &str) -> Option<char> {
    s.chars().next()
}
```

Previously, the following assembly was produced:

```asm
_ZN7example10first_char17ha056ddea6bafad1cE:
	.cfi_startproc
	test	rsi, rsi
	je	.LBB0_1
	movzx	edx, byte ptr [rdi]
	test	dl, dl
	js	.LBB0_3
	mov	eax, edx
	ret
.LBB0_1:
	mov	eax, 1114112
	ret
.LBB0_3:
	lea	r8, [rdi + rsi]
	xor	eax, eax
	mov	r9, r8
	cmp	rsi, 1
	je	.LBB0_5
	movzx	eax, byte ptr [rdi + 1]
	add	rdi, 2
	and	eax, 63
	mov	r9, rdi
.LBB0_5:
	mov	ecx, edx
	and	ecx, 31
	cmp	dl, -33
	jbe	.LBB0_6
	cmp	r9, r8
	je	.LBB0_9
	movzx	esi, byte ptr [r9]
	add	r9, 1
	and	esi, 63
	shl	eax, 6
	or	eax, esi
	cmp	dl, -16
	jb	.LBB0_12
.LBB0_13:
	cmp	r9, r8
	je	.LBB0_14
	movzx	edx, byte ptr [r9]
	and	edx, 63
	jmp	.LBB0_16
.LBB0_6:
	shl	ecx, 6
	or	eax, ecx
	ret
.LBB0_9:
	xor	esi, esi
	mov	r9, r8
	shl	eax, 6
	or	eax, esi
	cmp	dl, -16
	jae	.LBB0_13
.LBB0_12:
	shl	ecx, 12
	or	eax, ecx
	ret
.LBB0_14:
	xor	edx, edx
.LBB0_16:
	and	ecx, 7
	shl	ecx, 18
	shl	eax, 6
	or	eax, ecx
	or	eax, edx
	ret
```

After this change, the assembly is reduced to:

```asm
_ZN7example10first_char17h4318683472f884ccE:
	.cfi_startproc
	test	rsi, rsi
	je	.LBB0_1
	movzx	ecx, byte ptr [rdi]
	test	cl, cl
	js	.LBB0_3
	mov	eax, ecx
	ret
.LBB0_1:
	mov	eax, 1114112
	ret
.LBB0_3:
	mov	eax, ecx
	and	eax, 31
	movzx	esi, byte ptr [rdi + 1]
	and	esi, 63
	cmp	cl, -33
	jbe	.LBB0_4
	movzx	edx, byte ptr [rdi + 2]
	shl	esi, 6
	and	edx, 63
	or	edx, esi
	cmp	cl, -16
	jb	.LBB0_7
	movzx	ecx, byte ptr [rdi + 3]
	and	eax, 7
	shl	eax, 18
	shl	edx, 6
	and	ecx, 63
	or	ecx, edx
	or	eax, ecx
	ret
.LBB0_4:
	shl	eax, 6
	or	eax, esi
	ret
.LBB0_7:
	shl	eax, 12
	or	eax, edx
	ret
```
2021-11-21 17:05:55 +01:00
Noah Lev
865d99f82b docs: Escape brackets to satisfy the linkchecker
My change to use `Type::def_id()` (formerly `Type::def_id_full()`) in
more places caused some docs to show up that used to be missed by
rustdoc. Those docs contained unescaped square brackets, which triggered
linkcheck errors. This commit escapes the square brackets and adds this
particular instance to the linkcheck exception list.
2021-10-22 14:08:43 -07:00
Frank Steffahn
2f9ddf3bc7 Fix typos “an”→“a” and a few different ones that appeared in the same search 2021-08-22 18:15:49 +02:00
Deadbeef
15cdb28f5b Account for self.extra in size_hint for EncodeWide 2021-06-19 12:59:22 +08:00
Lzu Tao
a4e926daee std: move "mod tests/benches" to separate files
Also doing fmt inplace as requested.
2020-08-31 02:56:59 +00:00
mark
2c31b45ae8 mv std libs to library/ 2020-07-27 19:51:13 -05:00