Commit Graph

25 Commits

Author SHA1 Message Date
kwantam
5d4238b6fc Add libunicode; move unicode functions from core
- created new crate, libunicode, below libstd
- split Char trait into Char (libcore) and UnicodeChar (libunicode)
  - Unicode-aware functions now live in libunicode
    - is_alphabetic, is_XID_start, is_XID_continue, is_lowercase,
      is_uppercase, is_whitespace, is_alphanumeric, is_control,
      is_digit, to_uppercase, to_lowercase
  - added width method in UnicodeChar trait
    - determines printed width of character in columns, or None if it is
      a non-NULL control character
    - takes a boolean argument indicating whether the present context is
      CJK or not (characters with 'A'mbiguous widths are double-wide in
      CJK contexts, single-wide otherwise)
- split StrSlice into StrSlice (libcore) and UnicodeStrSlice
  (libunicode)
  - functionality formerly in StrSlice that relied upon Unicode
    functionality from Char is now in UnicodeStrSlice
    - words, is_whitespace, is_alphanumeric, trim, trim_left, trim_right
  - also moved Words type alias into libunicode because words method is
    in UnicodeStrSlice
- unified Unicode tables from libcollections, libcore, and libregex into
  libunicode
- updated unicode.py in src/etc to generate aforementioned tables
- generated new tables based on latest Unicode data
- added UnicodeChar and UnicodeStrSlice traits to prelude
- libunicode is now the collection point for the std::char module,
  combining the libunicode functionality with the Char functionality
  from libcore
  - thus, moved doc comment for char from core::char to unicode::char
- libcollections remains the collection point for std::str

The Unicode-aware functions that previously lived in the Char and
StrSlice traits are no longer available to programs that only use
libcore. To regain use of these methods, include the libunicode crate
and use the UnicodeChar and/or UnicodeStrSlice traits:

    extern crate unicode;
    use unicode::UnicodeChar;
    use unicode::UnicodeStrSlice;
    use unicode::Words; // if you want to use the words() method

NOTE: this does *not* impact programs that use libstd, since UnicodeChar
and UnicodeStrSlice have been added to the prelude.

closes #15224
[breaking-change]
2014-07-07 14:52:24 -04:00
Florian Zeitz
df802a2754 std: Rename str::Normalizations to str::Decompositions
The Normalizations iterator has been renamed to Decompositions.
It does not currently include all forms of Unicode normalization,
but only encompasses decompositions.
If implemented, recomposition would likely be a separate iterator
which works on the result of this one.

[breaking-change]
2014-05-13 17:24:07 -07:00
Florian Zeitz
8c54d5bf40 core: Move Hangul decomposition into unicode.rs 2014-05-13 17:24:07 -07:00
Florian Zeitz
74ad023674 std, core: Generate unicode.rs using unicode.py 2014-05-13 17:24:07 -07:00
Manish Goregaokar
713e87526e Use new attribute syntax in python files in src/etc too (#13478) 2014-04-14 21:00:31 +05:30
Daniel Micay
ce620320a2 rename std::vec -> std::slice
Closes #12702
2014-03-20 01:30:27 -04:00
Piotr Zolnierek
dba5625cb8 Remove code duplication
Remove whitespace

Update documentation for to_uppercase, to_lowercase
2014-03-13 12:23:24 +01:00
Piotr Zolnierek
04170b0a41 Implement lower, upper case conversion for char 2014-03-13 09:32:05 +01:00
Piotr Zolnierek
4a00211916 std::unicode: remove unused category tables 2014-03-13 09:32:05 +01:00
Adrien Tétar
0ebe112b3b etc: add missing license boilerplates 2014-02-05 19:53:53 +01:00
Florian Zeitz
dfe38dbca4 Fix handling of upper/lowercase, and whitespace 2013-11-27 23:36:20 +01:00
Florian Zeitz
e9ab9bf01a Update unicode.py to reflect language changes 2013-11-27 23:21:22 +01:00
Daniel Micay
6919cf5fe1 rename std::iterator to std::iter
The trait will keep the `Iterator` naming, but a more concise module
name makes using the free functions less verbose. The module will define
iterables in addition to iterators, as it deals with iteration in
general.
2013-09-09 03:21:46 -04:00
Daniel Micay
62a3434529 stop treating char as an integer type
Closes #7609
2013-09-04 08:07:56 -04:00
Florian Zeitz
2675f3e9e7 Add canonical combining class to std::unicode 2013-08-21 11:50:07 +02:00
Florian Zeitz
83f4bee44f Add Unicode decomposition mappings to std::unicode 2013-08-21 11:50:07 +02:00
Huon Wilson
c437a16c5d rustc: add a lint to enforce uppercase statics. 2013-07-01 17:52:57 +10:00
Huon Wilson
faa8f8ff8b Convert vec::{bsearch, bsearch_elem} to methods. 2013-06-30 21:15:25 +10:00
Huon Wilson
562dea1820 etc: update etc/unicode.py for the changes made to std::unicode. 2013-06-30 21:15:25 +10:00
kud1ing
6487cb221b Explain that the source code was generated by this script 2013-05-02 13:37:57 +03:00
Graydon Hoare
5a3d26f271 core: replace unicode match exprs with bsearch in const arrays, minor perf win. 2013-04-18 14:39:40 -07:00
Brian Anderson
6b6acde972 Add a license check to tidy. #4018 2013-01-17 23:28:42 -08:00
Graydon Hoare
ca7d389e1d Of course there were overlong lines. 2011-12-29 17:30:43 -08:00
Graydon Hoare
1cd132eef0 Teach unicode script to emit canonical and compat decomp mappings. Annoyingly large encoding. 2011-12-29 17:24:04 -08:00
Graydon Hoare
ac13f0da9e Add support to libcore for encoded-in-rust unicode character properties, at least. Add script to compute them from unicode.org. 2011-12-23 18:48:08 -08:00
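
Commit 5a3d26f271 above describes replacing match expressions with a binary search over constant arrays. The following is a minimal sketch of that kind of lookup in present-day Rust, not the code from that commit. The table name `WHITE_SPACE_SAMPLE`, its deliberately abbreviated contents, and the helper `in_range_table` are illustrative assumptions.

```rust
use std::cmp::Ordering;

// Hypothetical, abbreviated sample table (not the full generated data):
// inclusive (lo, hi) ranges, sorted by code point and non-overlapping.
const WHITE_SPACE_SAMPLE: &[(char, char)] = &[
    ('\u{0009}', '\u{000D}'),
    ('\u{0020}', '\u{0020}'),
    ('\u{0085}', '\u{0085}'),
    ('\u{00A0}', '\u{00A0}'),
    ('\u{2000}', '\u{200A}'),
];

// Returns true if `c` falls inside any range of `table`.
// The table must be sorted and non-overlapping for the search to be valid.
fn in_range_table(c: char, table: &[(char, char)]) -> bool {
    table
        .binary_search_by(|&(lo, hi)| {
            if hi < c {
                // The whole range lies below `c`: search to the right.
                Ordering::Less
            } else if lo > c {
                // The whole range lies above `c`: search to the left.
                Ordering::Greater
            } else {
                // lo <= c <= hi: `c` is inside this range.
                Ordering::Equal
            }
        })
        .is_ok()
}
```

A call such as `in_range_table(' ', WHITE_SPACE_SAMPLE)` tests one character against the table. Storing ranges rather than individual code points keeps the generated tables compact, and the lookup cost grows with the logarithm of the number of ranges.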