public final class CaseCanonicalize
extends java.lang.Object
From section 15.10.2.8 of the EcmaScript specification:
The abstract operation Canonicalize takes a character parameter ch and performs the following steps:
- If IgnoreCase is false, return ch.
- Let u be ch converted to upper case as if by calling the standard built-in method String.prototype.toUpperCase on the one-character String ch.
- If u does not consist of a single character, return ch.
- Let cu be u's character.
- If ch's code unit value is greater than or equal to decimal 128 and cu's code unit value is less than decimal 128, then return ch.
- Return cu.
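The steps above can be sketched in Java roughly as follows. This is only an illustration under stated assumptions, not this class's implementation: the class name `CanonicalizeSketch` and the explicit `ignoreCase` parameter are hypothetical, and Java's `String.toUpperCase` follows a newer Unicode version than the one referenced by this class, so results for some non-ASCII code units may differ.

```java
import java.util.Locale;

// Hypothetical sketch of the spec's Canonicalize steps; not the CaseCanonicalize implementation.
final class CanonicalizeSketch {
  static char canonicalize(char ch, boolean ignoreCase) {
    // Step 1: without IgnoreCase the character is returned unchanged.
    if (!ignoreCase) {
      return ch;
    }
    // Step 2: upper-case the one-character string (Locale.ROOT avoids locale-specific rules).
    String u = String.valueOf(ch).toUpperCase(Locale.ROOT);
    // Step 3: a multi-character result (for example "SS" for U+00DF) leaves ch unchanged.
    if (u.length() != 1) {
      return ch;
    }
    // Step 4: cu is u's single character.
    char cu = u.charAt(0);
    // Step 5: a non-ASCII code unit never canonicalizes to an ASCII one.
    if (ch >= 128 && cu < 128) {
      return ch;
    }
    // Step 6: otherwise the upper-case form is the canonical one.
    return cu;
  }
}
```

Under these assumptions `canonicalize('a', true)` yields `'A'`, while U+00DF and U+017F are returned unchanged because of steps 3 and 5 respectively.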
Modifier and Type | Class and Description |
---|---|
private static class | CaseCanonicalize.DeltaSet: A group of code units such that for all cu in codeUnits, cu is equivalent, case-insensitively, to cu + delta. |
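To make the delta idea concrete, here is a minimal sketch of what such a grouping could look like. `DeltaSetSketch`, its fields, and `partnerOf` are hypothetical names chosen for illustration; the actual `DeltaSet` class is private and its members are not documented here.

```java
import java.util.BitSet;

// Hypothetical illustration of a delta set; not the real CaseCanonicalize.DeltaSet.
final class DeltaSetSketch {
  final int delta;        // offset from a member to its case-insensitive partner
  final BitSet codeUnits; // members cu for which cu + delta is equivalent to cu

  DeltaSetSketch(int delta, BitSet codeUnits) {
    this.delta = delta;
    this.codeUnits = codeUnits;
  }

  // Returns cu + delta for members of the set, or cu itself otherwise.
  int partnerOf(int cu) {
    return codeUnits.get(cu) ? cu + delta : cu;
  }

  // Example set: ASCII upper-case letters pair with the lower-case letters 32 code units above.
  static DeltaSetSketch asciiUpperToLower() {
    BitSet upper = new BitSet();
    upper.set('A', 'Z' + 1); // the end index is exclusive
    return new DeltaSetSketch(32, upper);
  }
}
```

Grouping by delta means a whole run of letters can share one offset instead of storing one pair per code unit.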
Modifier and Type | Field and Description |
---|---|
private static com.google.common.collect.ImmutableList<CaseCanonicalize.DeltaSet> | CANON_DELTA_SETS |
static CharRanges | CASE_SENSITIVE: Set of code units that are case-insensitively equivalent to some other code unit according to the EcmaScript Canonicalize operation described in section 15.10.2.8. |
private static com.google.common.collect.ImmutableList<CaseCanonicalize.DeltaSet> | DELTA_SETS: Sets of code units broken down by delta that are case-insensitively equivalent to another code unit that differs from the first by that delta. |
private static CharRanges | UCASE_ASCII_LETTERS |
Modifier | Constructor and Description |
---|---|
private | CaseCanonicalize() |
Modifier and Type | Method and Description |
---|---|
static char | caseCanonicalize(char ch): Returns the case canonical version of the given code-unit. |
static java.lang.String | caseCanonicalize(java.lang.String s): Returns the case canonical version of the given string. |
static CharRanges | expandToAllMatched(CharRanges ranges): Given a character range that may include case sensitive code-units, such as [0-9B-M], returns the character range that includes all the code-units in the input and those that are case-insensitively equivalent to a code-unit in the input. |
static CharRanges | reduceToMinimum(CharRanges ranges): Given a character range that may include case sensitive code-units, such as [0-9B-M], returns the character range that includes the minimal set of code units such that for every code unit in the input there is a case-insensitively equivalent canonical code unit in the output. |
public static final CharRanges CASE_SENSITIVE
Set of code units that are case-insensitively equivalent to some other code unit according to the EcmaScript Canonicalize operation described in section 15.10.2.8. Canonicalize is defined in terms of String.prototype.toUpperCase, which is itself based on Unicode 3.0.0 as specified at UnicodeData-3.0.0 and SpecialCasings-2.txt.
This table was generated by running the below on Chrome:

    for (var cc = 0; cc < 0x10000; ++cc) {
      var ch = String.fromCharCode(cc);
      var u = ch.toUpperCase();
      if (ch != u && u.length === 1) {
        var cu = u.charCodeAt(0);
        if (cc <= 128 || u.charCodeAt(0) > 128) {
          print('0x' + cc.toString(16) + ', 0x' + cu.toString(16) + ',');
        }
      }
    }

private static final CharRanges UCASE_ASCII_LETTERS
private static final com.google.common.collect.ImmutableList<CaseCanonicalize.DeltaSet> DELTA_SETS
private static final com.google.common.collect.ImmutableList<CaseCanonicalize.DeltaSet> CANON_DELTA_SETS
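public static java.lang.String caseCanonicalize(java.lang.String s)
Returns the case canonical version of the given string.
public static char caseCanonicalize(char ch)
Returns the case canonical version of the given code-unit.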
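As a rough illustration of how the String overload relates to the char overload, the sketch below canonicalizes a string one code unit at a time. It assumes the canonical form is the upper-case one, as in the Canonicalize steps of the specification; the class `StringCanonicalizeSketch` is hypothetical and relies on Java's own Unicode tables rather than this class's generated table.

```java
import java.util.Locale;

// Hypothetical sketch; not the CaseCanonicalize implementation.
final class StringCanonicalizeSketch {
  // Per-code-unit canonicalization following the spec steps with IgnoreCase set.
  static char canonicalize(char ch) {
    String u = String.valueOf(ch).toUpperCase(Locale.ROOT);
    if (u.length() != 1) {
      return ch; // multi-character upper-case forms are ignored
    }
    char cu = u.charAt(0);
    return (ch >= 128 && cu < 128) ? ch : cu;
  }

  // Applies the per-code-unit rule to every code unit of s.
  static String canonicalize(String s) {
    StringBuilder sb = new StringBuilder(s.length());
    for (int i = 0; i < s.length(); i++) {
      sb.append(canonicalize(s.charAt(i)));
    }
    return sb.toString();
  }
}
```

Because the rule works on single code units, the result always has the same length as the input, unlike `String.toUpperCase`, which may expand some characters.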
public static CharRanges expandToAllMatched(CharRanges ranges)
Given a character range that may include case sensitive code-units, such as [0-9B-M], returns the character range that includes all the code-units in the input and those that are case-insensitively equivalent to a code-unit in the input.
public static CharRanges reduceToMinimum(CharRanges ranges)
Given a character range that may include case sensitive code-units, such as [0-9B-M], returns the character range that includes the minimal set of code units such that for every code unit in the input there is a case-insensitively equivalent canonical code unit in the output.
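The relationship between the two range operations can be illustrated with plain sets of characters. The sketch below is an assumption-laden stand-in: it uses `TreeSet<Character>` instead of `CharRanges`, a brute-force scan instead of the delta-set tables, and treats the upper-case form as canonical. The class `CaseRangeSketch` and its helper `range` are hypothetical.

```java
import java.util.Locale;
import java.util.TreeSet;

// Hypothetical sketch using plain sets; the real methods operate on CharRanges.
final class CaseRangeSketch {
  // Spec-style canonicalization of one code unit with IgnoreCase set.
  static char canonicalize(char ch) {
    String u = String.valueOf(ch).toUpperCase(Locale.ROOT);
    if (u.length() != 1) {
      return ch;
    }
    char cu = u.charAt(0);
    return (ch >= 128 && cu < 128) ? ch : cu;
  }

  // Keeps only the canonical representative of each input code unit.
  static TreeSet<Character> reduceToMinimum(TreeSet<Character> input) {
    TreeSet<Character> out = new TreeSet<>();
    for (char c : input) {
      out.add(canonicalize(c));
    }
    return out;
  }

  // Adds every code unit whose canonical form matches that of some input code unit.
  static TreeSet<Character> expandToAllMatched(TreeSet<Character> input) {
    TreeSet<Character> canon = reduceToMinimum(input);
    TreeSet<Character> out = new TreeSet<>(input);
    for (int c = 0; c < 0x10000; c++) {
      if (canon.contains(canonicalize((char) c))) {
        out.add((char) c);
      }
    }
    return out;
  }

  // Builds the inclusive range [lo, hi] as a set.
  static TreeSet<Character> range(char lo, char hi) {
    TreeSet<Character> out = new TreeSet<>();
    for (int c = lo; c <= hi; c++) {
      out.add((char) c);
    }
    return out;
  }
}
```

In this sketch, expanding [0-9B-M] gives [0-9B-Mb-m], and reducing that result gives back [0-9B-M].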