String.normalize
Converts all characters in string to the Unicode normalization form identified by form.
Invalid Unicode codepoints are skipped and the remainder of the string is converted. If you want the algorithm to stop and return on an invalid codepoint, use :unicode.characters_to_nfd_binary/1, :unicode.characters_to_nfc_binary/1, :unicode.characters_to_nfkd_binary/1, and :unicode.characters_to_nfkc_binary/1 instead.
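The difference between the lenient and the strict behaviour can be sketched as follows. This is a minimal illustration, not part of the String module: the NormalizeSketch module and its lenient/1 and strict/1 functions are hypothetical names, and only the :nfc form is shown.

```elixir
defmodule NormalizeSketch do
  # Lenient: String.normalize/2 always returns a binary, even for invalid input.
  def lenient(binary), do: String.normalize(binary, :nfc)

  # Strict: call the Erlang function directly and surface its error tuple.
  def strict(binary) do
    case :unicode.characters_to_nfc_binary(binary) do
      normalized when is_binary(normalized) -> {:ok, normalized}
      {:error, converted, rest} -> {:error, converted, rest}
    end
  end
end

# The byte 0xFF can never appear in valid UTF-8, so this binary is invalid.
invalid = "a" <> <<0xFF>> <> "e\u0301"

NormalizeSketch.lenient(invalid)   # returns a binary
NormalizeSketch.strict(invalid)    # returns an {:error, converted, rest} tuple
NormalizeSketch.strict("e\u0301")  # => {:ok, "\u00E9"}
```

The strict variant lets the caller decide what to do with the unconverted rest, which may be preferable when invalid input should be rejected rather than passed along.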
Normalization forms :nfkc and :nfkd should not be blindly applied to arbitrary text. Because they erase many formatting distinctions, they will prevent round-trip conversion to and from many legacy character sets.
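As a small illustration of the kind of distinction that is lost, consider SUPERSCRIPT TWO (U+00B2). The CompatSketch module below is a hypothetical helper written for this example, and the choice of input is an assumption made for illustration.

```elixir
defmodule CompatSketch do
  # Returns true when normalizing with `form` leaves the string unchanged.
  def preserved?(string, form), do: String.normalize(string, form) == string
end

# "x" followed by SUPERSCRIPT TWO (U+00B2)
squared = "x\u00B2"

String.normalize(squared, :nfkc)        # => "x2" (the superscript is gone)
CompatSketch.preserved?(squared, :nfc)  # => true
CompatSketch.preserved?(squared, :nfkc) # => false
```

Once the text has become "x2", nothing in the string indicates that the digit was originally a superscript, so the original cannot be recovered.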
Forms
The supported forms are:
:nfd - Normalization Form Canonical Decomposition. Characters are decomposed by canonical equivalence, and multiple combining characters are arranged in a specific order.
:nfc - Normalization Form Canonical Composition. Characters are decomposed and then recomposed by canonical equivalence.
:nfkd - Normalization Form Compatibility Decomposition. Characters are decomposed by compatibility equivalence, and multiple combining characters are arranged in a specific order.
:nfkc - Normalization Form Compatibility Composition. Characters are decomposed and then recomposed by compatibility equivalence.
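Because normalized strings often render identically, the effect of each form is easier to see at the codepoint level. The sketch below uses a hypothetical FormsSketch.codepoints/2 helper (not part of the String module) and escape sequences so that the input codepoints are unambiguous.

```elixir
defmodule FormsSketch do
  # Normalizes `string` with `form` and returns its codepoints as integers.
  def codepoints(string, form) do
    string |> String.normalize(form) |> String.to_charlist()
  end
end

# Canonical decomposition: precomposed "é" (U+00E9) becomes "e" + U+0301.
FormsSketch.codepoints("\u00E9", :nfd)         # => [0x65, 0x0301]

# Canonical composition: "e" + U+0301 is recomposed into U+00E9.
FormsSketch.codepoints("e\u0301", :nfc)        # => [0xE9]

# Ordering of combining characters: dot above (U+0307) followed by
# dot below (U+0323) is rearranged so that the dot below comes first.
FormsSketch.codepoints("q\u0307\u0323", :nfd)  # => [0x71, 0x0323, 0x0307]

# Compatibility equivalence: the "fi" ligature (U+FB01) is untouched by
# the canonical forms but split into "f" and "i" by :nfkd and :nfkc.
FormsSketch.codepoints("\uFB01", :nfd)         # => [0xFB01]
FormsSketch.codepoints("\uFB01", :nfkd)        # => [0x66, 0x69]
FormsSketch.codepoints("\uFB01", :nfkc)        # => [0x66, 0x69]
```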
Examples
iex> String.normalize("yêṩ", :nfd)
"yêṩ"
iex> String.normalize("leña", :nfc)
"leña"
iex> String.normalize("ﬁ", :nfkd)
"fi"
iex> String.normalize("fi", :nfkc)
"fi"