Class CJKWidthCharFilter

  • All Implemented Interfaces:
    java.io.Closeable, java.lang.AutoCloseable, java.lang.Readable

    public class CJKWidthCharFilter
    extends BaseCharFilter
    A CharFilter that normalizes CJK width differences:
    • Folds fullwidth ASCII variants into the equivalent Basic Latin characters
    • Folds halfwidth Katakana variants into the equivalent kana

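    NOTE: this char filter is the exact counterpart of CJKWidthFilter: it applies the same normalization to the character stream before tokenization, whereas CJKWidthFilter operates on tokens.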
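    The two folding rules can be illustrated one character at a time. The sketch below is not Lucene's source. It assumes that the fullwidth ASCII variants are U+FF01 to U+FF5E, each exactly 0xFEE0 above its Basic Latin counterpart, and that halfwidth Katakana U+FF65 to U+FF9F map to the fullwidth code points in the hypothetical HALFWIDTH_KANA table (which presumably plays the role of the private KANA_NORM field). The class name WidthFoldSketch is also illustrative.

```java
/** Hypothetical per-character width folding; not Lucene's implementation. */
final class WidthFoldSketch {
    // Assumed mapping: fullwidth equivalents of U+FF65..U+FF9F, in code point order.
    // The last two entries send the halfwidth voiced and semi-voiced sound marks
    // to the combining marks U+3099 and U+309A.
    private static final char[] HALFWIDTH_KANA = {
        0x30FB, 0x30F2, 0x30A1, 0x30A3, 0x30A5, 0x30A7, 0x30A9, 0x30E3, 0x30E5,
        0x30E7, 0x30C3, 0x30FC, 0x30A2, 0x30A4, 0x30A6, 0x30A8, 0x30AA, 0x30AB,
        0x30AD, 0x30AF, 0x30B1, 0x30B3, 0x30B5, 0x30B7, 0x30B9, 0x30BB, 0x30BD,
        0x30BF, 0x30C1, 0x30C4, 0x30C6, 0x30C8, 0x30CA, 0x30CB, 0x30CC, 0x30CD,
        0x30CE, 0x30CF, 0x30D2, 0x30D5, 0x30D8, 0x30DB, 0x30DE, 0x30DF, 0x30E0,
        0x30E1, 0x30E2, 0x30E4, 0x30E6, 0x30E8, 0x30E9, 0x30EA, 0x30EB, 0x30EC,
        0x30ED, 0x30EF, 0x30F3, 0x3099, 0x309A
    };

    /** Folds one UTF-16 code unit; anything outside the two ranges is returned unchanged. */
    static int fold(int ch) {
        if (ch >= 0xFF01 && ch <= 0xFF5E) {
            return ch - 0xFEE0;                  // fullwidth ASCII -> Basic Latin
        }
        if (ch >= 0xFF65 && ch <= 0xFF9F) {
            return HALFWIDTH_KANA[ch - 0xFF65];  // halfwidth -> fullwidth Katakana
        }
        return ch;
    }
}
```

    Note that U+FF9E and U+FF9F fold to standalone combining marks in this sketch; joining them with the preceding kana needs one character of lookahead, which is what the prevChar field and the combineVoiceMark method suggest.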

    • Field Detail

      • KANA_NORM

        private static final char[] KANA_NORM
      • KANA_COMBINE_VOICED

        private static final byte[] KANA_COMBINE_VOICED
      • KANA_COMBINE_SEMI_VOICED

        private static final byte[] KANA_COMBINE_SEMI_VOICED
      • HW_KATAKANA_VOICED_MARK

        private static final int HW_KATAKANA_VOICED_MARK
        See Also:
        Constant Field Values
      • HW_KATAKANA_SEMI_VOICED_MARK

        private static final int HW_KATAKANA_SEMI_VOICED_MARK
        See Also:
        Constant Field Values
      • prevChar

        private int prevChar
      • inputOff

        private int inputOff
    • Constructor Detail

      • CJKWidthCharFilter

        public CJKWidthCharFilter(java.io.Reader in)
        Creates a filter that reads from the given Reader and normalizes CJK width differences in its characters.
    • Method Detail

      • read

        public int read()
                 throws java.io.IOException
        Overrides:
        read in class java.io.Reader
        Throws:
        java.io.IOException
      • combineVoiceMark

        private int combineVoiceMark(int ch,
                                     int voiceMark)
        Returns the combined character if the voice mark was successfully combined with ch; otherwise returns ch unchanged.
      • read

        public int read(char[] cbuf,
                        int off,
                        int len)
                 throws java.io.IOException
        Specified by:
        read in class java.io.Reader
        Throws:
        java.io.IOException
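    The values of the HW_KATAKANA_VOICED_MARK and HW_KATAKANA_SEMI_VOICED_MARK fields listed above appear only on the Constant Field Values page. A plausible assumption is that they are U+FF9E (HALFWIDTH KATAKANA VOICED SOUND MARK) and U+FF9F (HALFWIDTH KATAKANA SEMI-VOICED SOUND MARK). The hypothetical VoiceMarkSketch below encodes that assumption and uses java.text.Normalizer (NFKC) to show that each mark on its own is compatibility-equivalent to a combining mark (U+3099 or U+309A), not to a precomposed kana.

```java
import java.text.Normalizer;

/** Hypothetical helper; both constant values are assumptions, not read from Lucene. */
final class VoiceMarkSketch {
    static final int HW_VOICED_MARK = 0xFF9E;      // assumed: HALFWIDTH KATAKANA VOICED SOUND MARK
    static final int HW_SEMI_VOICED_MARK = 0xFF9F; // assumed: HALFWIDTH KATAKANA SEMI-VOICED SOUND MARK

    static boolean isHalfwidthVoiceMark(int ch) {
        return ch == HW_VOICED_MARK || ch == HW_SEMI_VOICED_MARK;
    }

    /** NFKC form of a lone mark: a single combining character. */
    static String compatibilityForm(int mark) {
        return Normalizer.normalize(new String(Character.toChars(mark)), Normalizer.Form.NFKC);
    }
}
```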
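    The combineVoiceMark method described above can be approximated with Unicode canonical composition. This is a swapped-in technique, not the lookup-table approach that the KANA_COMBINE_VOICED and KANA_COMBINE_SEMI_VOICED fields suggest: the sketch appends the combining mark (U+3099 or U+309A) to ch and asks java.text.Normalizer for the NFC form, and a single resulting character means a precomposed kana exists. The mark values 0xFF9E and 0xFF9F are assumptions, and VoiceMarkCombiner is a hypothetical name.

```java
import java.text.Normalizer;

/** Hypothetical stand-in for combineVoiceMark, using NFC instead of lookup tables. */
final class VoiceMarkCombiner {
    private static final int HW_VOICED = 0xFF9E;      // assumed constant value
    private static final int HW_SEMI_VOICED = 0xFF9F; // assumed constant value

    /** Returns the precomposed kana if ch takes the mark, otherwise ch itself. */
    static int combineVoiceMark(int ch, int voiceMark) {
        if (ch < 0 || ch > 0xFFFF) {
            return ch;                 // nothing buffered, or not a single UTF-16 unit
        }
        final char combining;
        if (voiceMark == HW_VOICED) {
            combining = '\u3099';      // combining voiced sound mark (dakuten)
        } else if (voiceMark == HW_SEMI_VOICED) {
            combining = '\u309A';      // combining semi-voiced sound mark (handakuten)
        } else {
            return ch;
        }
        String composed = Normalizer.normalize(
                new String(new char[] {(char) ch, combining}), Normalizer.Form.NFC);
        return composed.length() == 1 ? composed.charAt(0) : ch;
    }
}
```

    For example, KA (U+30AB) with the voiced mark becomes GA (U+30AC), and HA (U+30CF) with the semi-voiced mark becomes PA (U+30D1), while A (U+30A2), which has no voiced form, is returned unchanged.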
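    The single-character read() documented above has to hold back one character, presumably in prevChar, because a halfwidth voice mark can only be merged once the character before it is known. The standalone LookaheadWidthSketch below is a hypothetical class, not a CharFilter subclass, and it makes several assumptions of its own: fullwidth ASCII is shifted by 0xFEE0, halfwidth Katakana is folded with NFKC instead of a lookup table, voice marks are composed with NFC, and the offset correction that BaseCharFilter provides (presumably fed through inputOff) is omitted.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.text.Normalizer;

/** Hypothetical one-character-lookahead reader; not Lucene's code, no offset correction. */
final class LookaheadWidthSketch {
    private final Reader in;
    private int prevChar = -1; // folded character held back; -1 means none

    LookaheadWidthSketch(Reader in) {
        this.in = in;
    }

    private static int fold(int ch) {
        if (ch >= 0xFF01 && ch <= 0xFF5E) return ch - 0xFEE0; // fullwidth ASCII
        if (ch >= 0xFF65 && ch <= 0xFF9F) {                    // halfwidth Katakana
            // NFKC maps each of these to exactly one character: a fullwidth kana,
            // or a combining mark for U+FF9E and U+FF9F.
            return Normalizer.normalize(String.valueOf((char) ch), Normalizer.Form.NFKC).charAt(0);
        }
        return ch;
    }

    private static int combine(int prev, int mark) {
        if (prev < 0) return prev;                             // nothing held back
        char combining = (mark == 0xFF9E) ? '\u3099' : '\u309A';
        String s = Normalizer.normalize(new String(new char[] {(char) prev, combining}),
                Normalizer.Form.NFC);
        return s.length() == 1 ? s.charAt(0) : prev;
    }

    int read() throws IOException {
        while (true) {
            int ch = in.read();
            if (ch == -1) {                 // end of input: flush the held-back char
                int ret = prevChar;
                prevChar = -1;
                return ret;
            }
            if (ch == 0xFF9E || ch == 0xFF9F) {
                int combined = combine(prevChar, ch);
                if (combined != prevChar) { // mark merged into the held-back kana
                    prevChar = -1;
                    return combined;
                }
            }
            int ret = prevChar;             // release the previous char, if any,
            prevChar = fold(ch);            // and hold back the current one
            if (ret != -1) return ret;
        }
    }

    /** Drains a string through the sketch, for demonstration. */
    static String readAll(String s) {
        LookaheadWidthSketch r = new LookaheadWidthSketch(new StringReader(s));
        StringBuilder sb = new StringBuilder();
        try {
            for (int c = r.read(); c != -1; c = r.read()) sb.append((char) c);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }
}
```

    Feeding it U+FF76 U+FF9E U+FF21 yields U+30AC followed by "A": the halfwidth KA is held back, merged with the following voiced mark, and the folded A is released only when the input ends. A mark that cannot combine, as in "A" followed by U+FF9E, is passed through as the combining mark U+3099.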
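    The bulk read(char[], int, int) documented above can be layered on the single-character read(): fill the buffer until it is full or the input ends, and return -1 only when nothing was read. The hypothetical PerCharFoldingReader below sketches that pattern; to keep the focus on the bulk loop, it folds only fullwidth ASCII and leaves kana handling out.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

/** Hypothetical Reader whose bulk read is built on read(); folds fullwidth ASCII only. */
final class PerCharFoldingReader extends Reader {
    private final Reader in;

    PerCharFoldingReader(Reader in) {
        this.in = in;
    }

    @Override
    public int read() throws IOException {
        int ch = in.read();
        return (ch >= 0xFF01 && ch <= 0xFF5E) ? ch - 0xFEE0 : ch;
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        if (len == 0) return 0;           // Reader contract: nothing requested
        int n = 0;
        while (n < len) {
            int c = read();               // per-character path does the folding
            if (c == -1) break;
            cbuf[off + n++] = (char) c;
        }
        return n == 0 ? -1 : n;           // -1 only at end of input with nothing read
    }

    @Override
    public void close() throws IOException {
        in.close();
    }

    /** Reads a string through the filter using a small buffer, for demonstration. */
    static String foldAll(String s) {
        char[] buf = new char[4];         // small on purpose, to force several bulk reads
        StringBuilder sb = new StringBuilder();
        try (Reader r = new PerCharFoldingReader(new StringReader(s))) {
            for (int n = r.read(buf, 0, buf.length); n != -1; n = r.read(buf, 0, buf.length)) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }
}
```

    Routing every character through read() keeps the normalization logic in one place, at the cost of a method call per character.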