Package org.apache.lucene.analysis.fa
Class PersianStemmer
java.lang.Object
org.apache.lucene.analysis.fa.PersianStemmer
Stemmer for Persian.
Stemming is done in place for efficiency, operating directly on a term buffer.
Stemming is defined as:
- Removal of attached definite article, conjunction, and prepositions.
- Stemming of common suffixes.
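The in-place convention can be illustrated with a minimal sketch. It does not use the Lucene class: `TermBufferSketch` and `stripPlural` are hypothetical names, and the code points for HEH and ALEF are the standard Unicode values, assumed here because the constant values are not shown on this page. The point is only that the array is never reallocated; the caller gets back a shorter logical length and must ignore anything past it.

```java
import java.util.Arrays;

final class TermBufferSketch {
    // Assumed standard Unicode code points; the class's own constant values are not shown here.
    static final char HEH = '\u0647';
    static final char ALEF = '\u0627';

    /**
     * Hypothetical in-place stemmer: drops a trailing HEH + ALEF if present.
     * Only the returned length changes; the array contents are left alone.
     */
    static int stripPlural(char[] s, int len) {
        if (len >= 2 && s[len - 2] == HEH && s[len - 1] == ALEF) {
            return len - 2;
        }
        return len;
    }

    public static void main(String[] args) {
        // "ketab-ha" (books), written with escapes to stay encoding-neutral.
        String word = "\u06A9\u062A\u0627\u0628\u0647\u0627";
        // A term buffer usually has spare capacity beyond the term itself.
        char[] buffer = Arrays.copyOf(word.toCharArray(), 16);
        int len = word.length();

        int newLen = stripPlural(buffer, len);

        // Read only the first newLen chars; anything beyond that is stale.
        System.out.println(new String(buffer, 0, newLen));
    }
}
```

A caller that forgets to use the returned length and reads the whole array will still see the removed suffix, since the characters are not overwritten.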
Method Summary
- private boolean endsWithCheckLength(char[] s, int len, char[] suffix): Returns true if the suffix matches and can be stemmed.
- int stem(char[] s, int len): Stem an input buffer of Persian text.
- private int stemSuffix(char[] s, int len): Stem suffix(es) off a Persian word.
Field Details
ALEF
private static final char ALEF
HEH
private static final char HEH
TEH
private static final char TEH
REH
private static final char REH
NOON
private static final char NOON
YEH
private static final char YEH
ZWNJ
private static final char ZWNJ
suffixes
private static final char[][] suffixes
Constructor Details
PersianStemmer
public PersianStemmer()
Method Details
stem
public int stem(char[] s, int len)
Stem an input buffer of Persian text.
Parameters:
s - input buffer
len - length of input buffer
Returns:
length of input buffer after stemming
stemSuffix
private int stemSuffix(char[] s, int len)
Stem suffix(es) off a Persian word.
Parameters:
s - input buffer
len - length of input buffer
Returns:
new length of input buffer after stemming
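One plausible reading of "suffix(es)" is a single pass over a suffix table, shrinking the length for each entry that matches. The sketch below shows that shape only. `SuffixLoopSketch`, its `SUFFIXES` table, and the `endsWith` helper are hypothetical, and the three suffixes are illustrative rather than the contents of the private `suffixes` field. The length guard that endsWithCheckLength describes is left out to keep the loop in focus.

```java
final class SuffixLoopSketch {
    // Illustrative table only (an assumption); not the class's actual suffix list.
    static final char[][] SUFFIXES = {
        {'\u0647', '\u0627'}, // HEH + ALEF
        {'\u0627', '\u0646'}, // ALEF + NOON
        {'\u064A'},           // YEH
    };

    /** Returns true if the first len chars of s end with suffix. */
    static boolean endsWith(char[] s, int len, char[] suffix) {
        if (suffix.length > len) {
            return false;
        }
        for (int i = 0; i < suffix.length; i++) {
            if (s[len - suffix.length + i] != suffix[i]) {
                return false;
            }
        }
        return true;
    }

    /** One pass over the table; each matching suffix shortens the logical length. */
    static int stemSuffix(char[] s, int len) {
        for (char[] suffix : SUFFIXES) {
            if (endsWith(s, len, suffix)) {
                len -= suffix.length;
            }
        }
        return len;
    }

    public static void main(String[] args) {
        // Four base letters followed by YEH and then HEH + ALEF.
        char[] word = "\u06A9\u062A\u0627\u0628\u064A\u0647\u0627".toCharArray();
        int newLen = stemSuffix(word, word.length);
        System.out.println(newLen);
    }
}
```

Because the table is walked once in order, a later entry can match only after an earlier one has already been stripped, so the order of the table affects the result.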
endsWithCheckLength
private boolean endsWithCheckLength(char[] s, int len, char[] suffix)
Returns true if the suffix matches and can be stemmed.
Parameters:
s - input buffer
len - length of input buffer
suffix - suffix to check
Returns:
true if the suffix matches and can be stemmed
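"Can be stemmed" suggests a length guard in addition to the suffix comparison, so that very short words are not reduced to nothing. A minimal sketch of such a check follows. `SuffixGuardSketch` is a hypothetical name, and the minimum of two remaining characters is an assumption made for illustration, not a value stated on this page.

```java
final class SuffixGuardSketch {
    // Assumed minimum number of characters that must remain after stripping.
    static final int MIN_STEM_LENGTH = 2;

    /** True only if the suffix matches and enough characters would remain. */
    static boolean endsWithCheckLength(char[] s, int len, char[] suffix) {
        if (len < suffix.length + MIN_STEM_LENGTH) {
            return false; // stripping would leave too short a stem
        }
        for (int i = 0; i < suffix.length; i++) {
            if (s[len - suffix.length + i] != suffix[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        char[] suffix = {'\u0647', '\u0627'}; // HEH + ALEF
        char[] longWord = "\u06A9\u062A\u0627\u0628\u0647\u0627".toCharArray();
        char[] shortWord = "\u0628\u0647\u0627".toCharArray();
        System.out.println(endsWithCheckLength(longWord, longWord.length, suffix));
        System.out.println(endsWithCheckLength(shortWord, shortWord.length, suffix));
    }
}
```

Doing the length test first also keeps the index arithmetic safe, since the comparison loop never reads before the start of the buffer.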