Class FSTCompletion


  • public class FSTCompletion
    extends java.lang.Object
    Finite state automata based implementation of "autocomplete" functionality.
    See Also:
    FSTCompletionBuilder
    • Field Detail

      • DEFAULT_BUCKETS

        public static final int DEFAULT_BUCKETS
        Default number of buckets.
        See Also:
        Constant Field Values
      • EMPTY_RESULT

        private static final java.util.ArrayList<FSTCompletion.Completion> EMPTY_RESULT
        An empty result. Keep this an ArrayList to keep all the returned lists of single type (monomorphic calls).
      • automaton

        private final FST<java.lang.Object> automaton
        Finite state automaton encoding all the lookup terms. See class notes for details.
      • rootArcs

        private final FST.Arc<java.lang.Object>[] rootArcs
        An array of arcs leaving the root automaton state and encoding weights of all completions in their sub-trees.
    • Constructor Detail

      • FSTCompletion

        public FSTCompletion​(FST<java.lang.Object> automaton,
                             boolean higherWeightsFirst,
                             boolean exactFirst)
        Constructs an FSTCompletion, specifying higherWeightsFirst and exactFirst.
        Parameters:
        automaton - Automaton with completions. See FSTCompletionBuilder.
        higherWeightsFirst - Return most popular suggestions first. This is the default behavior for this implementation. Setting it to false has no effect (use constant term weights to sort alphabetically only).
        exactFirst - Find and push an exact match to the first position of the result list if found.
    • Method Detail

      • cacheRootArcs

        private static FST.Arc<java.lang.Object>[] cacheRootArcs​(FST<java.lang.Object> automaton)
        Cache the root node's output arcs starting with completions with the highest weights.
      • getExactMatchStartingFromRootArc

        private int getExactMatchStartingFromRootArc​(int rootArcIndex,
                                                     BytesRef utf8)
        Returns the first exact match by traversing root arcs, starting from the arc rootArcIndex.
        Parameters:
        rootArcIndex - The first root arc index in rootArcs to consider when matching.
        utf8 - The sequence of utf8 bytes to follow.
        Returns:
        Returns the bucket number of the match or -1 if no match was found.
      • lookup

        public java.util.List<FSTCompletion.Completion> lookup​(java.lang.CharSequence key,
                                                               int num)
        Lookup suggestions to key.
        Parameters:
        key - The prefix to which suggestions should be sought.
        num - At most this number of suggestions will be returned.
        Returns:
        Returns the suggestions, sorted by their approximated weight first (decreasing) and then alphabetically (UTF-8 codepoint order).
      • lookupSortedAlphabetically

        private java.util.List<FSTCompletion.Completion> lookupSortedAlphabetically​(BytesRef key,
                                                                                    int num)
                                                                             throws java.io.IOException
        Lookup suggestions sorted alphabetically if weights are not constant. This is a workaround: in general, use constant weights for alphabetically sorted result.
        Throws:
        java.io.IOException
      • lookupSortedByWeight

        private java.util.ArrayList<FSTCompletion.Completion> lookupSortedByWeight​(BytesRef key,
                                                                                   int num,
                                                                                   boolean collectAll)
                                                                            throws java.io.IOException
        Lookup suggestions sorted by weight (descending order).
        Parameters:
        collectAll - If true, the routine terminates immediately when num suggestions have been collected. If false, it will collect suggestions from all weight arcs (needed for lookupSortedAlphabetically(org.apache.lucene.util.BytesRef, int).
        Throws:
        java.io.IOException
      • descendWithPrefix

        private boolean descendWithPrefix​(FST.Arc<java.lang.Object> arc,
                                          BytesRef utf8)
                                   throws java.io.IOException
        Descend along the path starting at arc and going through bytes in the argument.
        Parameters:
        arc - The starting arc. This argument is modified in-place.
        utf8 - The term to descend along.
        Returns:
        If true, arc will be set to the arc matching last byte of term. false is returned if no such prefix exists.
        Throws:
        java.io.IOException
      • collect

        private boolean collect​(java.util.List<FSTCompletion.Completion> res,
                                int num,
                                int bucket,
                                BytesRef output,
                                FST.Arc<java.lang.Object> arc)
                         throws java.io.IOException
        Recursive collect lookup results from the automaton subgraph starting at arc.
        Parameters:
        num - Maximum number of results needed (early termination).
        Throws:
        java.io.IOException
      • getBucketCount

        public int getBucketCount()
        Returns the bucket count (discretization thresholds).
      • getBucket

        public int getBucket​(java.lang.CharSequence key)
        Returns the bucket assigned to a given key (if found) or -1 if no exact match exists.
      • getFST

        public FST<java.lang.Object> getFST()
        Returns the internal automaton.