Class CommonTermsQuery


  • public class CommonTermsQuery
    extends Query
    A query that executes high-frequency terms in an optional sub-query to prevent slow queries caused by "common" terms such as stopwords. This query builds two queries from the added terms: low-frequency terms are added to a required boolean clause, and high-frequency terms are added to an optional boolean clause. The optional clause is only executed if the required "low-frequency" clause matches. In most cases, high-frequency terms are unlikely to contribute significantly to the document score unless at least one of the low-frequency terms is matched. Where applicable, this query can significantly improve query execution times.

    CommonTermsQuery has several advantages over stopword filtering at index or query time since a term can be "classified" based on the actual document frequency in the index and can prevent slow queries even across domains without specialized stopword files.

    Note: if the query contains only high-frequency terms, it is rewritten into a plain conjunction query, i.e. all high-frequency terms must match for a document to match.
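
The split into low- and high-frequency terms described above can be illustrated with a small standalone model. This is a minimal sketch and not the class's implementation: `TermClassifierSketch` and `classify` are hypothetical names, terms are plain strings, and the threshold is treated as a fraction of `maxDoc`, which is an assumption made for this example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TermClassifierSketch {

    /**
     * Hypothetical helper: splits terms into "low" and "high" groups.
     * Assumption: maxTermFrequency is a fraction of maxDoc; a term whose
     * docFreq / maxDoc exceeds it is treated as high-frequency.
     */
    static Map<String, List<String>> classify(Map<String, Integer> docFreqs,
                                              int maxDoc,
                                              float maxTermFrequency) {
        List<String> low = new ArrayList<>();
        List<String> high = new ArrayList<>();
        for (Map.Entry<String, Integer> e : docFreqs.entrySet()) {
            float fraction = (float) e.getValue() / maxDoc;
            if (fraction > maxTermFrequency) {
                high.add(e.getKey()); // would go to the optional clause
            } else {
                low.add(e.getKey());  // would go to the required clause
            }
        }
        Map<String, List<String>> result = new LinkedHashMap<>();
        result.put("low", low);
        result.put("high", high);
        return result;
    }

    public static void main(String[] args) {
        // Made-up document frequencies for an index of 1000 documents.
        Map<String, Integer> docFreqs = new LinkedHashMap<>();
        docFreqs.put("the", 950);
        docFreqs.put("lucene", 12);
        docFreqs.put("query", 40);
        System.out.println(classify(docFreqs, 1000, 0.1f));
    }
}
```

Because the classification depends on document frequencies in the index, the same term can land in different groups for different indexes, which is the advantage over a fixed stopword list noted above.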

    • Field Detail

      • terms

        protected final java.util.List<Term> terms
      • maxTermFrequency

        protected final float maxTermFrequency
      • lowFreqBoost

        protected float lowFreqBoost
      • highFreqBoost

        protected float highFreqBoost
      • lowFreqMinNrShouldMatch

        protected float lowFreqMinNrShouldMatch
      • highFreqMinNrShouldMatch

        protected float highFreqMinNrShouldMatch
    • Constructor Detail

    • Method Detail

      • add

        public void add​(Term term)
        Adds a term to the CommonTermsQuery
        Parameters:
        term - the term to add
      • rewrite

        public Query rewrite​(IndexReader reader)
                      throws java.io.IOException
        Description copied from class: Query
        Expert: called to re-write queries into primitive queries. For example, a PrefixQuery will be rewritten into a BooleanQuery that consists of TermQuerys.
        Overrides:
        rewrite in class Query
        Throws:
        java.io.IOException
      • visit

        public void visit​(QueryVisitor visitor)
        Description copied from class: Query
        Recurse through the query tree, visiting any child queries
        Overrides:
        visit in class Query
        Parameters:
        visitor - a QueryVisitor to be called by each query in the tree
      • calcLowFreqMinimumNumberShouldMatch

        protected int calcLowFreqMinimumNumberShouldMatch​(int numOptional)
      • calcHighFreqMinimumNumberShouldMatch

        protected int calcHighFreqMinimumNumberShouldMatch​(int numOptional)
      • minNrShouldMatch

        private final int minNrShouldMatch​(float minNrShouldMatch,
                                           int numOptional)
      • buildQuery

        protected Query buildQuery​(int maxDoc,
                                   TermStates[] contextArray,
                                   Term[] queryTerms)
      • collectTermStates

        public void collectTermStates​(IndexReader reader,
                                      java.util.List<LeafReaderContext> leaves,
                                      TermStates[] contextArray,
                                      Term[] queryTerms)
                               throws java.io.IOException
        Throws:
        java.io.IOException
      • setLowFreqMinimumNumberShouldMatch

        public void setLowFreqMinimumNumberShouldMatch​(float min)
        Specifies the minimum number of low-frequency optional BooleanClauses that must be satisfied in order to produce a match on the low-frequency terms query part. This method accepts a float value in the range [0..1) as a fraction of the actual query terms in the low-frequency clause, or a number >=1 as an absolute number of clauses that need to match.

        By default no optional clauses are necessary for a match (unless there are no required clauses). If this method is used, then the specified number of clauses is required.

        Parameters:
        min - the number of optional clauses that must match
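
The two ways of reading the `min` argument (a fraction in [0..1) or an absolute count >=1) can be sketched as follows. This is an illustrative model only: `MinShouldMatchSketch` and `resolve` are hypothetical names, and rounding the fractional result to the nearest integer is an assumption of this sketch. Negative inputs are not handled.

```java
public class MinShouldMatchSketch {

    /**
     * Hypothetical helper: turns the float setting into a clause count.
     * Values >= 1 are read as an absolute count; values in (0..1) are read
     * as a fraction of numOptional (rounding is an assumption here);
     * 0 means no optional clause is required.
     */
    static int resolve(float min, int numOptional) {
        if (min >= 1.0f || min == 0.0f) {
            return (int) min;
        }
        return Math.round(min * numOptional);
    }

    public static void main(String[] args) {
        System.out.println(resolve(0.5f, 4));   // fraction of 4 optional clauses
        System.out.println(resolve(3.0f, 10));  // absolute count
        System.out.println(resolve(0.0f, 5));   // default: nothing required
    }
}
```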
      • getLowFreqMinimumNumberShouldMatch

        public float getLowFreqMinimumNumberShouldMatch()
        Gets the minimum number of the optional low frequent BooleanClauses which must be satisfied.
      • setHighFreqMinimumNumberShouldMatch

        public void setHighFreqMinimumNumberShouldMatch​(float min)
        Specifies the minimum number of high-frequency optional BooleanClauses that must be satisfied in order to produce a match on the high-frequency terms query part. This method accepts a float value in the range [0..1) as a fraction of the actual query terms in the high-frequency clause, or a number >=1 as an absolute number of clauses that need to match.

        By default no optional clauses are necessary for a match (unless there are no required clauses). If this method is used, then the specified number of clauses is required.

        Parameters:
        min - the number of optional clauses that must match
      • getHighFreqMinimumNumberShouldMatch

        public float getHighFreqMinimumNumberShouldMatch()
        Gets the minimum number of the optional high frequent BooleanClauses which must be satisfied.
      • getTerms

        public java.util.List<Term> getTerms()
        Gets the list of terms.
      • getMaxTermFrequency

        public float getMaxTermFrequency()
        Gets the maximum document frequency a term may have and still be considered a low-frequency term.
      • getLowFreqBoost

        public float getLowFreqBoost()
        Gets the boost used for low frequency terms.
      • getHighFreqBoost

        public float getHighFreqBoost()
        Gets the boost used for high frequency terms.
      • toString

        public java.lang.String toString​(java.lang.String field)
        Description copied from class: Query
        Prints a query to a string, with field assumed to be the default field and omitted.
        Specified by:
        toString in class Query
      • hashCode

        public int hashCode()
        Description copied from class: Query
        Override and implement query hash code properly in a subclass. This is required so that QueryCache works properly.
        Specified by:
        hashCode in class Query
        See Also:
        Query.equals(Object)
      • equals

        public boolean equals​(java.lang.Object other)
        Description copied from class: Query
        Override and implement query instance equivalence properly in a subclass. This is required so that QueryCache works properly. Typically a query will be equal to another only if it is an instance of the same class and its document-filtering properties are identical to those of the other instance. Utility methods are provided for certain repetitive code.
        Specified by:
        equals in class Query
        See Also:
        Query.sameClassAs(Object), Query.classHash()
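
The equality contract described above (same class, identical document-filtering properties, and a hash code consistent with it) can be sketched with a standalone class. `EqualitySketch` and `SketchQuery` are hypothetical and do not extend `Query`; the two fields stand in for whatever properties a real subclass would compare.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

public class EqualitySketch {

    /** Hypothetical stand-in for a query with two filtering properties. */
    static final class SketchQuery {
        private final List<String> terms;
        private final float maxTermFrequency;

        SketchQuery(List<String> terms, float maxTermFrequency) {
            this.terms = new ArrayList<>(terms); // defensive copy
            this.maxTermFrequency = maxTermFrequency;
        }

        @Override
        public boolean equals(Object other) {
            if (this == other) {
                return true;
            }
            // Same-class check first, then compare every filtering property.
            if (other == null || getClass() != other.getClass()) {
                return false;
            }
            SketchQuery that = (SketchQuery) other;
            return Float.floatToIntBits(maxTermFrequency)
                    == Float.floatToIntBits(that.maxTermFrequency)
                && terms.equals(that.terms);
        }

        @Override
        public int hashCode() {
            // Uses exactly the fields compared in equals().
            return Objects.hash(getClass(), terms,
                    Float.floatToIntBits(maxTermFrequency));
        }
    }

    public static void main(String[] args) {
        SketchQuery a = new SketchQuery(List.of("lucene", "query"), 0.1f);
        SketchQuery b = new SketchQuery(List.of("lucene", "query"), 0.1f);
        System.out.println(a.equals(b) && a.hashCode() == b.hashCode());
    }
}
```

Keeping `hashCode` restricted to the fields used in `equals` is what lets two equal instances share a cache entry.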
      • newTermQuery

        protected Query newTermQuery​(Term term,
                                     TermStates termStates)
        Builds a new TermQuery instance.

        This is intended for subclasses that wish to customize the generated queries.

        Parameters:
        term - term
        termStates - the TermStates to be used to create the low level term query. Can be null.
        Returns:
        new TermQuery instance