[Xapian-tickets] [Xapian] #245: All-stopword queries with two or more terms should ignore stopword list

Xapian nobody at xapian.org
Thu Jul 15 14:09:56 BST 2010


#245: All-stopword queries with two or more terms should ignore stopword list
-------------------------+--------------------------------------------------
 Reporter:  richard      |        Owner:  olly     
     Type:  defect       |       Status:  closed   
 Priority:  normal       |    Milestone:  1.2.3    
Component:  QueryParser  |      Version:  SVN trunk
 Severity:  normal       |   Resolution:  fixed    
 Keywords:               |    Blockedby:           
 Platform:  All          |     Blocking:           
-------------------------+--------------------------------------------------
Changes (by olly):

  * status:  assigned => closed
  * resolution:  => fixed


Old description:

> Currently, if a single word query is parsed, and that word is a stopword,
> the stopwording is ignored.  However, if a multiple word query is parsed,
> and all words are stopwords, the stopwording is applied (resulting in an
> empty query).
>
> If all the words in the query are stopwords, I think it may make sense to
> ignore the stopwording.  However, even if we decide to apply the
> stopwording in this case, we should be consistent in our behaviour.
>
> Some examples, in Python:
>
> >>> import xapian
> >>> s=xapian.SimpleStopper()
> >>> s.add('foo')
> >>> s.add('bar')
> >>> qp=xapian.QueryParser()
> >>> qp.set_stopper(s)
> >>> str(qp.parse_query('foo'))
> 'Xapian::Query(foo:(pos=1))'
> >>> str(qp.parse_query('foo foo'))
> 'Xapian::Query()'
> >>> str(qp.parse_query('foo bar'))
> 'Xapian::Query()'
>
> Either the first parse_query() call should return Xapian::Query(), or the
> later ones should return non-empty queries.

New description:

 Currently, if a single word query is parsed, and that word is a stopword,
 the stopwording is ignored.  However, if a multiple word query is parsed,
 and all words are stopwords, the stopwording is applied (resulting in an
 empty query).

 If all the words in the query are stopwords, I think it may make sense to
 ignore the stopwording.  However, even if we decide to apply the
 stopwording in this case, we should be consistent in our behaviour.

 Some examples, in Python:

 {{{
 >>> import xapian
 >>> s=xapian.SimpleStopper()
 >>> s.add('foo')
 >>> s.add('bar')
 >>> qp=xapian.QueryParser()
 >>> qp.set_stopper(s)
 >>> str(qp.parse_query('foo'))
 'Xapian::Query(foo:(pos=1))'
 >>> str(qp.parse_query('foo foo'))
 'Xapian::Query()'
 >>> str(qp.parse_query('foo bar'))
 'Xapian::Query()'
 }}}

 Either the first parse_query() call should return Xapian::Query(), or the
 later ones should return non-empty queries.
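
 For illustration only, the consistent behaviour suggested by the ticket
 title (ignore the stopword list when every term is a stopword) can be
 sketched in plain Python.  This is not the QueryParser code or the
 committed patch; the apply_stopwords() helper and its arguments are
 hypothetical names invented for this sketch, and it works on plain lists
 of terms rather than on Xapian::Query objects.

```python
def apply_stopwords(terms, stopwords):
    """Hypothetical helper: drop stopwords unless that would leave nothing."""
    # Keep only the terms that are not in the stopword list.
    kept = [t for t in terms if t not in stopwords]
    # If every term was a stopword, ignore the stopword list entirely so
    # that one-term and multi-term queries are treated the same way.
    return kept if kept else list(terms)


stopwords = {"foo", "bar"}
print(apply_stopwords(["foo"], stopwords))         # single stopword is kept
print(apply_stopwords(["foo", "bar"], stopwords))  # all stopwords: both kept
print(apply_stopwords(["foo", "baz"], stopwords))  # mixed: stopword dropped
```

 With this rule, 'foo', 'foo foo' and 'foo bar' all yield non-empty term
 lists, while a mixed query such as 'foo baz' still has its stopword
 removed.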

--

Comment:

 I committed the patch (with comment improvements) in trunk r14845.  Maybe
 there's a neater way, but it does at least work.

 I think this probably isn't worth backporting to 1.0 at this point - I've
 not seen any feedback from end users on this (unless that was what
 triggered your report?)

-- 
Ticket URL: <http://trac.xapian.org/ticket/245#comment:10>
Xapian <http://xapian.org/>