[Xapian-tickets] [Xapian] #750: Teach QueryParser about stopping strategies
Xapian
nobody at xapian.org
Fri Dec 8 00:01:27 GMT 2023
#750: Teach QueryParser about stopping strategies
-------------------------+-------------------------------
Reporter: mgautier | Owner: Olly Betts
Type: defect | Status: assigned
Priority: highest | Milestone: 1.5.0
Component: QueryParser | Version: git master
Severity: normal | Resolution:
Keywords: | Blocked By:
Blocking: | Operating System: All
-------------------------+-------------------------------
Changes (by Olly Betts):
* owner: Gaurav Arora => Olly Betts
* priority: normal => highest
* status: new => assigned
* milestone: 1.4.x => 1.5.0
Comment:
It'd be good to address this, though the cases from comment:6 are never
going to be handled entirely satisfactorily due to inherent limitations of
index-time stopping:
{{{
$ examples/quest --stemmer fr '"le voiture"'
Parsed Query: Query((le@1 PHRASE 2 voitur@2))
}}}
If `le` is a stopword and removed before indexing, then the best we can do
here would be `Query(voitur@2)`, which will also match `voiture` without
`le` in front.
{{{
$ examples/quest --stemmer fr '"le" voiture'
Parsed Query: Query((le@1 OR voitur@2))
}}}
Same here.
{{{
$ examples/quest --stemmer fr 'le la'
Parsed Query: Query((le@1 OR la@2))
}}}
And here we can't do a search at all, as the query is made up entirely of
stopwords. This particular case probably isn't a useful search, but it's
possible to come up with queries composed entirely of stopwords. In
English, the Shakespeare quote "to be or not to be" is one example.
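The three cases above can be illustrated with a minimal, hypothetical Python model. It is not Xapian's API and not a proposed implementation; it only sketches what a query-time counterpart to index-time stopping would have to do. It drops stopwords, collapses a query left with a single term into a plain term query (which over-matches), and gives up when nothing remains. The `STOPWORDS` subset, the `stop_query` helper and the description strings (modelled on the `quest` output) are assumptions made for illustration. It also assumes that stopped words still use up a term position at index time, so a phrase with a stopword in the middle needs a wider window.

```python
from typing import List, Optional, Tuple

# Hypothetical stopword subset for this sketch (not Xapian's French list).
STOPWORDS = frozenset({"le", "la", "de"})


def stop_query(terms: List[Tuple[str, int]], op: str) -> Optional[str]:
    """Remove index-time stopwords from a flat query of (term, position) pairs.

    Returns a quest-style description string, or None when every term was a
    stopword and so nothing is left to search for.
    """
    kept = [(term, pos) for term, pos in terms if term not in STOPWORDS]
    if not kept:
        # e.g. 'le la' or "to be or not to be": no searchable terms remain.
        return None
    if len(kept) == 1:
        # A phrase (or OR) reduced to one term degrades to a plain term query,
        # which also matches documents without the dropped stopword.
        term, pos = kept[0]
        return f"Query({term}@{pos})"
    if op == "PHRASE":
        # Assumes stopped words still used up a position at index time, so the
        # window must span any gap they leave in the middle of the phrase.
        window = kept[-1][1] - kept[0][1] + 1
        joiner = f" PHRASE {window} "
    else:
        joiner = f" {op} "
    return "Query((" + joiner.join(f"{t}@{p}" for t, p in kept) + "))"


print(stop_query([("le", 1), ("voitur", 2)], "PHRASE"))  # Query(voitur@2)
print(stop_query([("le", 1), ("voitur", 2)], "OR"))      # Query(voitur@2)
print(stop_query([("le", 1), ("la", 2)], "OR"))          # None
```

In this model, `"le voiture"` and `"le" voiture` both reduce to the same single-term query, and `le la` yields no query at all, which mirrors the limitations described above.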
I'm going to mark this as to-do for 1.5.0 for now, but if it proves more
involved than I hope, it'll likely get postponed, as we really need to get
a new stable release series out (and this could be backported to a stable
release).
--
Ticket URL: <https://trac.xapian.org/ticket/750#comment:11>
Xapian <https://xapian.org/>