queries for a set of values
Olly Betts
olly at survex.com
Sun Apr 28 23:38:00 BST 2024
On Sat, Apr 27, 2024 at 12:33:36AM +0100, Olly Betts wrote:
> On Fri, Apr 26, 2024 at 10:37:37PM +0000, Eric Wong wrote:
> > Say I have a bunch of values which I want to filter a query against.
> > If I had boolean terms, it could just OP_OR against the whole set.
> > IOW, this is what notmuch does with terms:
> >
> > std::set<std::string> terms;
> >
> > // notmuch populates terms via terms.insert(*i)...
> >
> > Query(OP_OR, terms.begin(), terms.end());
>
> The slicker way to do this (unless you need the std::set for other
> reasons) would be:
>
> Xapian::Query filter = Xapian::Query::MatchAll;
> while (more_terms()) {
>     filter |= Xapian::Query(get_next_term());
> }
>
> Assuming you're using Xapian >= 1.4.10 then |= on an OP_OR Query with
> refcount 1 (as here) is specially optimised and just appends a new
> subquery so you get a single OP_OR node and this is particularly
> efficient (if the refcount is higher it'll build a tree, but still get
> optimised the same way - it's just a bit less efficient because it needs
> to allocate for each node in the tree).
>
> One difference is that filter here will match everything if there are
> no filter terms, so you can just always apply it:
>
> query = Xapian::Query(OP_FILTER, query, filter);
>
> The notmuch way will match nothing for that case so you need to
> conditionalise applying the filter (assuming you still want to match
> something when there are no filter terms).

Another approach worth mentioning here is a shim iterator class, which
is useful for cases such as a synonym or phrase query that can't have
subqueries appended one by one.

You write a small custom iterator class which returns a subquery each
time it is dereferenced, and then construct a Query object passing begin
and end iterators of this class. C++ templates then effectively turn that
into a loop without needing a container as temporary storage. For an
example, see SynonymIterator here:

https://git.xapian.org/?p=xapian;a=blob;f=xapian-core/queryparser/queryparser.lemony;h=0ffeb50eaa39a2dffa257b5b6913112099931d70;hb=refs/heads/master#l348
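
To make the shape of such a shim iterator concrete, here is a minimal
sketch. It is not the SynonymIterator code linked above: ToyQuery is a
hypothetical stand-in for Xapian::Query so that the example compiles
without Xapian installed, and TermQueryIterator and make_synonym are
names invented for illustration. With the real library you would
presumably return Xapian::Query from operator*() and pass the iterator
pair to the templated Xapian::Query(op, begin, end) constructor instead.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical stand-in for Xapian::Query (an assumption, not the real API).
// It has a constructor templated on an iterator type which pulls one
// subquery per iteration.
class ToyQuery {
    std::string desc;

  public:
    ToyQuery() {}

    explicit ToyQuery(const std::string& term) : desc(term) {}

    template<typename I>
    ToyQuery(const char* op, I begin, I end) {
        desc = op;
        desc += '(';
        bool first = true;
        // This loop is all the constructor needs: !=, ++ and * on I.
        for (I i = begin; i != end; ++i) {
            if (!first) desc += ", ";
            first = false;
            desc += (*i).get_description();
        }
        desc += ')';
    }

    const std::string& get_description() const { return desc; }
};

// Shim iterator: walks a list of strings and builds a ToyQuery on each
// dereference, so no container of query objects is ever created.
class TermQueryIterator {
    const std::vector<std::string>* terms;
    std::size_t idx;

  public:
    // Typedefs so that std::iterator_traits works for this class.
    typedef std::input_iterator_tag iterator_category;
    typedef ToyQuery value_type;
    typedef std::ptrdiff_t difference_type;
    typedef ToyQuery* pointer;
    typedef ToyQuery& reference;

    TermQueryIterator(const std::vector<std::string>& terms_,
                      std::size_t idx_)
        : terms(&terms_), idx(idx_) {}

    TermQueryIterator& operator++() {
        ++idx;
        return *this;
    }

    // Returns the subquery by value; it is created on demand.
    ToyQuery operator*() const {
        assert(idx < terms->size());  // Dereferencing "end" is a bug.
        return ToyQuery((*terms)[idx]);
    }

    bool operator==(const TermQueryIterator& o) const {
        return idx == o.idx;
    }

    bool operator!=(const TermQueryIterator& o) const {
        return !(*this == o);
    }
};

// Builds a synonym-style query directly from the terms.
ToyQuery make_synonym(const std::vector<std::string>& terms) {
    return ToyQuery("SYNONYM",
                    TermQueryIterator(terms, 0),
                    TermQueryIterator(terms, terms.size()));
}
```

With these definitions, make_synonym({"colour", "color"}) yields a query
whose description is "SYNONYM(colour, color)". The only temporary
objects are the individual subqueries, one per iteration.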

I don't think you can achieve this via the bindings though, whereas the
operator |= trick above should work for bindings which wrap that
operator in a usable way.

Cheers,
Olly