queries for a set of values

Olly Betts olly at survex.com
Sun Apr 28 23:38:00 BST 2024


On Sat, Apr 27, 2024 at 12:33:36AM +0100, Olly Betts wrote:
> On Fri, Apr 26, 2024 at 10:37:37PM +0000, Eric Wong wrote:
> > Say I have a bunch of values which I want to filter a query against.
> > If I had boolean terms, it could just OP_OR against the whole set.
> > IOW, this is what notmuch does with terms:
> > 
> > 	std::set<std::string> terms;
> > 
> > 	// notmuch populates terms via terms.insert(*i)...
> > 
> > 	Query(OP_OR, terms.begin(), terms.end());
> 
> The slicker way to do this (unless you need the std::set for other
> reasons) would be:
> 
>     Xapian::Query filter = Xapian::Query::MatchAll;
>     while (more_terms()) {
>         filter |= Xapian::Query(get_next_term());
>     }
> 
> Assuming you're using Xapian >= 1.4.10 then |= on an OP_OR Query with
> refcount 1 (as here) is specially optimised and just appends a new
> subquery so you get a single OP_OR node and this is particularly
> efficient (if the refcount is higher it'll build a tree, but still get
> optimised the same way - it's just a bit less efficient because it needs
> to allocate for each node in the tree).
> 
> One difference is that filter here will match everything if there are
> no filter terms, so you can just always apply it:
> 
>     query = Xapian::Query(OP_FILTER, query, filter);
> 
> The notmuch way will match nothing for that case so you need to
> conditionalise applying the filter (assuming you still want to match
> something when there are no filter terms).

Another approach worth mentioning uses a shim iterator class.  It's
useful for cases such as a synonym or phrase query, which can't have
subqueries appended one by one.

You write a small custom iterator class which returns a subquery each
time it's dereferenced, then construct a Query object passing begin and
end iterators of this class.  The templated Query constructor then
effectively turns that into a loop, without needing a container as
temporary storage.  For an example, see SynonymIterator here:

https://git.xapian.org/?p=xapian;a=blob;f=xapian-core/queryparser/queryparser.lemony;h=0ffeb50eaa39a2dffa257b5b6913112099931d70;hb=refs/heads/master#l348

I don't think you can achieve this via the bindings though, whereas the
operator |= trick above should work for bindings which wrap that
operator in a usable way.
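
As a rough illustration of the shape of such a shim iterator (this is
not the SynonymIterator code linked above), here is a minimal sketch.
To keep it compilable without Xapian installed, DemoQuery is a
hypothetical stand-in which only mimics the (op, begin, end) constructor
of Xapian::Query; TermQueryIterator, make_synonym and the prefix
parameter are likewise made-up names for the example.

```cpp
#include <cstddef>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical stand-in for Xapian::Query, present only so this sketch
// builds on its own.  It mimics the templated (op, begin, end)
// constructor: walk the range once, adding each dereferenced item as a
// subquery.
struct DemoQuery {
    enum op { LEAF, OP_SYNONYM, OP_PHRASE };

    op type;
    std::string term;                   // used by single-term queries
    std::vector<DemoQuery> subqueries;  // used by composite queries

    explicit DemoQuery(const std::string& term_)
        : type(LEAF), term(term_) {}

    template<typename I>
    DemoQuery(op op_, I begin, I end) : type(op_) {
        for (I i = begin; i != end; ++i) {
            subqueries.push_back(*i);
        }
    }
};

// The shim: wraps an iterator over plain strings and builds a query for
// each one only when dereferenced, so the caller never has to fill a
// temporary container of query objects.
template<typename Q, typename StrIt>
class TermQueryIterator {
    StrIt it;
    std::string prefix;

  public:
    // The usual iterator typedefs, so std::iterator_traits works on it.
    typedef std::input_iterator_tag iterator_category;
    typedef Q value_type;
    typedef std::ptrdiff_t difference_type;
    typedef const Q* pointer;
    typedef Q reference;

    TermQueryIterator(StrIt it_, const std::string& prefix_)
        : it(it_), prefix(prefix_) {}

    TermQueryIterator& operator++() { ++it; return *this; }

    // Build the subquery on demand.
    Q operator*() const { return Q(prefix + *it); }

    bool operator==(const TermQueryIterator& o) const { return it == o.it; }
    bool operator!=(const TermQueryIterator& o) const { return it != o.it; }
};

// Build one synonym query over all the words in a single step.
DemoQuery make_synonym(const std::vector<std::string>& words,
                       const std::string& prefix) {
    typedef TermQueryIterator<DemoQuery,
                              std::vector<std::string>::const_iterator> Iter;
    return DemoQuery(DemoQuery::OP_SYNONYM,
                     Iter(words.begin(), prefix),
                     Iter(words.end(), prefix));
}
```

With real Xapian the same iterator would be instantiated with
Xapian::Query in place of DemoQuery and passed to
Xapian::Query(Xapian::Query::OP_SYNONYM, begin, end).  The sketch
declares the iterator typedefs because the real constructor is likely to
inspect them via std::iterator_traits.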

Cheers,
    Olly


