searching on presence of a term prefix

Olly Betts olly at survex.com
Tue Nov 9 04:04:54 GMT 2021


On Tue, Nov 09, 2021 at 03:11:05AM +0000, Eric Wong wrote:
> Hey all, I'm wondering if there's a way to search for documents
> based on whether a prefix was used or not, regardless of the
> text indexed with that prefix.
> 
> I'm already indexing email attachment filenames with the "XFN"
> prefix.  However, I may want to construct a query that returns
> emails with any attachment filename in them at all.

There is a way, but it's probably not a good idea for a large system:

    Xapian::Query(Xapian::Query::OP_WILDCARD, "XFN")

The reason you probably don't want to do that is that it is essentially
the same as a big OR of all the terms with the prefix "XFN", which here
means one term for each unique attachment filename (it's a bit more
efficient than that big OR for a few reasons, but that gives you an idea
of what's involved).
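To make the cost concrete, here is a toy model of what a prefix wildcard conceptually boils down to. Every term beginning with the prefix is looked up and its posting list is OR-ed into the result. This is not Xapian's API or its actual implementation, and Xapian's real OP_WILDCARD is more efficient than this naive union. The in-memory Index type, the WildcardResult struct and the wildcard_or() function are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>

// Toy inverted index: term -> set of document ids. This only models the
// idea of a posting list; it is NOT the Xapian API.
using Index = std::map<std::string, std::set<int>>;

struct WildcardResult {
    std::set<int> docs;          // union of all matching posting lists
    std::size_t terms_expanded;  // number of distinct terms that were OR-ed
};

// Conceptual model of a prefix wildcard: visit every term starting with
// `prefix` (std::map keeps keys sorted, so they form one contiguous run
// beginning at lower_bound) and OR their posting lists together.
WildcardResult wildcard_or(const Index& index, const std::string& prefix) {
    WildcardResult result{{}, 0};
    for (auto it = index.lower_bound(prefix);
         it != index.end() &&
         it->first.compare(0, prefix.size(), prefix) == 0;
         ++it) {
        result.docs.insert(it->second.begin(), it->second.end());
        ++result.terms_expanded;
    }
    return result;
}
```

With three distinct XFN terms indexed, wildcard_or(index, "XFN") merges three posting lists. The work grows with the number of unique attachment filenames, even though the question being asked is only whether an email has any attachment at all.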

> Would I have to add a new boolean term to search against to
> accomplish this?

That's the way to make it fast.

One trick here: if most emails have attachments, you could add the flag
term only to those that don't, then use OP_AND_NOT with it to get emails
with attachments, or OP_FILTER to get those without.
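A rough sketch of the flag-term approach follows. Like the previous sketch, it is a toy set-based model rather than real Xapian code. It assumes a hypothetical boolean term "XNOATT" that is added at index time only to emails without attachments. In real Xapian code the flag would be added with Xapian::Document::add_boolean_term() and combined with the user's query via Xapian::Query::OP_AND_NOT or OP_FILTER. The and_not() and filter() helpers here only mimic which documents those operators return, and they ignore weighting. The Email struct, build_index() and postings() are likewise illustrative names.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <map>
#include <set>
#include <string>
#include <vector>

using DocSet = std::set<int>;
using Index = std::map<std::string, DocSet>;

// Hypothetical flag term, added only to emails WITHOUT attachments
// (the rarer case, assuming most emails do have attachments).
const std::string kNoAttachFlag = "XNOATT";

struct Email {
    int docid;
    std::vector<std::string> attachment_names;
};

// Index time: one XFN term per filename, plus the flag when there are none.
Index build_index(const std::vector<Email>& emails) {
    Index index;
    for (const Email& e : emails) {
        for (const std::string& name : e.attachment_names)
            index["XFN" + name].insert(e.docid);
        if (e.attachment_names.empty())
            index[kNoAttachFlag].insert(e.docid);
    }
    return index;
}

// Look up a single term's posting list (empty if the term is absent).
DocSet postings(const Index& index, const std::string& term) {
    auto it = index.find(term);
    return it == index.end() ? DocSet{} : it->second;
}

// Set-based stand-in for OP_AND_NOT: documents in `a` but not in `b`.
DocSet and_not(const DocSet& a, const DocSet& b) {
    DocSet out;
    std::set_difference(a.begin(), a.end(), b.begin(), b.end(),
                        std::inserter(out, out.end()));
    return out;
}

// Set-based stand-in for OP_FILTER: documents in `a` that are also in `b`.
DocSet filter(const DocSet& a, const DocSet& b) {
    DocSet out;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(),
                          std::inserter(out, out.end()));
    return out;
}
```

Given the set of documents matching the user's query, and_not(matches, postings(index, kNoAttachFlag)) yields the emails with attachments and filter(matches, postings(index, kNoAttachFlag)) yields those without. Either way only the single flag posting list is consulted, however many distinct filenames have been indexed.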

> Using XS Search::Xapian on Debian buster and bullseye.

I don't think Search::Xapian wraps OP_WILDCARD (or, more importantly,
the Xapian::Query constructor needed to use it; the OP_WILDCARD constant
itself would be fairly easy to define yourself).

Cheers,
    Olly



More information about the Xapian-discuss mailing list