choosing between probabilistic and boolean prefixes for terms

Olly Betts olly at survex.com
Wed Jul 25 05:45:25 BST 2018


On Thu, Jul 19, 2018 at 08:32:23PM +0000, Eric Wong wrote:
> public-inbox allows searching for git blob names (e.g. "badc0ffee")
> in patches.  Initially, I chose to use add_prefix for probabilistic
> terms, since I assumed it could be a superset of what boolean
> searching offered.  Unfortunately, it doesn't seem to be the case
> because stemming is interfering.
> 
> So switching to boolean filtering seems to work; and it is
> fine for mechanical searches I plan on doing:
> 
>   https://public-inbox.org/meta/20180716040734.30104-1-e@80x24.org/
> 
> Now I wonder, is there a way to get the best-of-both-worlds so
> a human can still use wildcards?

I struggle to think of a situation in which one would want to
wildcard search for a git sha...

> public-inbox also allows searches on pathnames, and maybe that
> should use boolean filtering, too...

...but for a pathname that's more believable.

Currently you can't specify a different stemmer (or stemming mode)
per prefix.  Perhaps that should be supported - there are common
cases such as "author" fields where the stemming can be harmful,
but currently you'd have to have a separate text entry field for the
author search to support that directly.

I think you could use add_prefix() with a FieldProcessor object,
since that gets passed the term without stemming.  However,
FieldProcessor isn't wrapped by Search::Xapian.  The SWIG-based Perl
bindings do wrap it, but their API isn't 100% the same as
Search::Xapian's, so you'd need to test and probably adjust some of
your code to port to them.  They are the future for using Xapian from
Perl, but I've been hoping to sort out the incompatibilities before
pushing them more.

There isn't currently a flag to enable wildcards for boolean terms,
but I think that could be supported.  Mostly it isn't supported
because it seems less useful, and because it's assumed you could
have any character in a boolean term, and "*" being special works
against that.  Some of the options to limit expansion don't really
make sense for a boolean filter, but I guess that's a case of "well
don't do that then".
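
To make the trade-off concrete, here is a rough Python sketch of what
a trailing wildcard on a boolean term would involve: finding every
indexed term that starts with the prefix plus the typed text.  It is
not Xapian code and not how the QueryParser is implemented.  The
expand_wildcard() helper, the XBLOB and XPATH prefixes, the term list
and the max_expansion parameter are all made up for illustration.

```python
from bisect import bisect_left


def expand_wildcard(sorted_terms, prefix, pattern, max_expansion=0):
    """Return the indexed terms matched by prefix + pattern.

    A trailing "*" in pattern means "any suffix"; otherwise the term
    must match exactly.  sorted_terms must be sorted.  max_expansion=0
    means no limit (a hypothetical knob, for illustration only).
    """
    if not pattern.endswith("*"):
        term = prefix + pattern
        i = bisect_left(sorted_terms, term)
        found = i < len(sorted_terms) and sorted_terms[i] == term
        return [term] if found else []

    stem = prefix + pattern[:-1]
    # Terms sharing a prefix are contiguous in a sorted list, so jump
    # to the first candidate and walk forward until the prefix ends.
    i = bisect_left(sorted_terms, stem)
    matches = []
    while i < len(sorted_terms) and sorted_terms[i].startswith(stem):
        matches.append(sorted_terms[i])
        if max_expansion and len(matches) > max_expansion:
            raise ValueError("wildcard expands to too many terms")
        i += 1
    return matches


# Hypothetical boolean terms: blob names and pathnames.
TERMS = sorted([
    "XBLOBbadc0ffee",
    "XBLOBbadc0de12",
    "XBLOBdeadbeef",
    "XPATHsrc/*",        # a term that literally ends in "*"
    "XPATHsrc/main.c",
])

if __name__ == "__main__":
    print(expand_wildcard(TERMS, "XBLOB", "badc0*"))
    print(expand_wildcard(TERMS, "XBLOB", "badc0ffee"))
    # "*" being special: the literal term "XPATHsrc/*" can no longer
    # be selected on its own, because the pattern is read as a wildcard.
    print(expand_wildcard(TERMS, "XPATH", "src/*"))
```

The last call shows the problem with treating "*" as special in a
boolean term: once it is, a term that really contains "*" can't be
matched exactly any more.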

Cheers,
    Olly


