[Xapian-discuss] Spaces in bool prefixes

Olly Betts olly at survex.com
Fri Feb 22 10:16:59 GMT 2008


On Thu, Feb 21, 2008 at 11:01:22PM -0800, Rick Olson wrote:
> Thanks for the suggestions.  The idea of removing the spaces from my 
> terms has occurred to me; I was just hoping there was another way I 
> could accomplish filtering while also allowing spaces.

Not currently, but see bug#128:

http://www.xapian.org/cgi-bin/bugzilla/show_bug.cgi?id=128

> Your response raised another question for me:
> 
> - Does the same concern with phrased boolean prefixes also apply 
> when searching for a quoted exact phrase, such as: "who ate my cheese"?

I think James was thinking that the "boolean phrase" would map to
multiple terms.  But since boolean terms aren't generally added with
positional information, that's not going to work anyway.

I think what makes sense is that country:"united states" would map to
a single boolean term (something like `XCOUNTRY:united states').
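The mapping described above can be sketched in a few lines. This is not Xapian's QueryParser. It is a standalone illustration, assuming a hypothetical prefix table (`BOOLEAN_PREFIXES`) and a hypothetical helper (`boolean_filter_terms`). It shows the quotes acting only as bounds, so the whole quoted text, space included, becomes one term:

```python
import re

# Hypothetical prefix table: user-facing field name -> indexing prefix.
# "XCOUNTRY:" follows the example term above; it is an assumption here.
BOOLEAN_PREFIXES = {"country": "XCOUNTRY:"}

# Matches field:"quoted value" or field:bare_value (bare values have no spaces).
_FILTER_RE = re.compile(r'(\w+):(?:"([^"]*)"|(\S+))')


def boolean_filter_terms(query: str) -> list[str]:
    """Return one boolean term per recognised field:value filter.

    The quotes only delimit the value; the quoted text is kept whole,
    so 'united states' yields a single term rather than two.
    """
    terms = []
    for m in _FILTER_RE.finditer(query):
        field, quoted, bare = m.group(1), m.group(2), m.group(3)
        prefix = BOOLEAN_PREFIXES.get(field)
        if prefix is None:
            continue  # not a boolean filter field; left for normal parsing
        value = quoted if quoted is not None else bare
        terms.append(prefix + value)
    return terms


print(boolean_filter_terms('cheese country:"united states"'))
```

With this approach no positional information is needed: the filter is a single exact term, so it never has to be reassembled from several terms at search time.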

> I wrote some modifications in the Xapian queryparser (while I was 
> waiting for a response) which would not restrict spaces from boolean 
> prefixes, and I'm trying to figure out how badly allowing the extra 
> space[s] would affect performance.  Using delve, the exact term is: 
> 'X_country:united states' so that's what's causing me some confusion in 
> understanding the performance impact entirely. 

Terms are essentially opaque blobs of data, so a space is no different
to any other character (actually, this isn't quite true - there's
currently some special handling for embedded zero bytes, but other
characters are handled opaquely, and the special handling for zero bytes
should be eliminated in the next major backend revision).
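To illustrate why a space shouldn't cost anything extra, here is a toy in-memory index. It is a hypothetical sketch, not how a Xapian backend stores data (that is an on-disk B-tree). The point carries over, though: looking up a term is an exact match on its bytes, and a space (0x20) is compared like any other byte.

```python
from collections import defaultdict

# Hypothetical toy index: term bytes -> set of document ids.
index: dict[bytes, set[int]] = defaultdict(set)


def add_term(docid: int, term: str) -> None:
    # Terms are stored as opaque byte strings.
    index[term.encode("utf-8")].add(docid)


def lookup(term: str) -> set[int]:
    # Exact byte-for-byte match; .get() avoids inserting empty entries.
    return index.get(term.encode("utf-8"), set())


add_term(1, "X_country:united states")
add_term(2, "X_country:united_states")

print(lookup("X_country:united states"))  # {1}
print(lookup("X_country:united_states"))  # {2}
```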

> If the amount of engine overhead required to allow for such a thing 
> isn't horribly awful, would there be some chance of allowing a 
> FLAG_BOOLEAN_PHRASE flag which would enable such behavior?

I don't think this should be thought of as a "boolean phrase" - although
quotes can indicate a probabilistic phrase, here they are indicating the
bounds for the text to put in the term.

But anyway, see bug#128 for previous discussion of this issue.

Cheers,
    Olly
