[Xapian-discuss] Spaces in bool prefixes
Olly Betts
olly at survex.com
Fri Feb 22 10:16:59 GMT 2008
On Thu, Feb 21, 2008 at 11:01:22PM -0800, Rick Olson wrote:
> Thanks for the suggestions. The idea of removing the spaces from my
> terms has occurred to me; I was just hoping there was another way I
> could accomplish filtering while also allowing spaces.
Not currently, but see bug#128:
http://www.xapian.org/cgi-bin/bugzilla/show_bug.cgi?id=128
> Your response raised another question for me:
>
> - Does the same concern with phrased boolean prefixes apply when
> searching for a quoted exact phrase, such as "who ate my cheese"?
I think James was thinking that the "boolean phrase" would map to
multiple terms. But since boolean terms aren't generally added with
positional information, that's not going to work anyway.
I think what makes sense is that country:"united states" would map to
a single boolean term (something like `XCOUNTRY:united states').
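
As a rough illustration of that mapping (this is not the QueryParser's
code, and bug#128 is the place for the real discussion), here is a minimal
Python sketch. The names `PREFIXES` and `boolean_terms` are hypothetical,
and the `country` to `XCOUNTRY:` mapping is just the example above:

```python
import re

# Hypothetical table mapping a user-visible field to a term prefix.
PREFIXES = {"country": "XCOUNTRY:"}

# Matches field:"quoted text" or field:bareword.
_FILTER_RE = re.compile(r'(\w+):(?:"([^"]*)"|(\S+))')


def boolean_terms(query):
    """Return one boolean term per recognised field filter in `query`.

    The quotes only mark where the term text starts and ends; the text
    between them, spaces included, becomes part of a single term.
    """
    terms = []
    for field, quoted, bare in _FILTER_RE.findall(query):
        prefix = PREFIXES.get(field)
        value = quoted or bare
        if prefix is None or not value:
            continue  # unknown field or empty value: not a filter
        terms.append(prefix + value)
    return terms


print(boolean_terms('cheese country:"united states"'))
```

The point of the sketch is only that the quoted text yields one term rather
than one term per word, so no positional information is needed.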
> I wrote some modifications in the Xapian queryparser (while I was
> waiting for a response) which would not restrict spaces from boolean
> prefixes, and I'm trying to figure out how badly allowing the extra
> space[s] would affect performance. According to delve, the exact term is
> 'X_country:united states', and that's what makes it hard for me to
> understand the performance impact fully.
Terms are essentially opaque blobs of data, so a space is no different
from any other character. (Actually, this isn't quite true: there's
currently some special handling for embedded zero bytes, but other
characters are handled opaquely, and the special handling for zero bytes
should be eliminated in the next major backend revision.)
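
To make the "opaque blob" point concrete, here is a toy Python sketch of a
posting-list lookup keyed on the raw term bytes. It is an assumption-laden
stand-in, not how any Xapian backend is implemented; `ToyIndex` and its
methods are hypothetical names:

```python
from collections import defaultdict


class ToyIndex:
    """Toy inverted index: each term is an opaque byte-string key."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add_boolean_term(self, docid, term):
        # The term is stored byte-for-byte; a space is just another byte.
        self._postings[term.encode("utf-8")].add(docid)

    def filter(self, term):
        # Exact-match lookup on the whole key; nothing is split on spaces.
        return sorted(self._postings.get(term.encode("utf-8"), ()))


index = ToyIndex()
index.add_boolean_term(1, "X_country:united states")
index.add_boolean_term(2, "X_country:united")
print(index.filter("X_country:united states"))
```

In a structure like this the lookup cost depends on the key as a whole, so
a term containing a space costs the same as any other term of that length.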
> If the amount of engine overhead required to allow for such a thing
> isn't horribly awful, would there be some chance of allowing a
> FLAG_BOOLEAN_PHRASE flag which would enable such behavior?
I don't think this should be thought of as a "boolean phrase" - although
quotes can indicate a probabilistic phrase, here they are indicating the
bounds for the text to put in the term.
But anyway, see bug#128 for previous discussion of this issue.
Cheers,
Olly