[Xapian-discuss] Spaces in bool prefixes
Rick Olson
rick at napalmriot.com
Fri Feb 22 07:01:22 GMT 2008
James Aylett wrote:
> On Wed, Feb 20, 2008 at 07:15:50PM -0800, Rick Olson wrote:
>
>> For instance, I have:
>> add_boolean_prefix('X_Country:', 'country');
>> which works until you come across a country with more than one word,
>> such as United States.
>>
>
> In many situations, your boolean terms are going to come from a fixed
> or bounded vocabulary. Certainly in the case of modern countries you
> could replace at index time (and hence also at query time) with the
> appropriate ISO two- or three-letter code, avoiding the need for
> spaces.
>
> It's worth noting that (a) spaces generally are going to delimit
> terms, and (b) you don't want to use phrase searching unless you need
> to, because it needs to consider more in order to work. Since you
> almost always have to construct prefixed terms in boolean context,
> doing a little bit more work to construct them in a way that avoids
> spaces isn't generally a problem.
>
> If it is in your case, you may need to do some more work somewhere
> else. I'm assuming you don't actually expect your end users to type
> "country:united states" into a search box somewhere (although maybe
> I'm wrong), so it should be possible to come up with a solution
> without too much effort.
>
> J

Thanks for the suggestions. The idea of removing the spaces from my
terms had occurred to me; I was just hoping there was another way I
could accomplish filtering while still allowing spaces.
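
For reference, the space-free approach suggested above might look roughly
like the sketch below. It is plain Python with no Xapian calls, and it only
builds the term string. The prefix is the one delve shows in this thread.
The lookup table, the function name country_term, and the underscore
fallback for names missing from the table are assumptions made for
illustration.

```python
# Illustrative table only; a real one would cover the full ISO 3166-1 list.
ISO_CODES = {
    "united states": "us",
    "united kingdom": "gb",
    "new zealand": "nz",
    "france": "fr",
}

# Term prefix as reported by delve in this thread.
PREFIX = "X_country:"


def country_term(name):
    """Return a boolean filter term for a country name, without spaces."""
    # Lowercase and collapse runs of whitespace so lookups are consistent.
    key = " ".join(name.lower().split())
    code = ISO_CODES.get(key)
    if code is None:
        # Assumed fallback for names outside the table: join words with "_".
        code = key.replace(" ", "_")
    return PREFIX + code
```

The same function has to run at index time and at query time, so that the
term added to the document and the term used in the filter are identical.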

Your response raised another question for me:
- Does the same performance concern apply when searching for a quoted
exact phrase, such as: "who ate my cheese" ?

While waiting for a response, I modified the Xapian queryparser so that
it no longer stops a boolean prefix value at the first space, and I'm
trying to work out how badly allowing the extra space[s] would affect
performance. According to delve, the indexed term is exactly
'X_country:united states', a single term rather than a phrase, which is
why I'm unsure whether the phrase-search cost applies here at all.
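
To make the distinction concrete, here is a toy inverted index in plain
Python. It is not Xapian's implementation, and the class and method names
are hypothetical. It only illustrates the general point that a filter on one
term, even a term containing a space, reads a single posting list, while a
phrase match has to compare positions for every word in the phrase.

```python
from collections import defaultdict


class ToyIndex:
    """Toy inverted index: term -> {doc_id: [positions]}."""

    def __init__(self):
        self.postings = defaultdict(dict)

    def add_boolean_term(self, doc_id, term):
        # Boolean terms carry no positions; the term may contain spaces.
        self.postings[term].setdefault(doc_id, [])

    def add_text(self, doc_id, text):
        # Free text is split on whitespace and indexed with word positions.
        for pos, word in enumerate(text.lower().split()):
            self.postings[word].setdefault(doc_id, []).append(pos)

    def filter(self, term):
        # One posting-list lookup, regardless of what the term contains.
        return set(self.postings.get(term, {}))

    def phrase(self, words):
        # Every word needs its own lookup plus a position comparison.
        words = [w.lower() for w in words]
        if not words:
            return set()
        lists = [self.postings.get(w, {}) for w in words]
        candidates = set(lists[0]).intersection(*(set(l) for l in lists[1:]))
        matches = set()
        for doc in candidates:
            for start in lists[0][doc]:
                if all(start + i in lists[i][doc] for i in range(1, len(words))):
                    matches.add(doc)
                    break
        return matches


idx = ToyIndex()
idx.add_boolean_term(1, "X_country:united states")
idx.add_text(1, "who ate my cheese")
idx.add_text(2, "my cheese who ate")
```

With this data, idx.filter("X_country:united states") returns {1} from one
lookup, while idx.phrase(["who", "ate", "my", "cheese"]) has to consult four
posting lists and their positions before it can return {1}.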

If the engine overhead required to allow such a thing isn't horribly
awful, would there be some chance of adding a FLAG_BOOLEAN_PHRASE flag
to enable this behavior?

I'm not as intimate with Xapian internals as you guys are, so forgive my
ignorance =)

Thanks,
Rick