[Xapian-discuss] Spaces in bool prefixes

Rick Olson rick at napalmriot.com
Fri Feb 22 07:01:22 GMT 2008


James Aylett wrote:
> On Wed, Feb 20, 2008 at 07:15:50PM -0800, Rick Olson wrote:
>
>   
>> For instance, I have:
>> 	add_boolean_prefix('X_Country:', 'country');
>> which works until you come across a country with more than one word, 
>> such as United States.
>>     
>
> In many situations, your boolean terms are going to come from a fixed
> or bounded vocabulary. Certainly in the case of modern countries you
> could replace at index time (and hence also at query time) with the
> appropriate ISO two- or three-letter code, avoiding the need for
> spaces.
>
> It's worth noting that (a) spaces generally are going to delimit
> terms, and (b) you don't want to use phrase searching unless you need
> to, because it needs to consider more in order to work. Since you
> almost always have to construct prefixed terms in boolean context,
> doing a little bit more work to construct them in a way that avoids
> spaces isn't generally a problem.
>
> If it is in your case, you may need to do some more work somewhere
> else. I'm assuming you don't actually expect your end users to type
> "country:united states" into a search box somewhere (although maybe
> I'm wrong), so it should be possible to come up with a solution
> without too much effort.
>
> J

Thanks for the suggestions.  The idea of removing the spaces from my 
terms has occurred to me; I was just hoping there was another way I 
could accomplish filtering while also allowing spaces.
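
For illustration only, the space-free approach could look roughly like the sketch below. It is plain Python and does not touch Xapian at all: the COUNTRY_CODES table, the country_term helper, and the underscore fallback are assumptions made up for this example, and the prefix string is simply the one from the add_boolean_prefix call quoted above.

```python
# Hypothetical lookup table; a real one would cover the full ISO 3166-1 list.
COUNTRY_CODES = {
    "united states": "US",
    "united kingdom": "GB",
    "france": "FR",
}

# Prefix string taken from the add_boolean_prefix call quoted above.
PREFIX = "X_Country:"


def country_term(name):
    """Return a space-free boolean term for a country name."""
    # Lower-case the name and collapse runs of whitespace to single spaces.
    key = " ".join(name.lower().split())
    code = COUNTRY_CODES.get(key)
    if code is None:
        # Assumed fallback for names missing from the table:
        # keep the name but replace spaces with underscores.
        code = key.replace(" ", "_")
    return PREFIX + code
```

The same helper would have to be applied both when indexing and when building the filter, so that the two sides always agree on the term.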

Your response raised another question for me:

- Does the same concern about phrases in boolean prefixes also apply 
when searching for a quoted exact phrase, such as "who ate my cheese"?

While I was waiting for a response, I made some modifications to the 
Xapian queryparser so that it no longer rejects spaces in boolean 
prefixes, and I'm trying to figure out how badly allowing the extra 
space[s] would affect performance.  Using delve, I can see that the 
exact term is 'X_country:united states', and that is what's confusing 
me about what the performance impact would actually be.
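
To make the distinction concrete, here is a toy model in plain Python. It is not how Xapian is implemented; the dictionaries and the index_text, index_boolean, match_term and match_phrase helpers are all invented for this sketch. It only shows that looking up one term that happens to contain a space is a single posting-list lookup, while a phrase match also has to compare word positions.

```python
postings = {}   # term -> set of document ids
positions = {}  # term -> {document id: [word positions]}


def index_text(doc_id, text):
    """Index free text word by word, recording each word's position."""
    for pos, word in enumerate(text.lower().split()):
        postings.setdefault(word, set()).add(doc_id)
        positions.setdefault(word, {}).setdefault(doc_id, []).append(pos)


def index_boolean(doc_id, term):
    """Index one boolean term as-is, spaces included, without positions."""
    postings.setdefault(term, set()).add(doc_id)


def match_term(term):
    """Single-term filter: one posting-list lookup."""
    return set(postings.get(term, set()))


def match_phrase(words):
    """Phrase match: intersect posting lists, then check adjacency."""
    if not words:
        return set()
    candidates = set(postings.get(words[0], set()))
    for word in words[1:]:
        candidates &= postings.get(word, set())
    matches = set()
    for doc_id in candidates:
        for start in positions[words[0]][doc_id]:
            # Every following word must sit at the next consecutive position.
            if all(start + i in positions[w][doc_id]
                   for i, w in enumerate(words)):
                matches.add(doc_id)
                break
    return matches
```

In this model, match_term('X_country:united states') never looks at positions, whereas match_phrase(['who', 'ate', 'my', 'cheese']) has to walk the position lists of every candidate document.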

If the amount of engine overhead required to allow for such a thing 
isn't horribly awful, would there be some chance of allowing a 
FLAG_BOOLEAN_PHRASE flag which would enable such behavior?

I'm not as intimate with Xapian internals as you guys are, so forgive my 
ignorance =)

Thanks,

Rick


