[Xapian-discuss] word context, numeric values, and characters

Peter Karman peter at peknet.com
Wed Dec 7 03:22:13 GMT 2005


Thanks, Olly. My follow-up questions are below:

Olly Betts scribbled on 12/6/05 9:11 PM:

> On Tue, Dec 06, 2005 at 09:01:10PM -0600, Peter Karman wrote:
> 
>>For example, the html "<title>foo</title>" could be indexed as "Tfoo". A 
>>query for "title:foo" would be parsed with add_prefix("title","T") and that 
>>would generate a match for "Tfoo".
>>
>>Am I understanding that process correctly?
> 
> 
> Yes, that's spot on.
> 

Has any thought been given to storing the context separately from the word 
string itself? E.g.,

   add_term( "foo", wdfinc, "T" )

Would that slow down queries, by requiring a separate comparison against a 
value stored elsewhere? I'm assuming the current convention looks the way it 
does for the sake of search speed.
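
To make the comparison concrete, here is a toy sketch in Python. It is not
Xapian code: the dict-based index, the sample documents, and the function
names (index_prefixed, index_separate, search_prefixed, search_separate) are
all invented for illustration. It only contrasts folding the context into the
term string with storing the context next to each posting and filtering at
query time.

```python
from collections import defaultdict

# Hypothetical sample data: docid -> {context prefix: [words]}.
# "T" stands for the title context, "" for plain body text.
DOCS = {
    1: {"T": ["foo"], "": ["bar"]},
    2: {"": ["foo", "bar"]},
}

def index_prefixed(docs):
    """Toy index where the context is part of the term, e.g. 'T' + 'foo'."""
    postings = defaultdict(set)
    for docid, fields in docs.items():
        for prefix, words in fields.items():
            for word in words:
                postings[prefix + word].add(docid)
    return postings

def index_separate(docs):
    """Toy index where each posting carries its context alongside the docid."""
    postings = defaultdict(set)
    for docid, fields in docs.items():
        for prefix, words in fields.items():
            for word in words:
                postings[word].add((docid, prefix))
    return postings

def search_prefixed(postings, word, prefix):
    # One lookup: the prefixed term is the key.
    return set(postings.get(prefix + word, set()))

def search_separate(postings, word, prefix):
    # One lookup plus a per-posting comparison against the stored context.
    return {docid for docid, ctx in postings.get(word, set()) if ctx == prefix}
```

In this toy model both layouts return the same documents; the difference is
that the second one has to compare the context of every posting for "foo",
which is the extra work I am wondering about.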

I ask because it seems possible to miss matches (or get false positives) if 
you want to include the ':' as a valid word character, as when indexing source 
code, for example. If I wanted to find "foo::bar" exactly, and not the phrase 
"foo bar", and I happened to have a prefix called "foo", might things get 
sticky?
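
The kind of collision I have in mind can be shown with a deliberately naive
query-term parser, again in Python. This is an assumption-laden sketch, not a
description of how Xapian's QueryParser actually behaves: parse_query_term,
the prefix tables, and the "XFOO" prefix are hypothetical.

```python
def parse_query_term(text, prefixes):
    """Naive parser: split on the first ':' and, if the left side is a
    registered field name, replace it with that field's term prefix."""
    field, sep, rest = text.partition(":")
    if sep and field in prefixes:
        return prefixes[field] + rest
    return text

# Only "title" is registered, as in the add_prefix("title", "T") example.
PREFIXES = {"title": "T"}

# Same table plus a hypothetical field that happens to be called "foo".
PREFIXES_WITH_FOO = {"title": "T", "foo": "XFOO"}

plain = parse_query_term("title:foo", PREFIXES)            # "Tfoo"
literal = parse_query_term("foo::bar", PREFIXES)           # "foo::bar"
hijacked = parse_query_term("foo::bar", PREFIXES_WITH_FOO)  # "XFOO:bar"
```

With no "foo" field the term survives intact, but once "foo" is registered
the same input is read as a fielded query and the literal "foo::bar" can no
longer be expressed without some escaping rule.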


-- 
Peter Karman  .  http://peknet.com/  .  peter at peknet.com


