[Xapian-discuss] word context, numeric values, and characters
Peter Karman
peter at peknet.com
Wed Dec 7 03:22:13 GMT 2005
Thanks, Olly. My follow-up questions are below:
Olly Betts scribbled on 12/6/05 9:11 PM:
> On Tue, Dec 06, 2005 at 09:01:10PM -0600, Peter Karman wrote:
>
>>For example, the html "<title>foo</title>" could be indexed as "Tfoo". A
>>query for "title:foo" would be parsed with add_prefix("title","T") and that
>>would generate a match for "Tfoo".
>>
>>Am I understanding that process correctly?
>
>
> Yes, that's spot on.
>
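To make the mapping above concrete, here is a minimal toy sketch in Python. It
does not call Xapian at all: PREFIXES, index_field and parse_query are
hypothetical names that only imitate what add_prefix("title","T") is described
as doing, so treat it as an illustration of the idea rather than the library's
implementation.

```python
# Toy model of term prefixing. This is NOT the Xapian API; every name here
# is hypothetical and exists only for illustration.
PREFIXES = {"title": "T"}  # stands in for add_prefix("title", "T")


def index_field(field, text):
    """Return the terms a document would get for `text` found in `field`."""
    prefix = PREFIXES.get(field, "")
    return [prefix + word.lower() for word in text.split()]


def parse_query(query):
    """Map 'field:word' to a prefixed term; leave bare words unprefixed."""
    field, sep, word = query.partition(":")
    if sep and field in PREFIXES:
        return PREFIXES[field] + word.lower()
    return query.lower()


# "<title>foo</title>" contributes the term "Tfoo" at index time ...
terms = index_field("title", "foo")
# ... and the query "title:foo" is rewritten to the same string.
query_term = parse_query("title:foo")
matched = query_term in terms
```

The point of the sketch is that the context lives inside the term string, so
matching stays a plain string lookup with no second comparison.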
Has any thought been given to storing the context separately from the word
string itself? E.g.,
add_term( "foo", wdfinc, "T" )
Would that slow down queries, by having to do a separate compare against a
different value stored elsewhere? I'm assuming the reason the current convention
looks the way it does is speed in searching.
I ask because it seems possible to miss matches (or get false positives) if you
wanted to include ':' as a valid word character, as when indexing source code,
for example. If I wanted to find "foo::bar" exactly, and not the phrase
"foo bar", and I happened to have a prefix called "foo", might things get
sticky?
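
A toy sketch of the worry, in Python. It assumes a deliberately naive parser
that splits on the first ':' and a made-up prefix "XFOO" for a field named
"foo"; naive_parse and PREFIXES are hypothetical, and the real QueryParser may
well handle this input differently.

```python
# Toy model only; this is not how Xapian's QueryParser is known to behave.
PREFIXES = {"foo": "XFOO"}  # hypothetical prefix registered for field "foo"


def naive_parse(query):
    """Split on the first ':' and treat the left side as a field if known."""
    field, sep, rest = query.partition(":")
    if sep and field in PREFIXES:
        return PREFIXES[field] + rest
    return query


literal = "foo::bar"            # the single term a source-code indexer stored
parsed = naive_parse(literal)   # the field rule fires on the first ':'
collides = parsed != literal    # the query no longer names the stored term
untouched = naive_parse("baz::bar")  # no prefix called "baz", so no rewrite
```

Under these assumptions the query for "foo::bar" turns into a lookup for
"XFOO:bar", which would miss the literal term, while "baz::bar" passes through
unchanged.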
--
Peter Karman . http://peknet.com/ . peter at peknet.com