[Xapian-discuss] Clarification of values, data, fields,
and prefixed terms
Olly Betts
olly at survex.com
Mon Sep 3 00:59:55 BST 2007
On Sun, Sep 02, 2007 at 01:46:11PM +0100, James Aylett wrote:
> Values are used for filtering in the match process. So collapsing can
> be done on a value; you can use them in a MatchDecider and so
> on. Range filtering is another example, as you point out.
Sorting results is another big use.
> > Prefixed terms index documents just like ordinary terms/words and thus
> > are used in probabilistic searches, and can carry positional
> > information if desired. Prefix terms are really just a convention
> > (not part of Xapian core) by prepending some letters to the front of
> > terms before they are put in the index.
>
> Right. As far as Xapian is concerned, you just have a bunch of
> terms. How you create those terms, and your convention for term
> construction, is an important part of your index plan. Prefixes are a
> useful convention for reflecting document data/metadata structure in
> the terms you generate.
Note that the QueryParser and TermGenerator classes both include support
for using prefixed terms. You can ignore this support if you wish, but
then you can't use those features which rely on it.
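
To make the convention concrete, here is a minimal sketch in plain Python. It is not the Xapian API: the field names, the PREFIXES table and the make_terms() helper are all made up for illustration, and the prefix letters are just an assumed convention. It only shows that a "prefixed term" is an ordinary term with some letters stuck on the front.

```python
# Toy illustration of the prefixed-term convention; this is NOT the Xapian API.
# The field names and prefix letters below are assumptions for the example.
PREFIXES = {"author": "A", "title": "S"}


def make_terms(fields):
    """Return the index terms for a dict mapping field name -> text."""
    terms = []
    for field, text in fields.items():
        # Fields without an entry in PREFIXES are indexed unprefixed.
        prefix = PREFIXES.get(field, "")
        for word in text.lower().split():
            terms.append(prefix + word)
    return terms


def term_for_query(field, word):
    """Map a fielded query word to a term using the same convention."""
    return PREFIXES.get(field, "") + word.lower()


terms = make_terms({"author": "Olly Betts", "body": "values and terms"})
```

The index only ever sees strings such as "Aolly" and "values". The indexing side and the query side have to agree on the same table, which is the job the prefix support in TermGenerator and QueryParser does for you.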
> > Is my understanding essentially correct? Also, why would one ever use
> > the data fields rather than values?
>
> I'm not certain that it is actually true right now, but in theory
> you'll get better performance in some cases by using values as they're
> intended (to be looked up and used during the match process), and data
> as it's intended (to store additional metadata that Xapian doesn't
> care about, for display/whatever in your application).
Not just in theory. Currently, to read a value for a document, all the
other values for that document have to be read, so abusing values as
general purpose fields will mean that more data has to be read for each
value accessed during the match - that's clearly going to adversely
affect performance in most cases.
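
A toy model of that cost, in plain Python. This is not Xapian's real storage format: the record layout and the pack_values() and read_value() helpers are assumptions invented for the sketch. It only illustrates that when all of a document's values live in one record, looking up one small value pays for everything else stored alongside it.

```python
# Toy model only: NOT Xapian's actual on-disk format. All of a document's
# values are packed into one record, so any lookup fetches the whole record.


def pack_values(values):
    """Pack {slot: bytes} as repeated (slot byte, length byte, payload)."""
    record = bytearray()
    for slot in sorted(values):
        payload = values[slot]
        record += bytes([slot, len(payload)]) + payload
    return bytes(record)


def read_value(record, wanted_slot):
    """Return (value, bytes_fetched) for one slot of a packed record."""
    bytes_fetched = len(record)  # the whole record has to be read
    pos = 0
    while pos < len(record):
        slot, length = record[pos], record[pos + 1]
        if slot == wanted_slot:
            return record[pos + 2:pos + 2 + length], bytes_fetched
        pos += 2 + length
    return b"", bytes_fetched


# Slot 0 holds a date used for sorting; slot 1 is a display field that
# has been (mis)stored as a value instead of in the document data.
lean = pack_values({0: b"20070903"})
bloated = pack_values({0: b"20070903", 1: b"x" * 200})

lean_value, lean_cost = read_value(lean, 0)
bloated_value, bloated_cost = read_value(bloated, 0)
```

Both lookups return the same date, but in this model the second one fetches 212 bytes rather than 10, and that happens for every document considered during the match.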
I'd like to change how values are stored, but it'll still be a bad idea
to misuse them - just for different reasons.
Cheers,
Olly