[Xapian-discuss] Clarification of values, data, fields, and prefixed terms

James Aylett james-xapian at tartarus.org
Tue Sep 4 17:19:22 BST 2007


On Tue, Sep 04, 2007 at 11:31:24AM -0400, Deron Meranda wrote:

> Yes, that makes a lot more sense now.  If in some future
> release values were made more efficient then the need for
> data fields would mostly go away.  But until then the choice
> between them, even if functionally equivalent, can affect
> performance.

That's not really true - values may have restrictions on length which
won't apply to document data, because of the different intended
use. Doc data provides for structure which values don't give you, and
may be more efficient for some uses there irrespective of how Xapian
works.

> If a piece of meta-data needs to be available for probabilistic
> searching then employ some sort of term prefixing convention.
> You can follow the Omega standardized term prefix rules, or
> if you want you can make up your own prefixing rules.  If used,
> term prefixing must be applied consistently to all terms, and not
> just those representing the meta-data.

Yep.
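
As an illustration of the "make up your own prefixing rules" option, here is a minimal sketch in plain Python (no Xapian calls). The helper names make_term and terms_for_document are hypothetical, and the convention shown (upper-case prefix, lower-cased term text, "A" for author as in Omega's scheme) is only one possible choice. The point is that a single function builds every term, prefixed or not, so indexing and querying cannot drift apart.

```python
def make_term(text, prefix=""):
    """Build an index term from an optional upper-case prefix and some text.

    Hypothetical convention: prefixes are upper case, term text is
    lower-cased, so a prefixed term can never collide with a plain one.
    """
    if prefix and not prefix.isupper():
        raise ValueError("prefix must be upper case")
    return prefix + text.lower()


def terms_for_document(body_words, author):
    """Return all terms for one document, using the same rule throughout."""
    terms = [make_term(word) for word in body_words]
    # "A" is used for author here; any consistent prefix would do.
    terms.append(make_term(author, "A"))
    return terms


if __name__ == "__main__":
    print(terms_for_document(["Clarification", "of", "Values"], "Aylett"))
```

The same make_term would be applied to query terms, so a search restricted to an author looks up exactly the term that was indexed.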

> If a piece of data will be most useful for partitioning (boolean
> searching), for sorting match results, or for range-searching
> then store it in a value.  Values should be "normalized" into
> some format that preserves simple byte-by-byte lexicographical
> sorting (such as zero-padding numbers).

Yep.
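
A small sketch of the zero-padding idea in plain Python, to show why normalization matters. The helper name pad_number and the fixed width of 10 digits are assumptions for illustration; it only handles non-negative integers that fit in that width.

```python
WIDTH = 10  # assumed maximum number of digits; pick one that fits your data


def pad_number(n, width=WIDTH):
    """Return n as a fixed-width, zero-padded decimal string.

    Fixed-width strings of digits compare byte-by-byte in the same
    order as the numbers they represent.
    """
    if n < 0:
        raise ValueError("this sketch only handles non-negative integers")
    text = str(n)
    if len(text) > width:
        raise ValueError("number does not fit in the chosen width")
    return text.zfill(width)


if __name__ == "__main__":
    numbers = [9, 10, 100, 2]
    # Unpadded strings sort "10" before "9"; padded ones sort numerically.
    print(sorted(str(n) for n in numbers))
    print(sorted(pad_number(n) for n in numbers))
```

Negative numbers, floating point values, and dates need their own order-preserving encoding; this sketch does not cover them.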

> If a piece of data will not be used directly for doing searching
> or producing a match set (although it could be used for the
> *presentation* of the result set), then you should either:
>    a) store it in some external database, or
>    b) store it in the Xapian "data" area.
> The former may be more appropriate where Xapian is
> just part of a larger system.  For the latter you can optionally
> format the data area into a set of field/value pairs as Omega
> does, or you can put any kind of binary blob into it you wish.

Yep.
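
For option (b), a minimal sketch of packing presentation fields into a single string of newline-separated name=value lines, in the spirit of the field/value layout mentioned above. This is plain Python with hypothetical helper names (encode_fields, decode_fields); it is not Omega's own code, and the resulting string is simply what would be stored as the document data.

```python
def encode_fields(fields):
    """Serialize a dict of str -> str into newline-separated name=value lines."""
    lines = []
    for name, value in fields.items():
        # Names must not contain "=" and nothing may contain a newline,
        # otherwise the text could not be split back unambiguously.
        if "=" in name or "\n" in name or "\n" in value:
            raise ValueError("field cannot be represented: %r" % name)
        lines.append(name + "=" + value)
    return "\n".join(lines)


def decode_fields(data):
    """Parse name=value lines back into a dict, skipping malformed lines."""
    fields = {}
    for line in data.split("\n"):
        name, sep, value = line.partition("=")  # split at the first "=" only
        if sep:
            fields[name] = value
    return fields


if __name__ == "__main__":
    data = encode_fields({"url": "http://example.org/?q=1", "title": "Hello"})
    print(data)
    print(decode_fields(data))
```

Because the split happens at the first "=" only, values may themselves contain "=" (as in the URL above), while anything needing embedded newlines would call for a different encoding or a binary blob.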

> Also if a piece of meta-data needs to be easily extracted
> from a document in the match set (such as a unique
> document ID# or URN) then it should either be put into
> a value or a data field--don't just store it as a prefixed term.

Indeed. Unless you're going to use it for one of the other value uses,
I'd stick it in document data for preference.

> Does that more or less summarize the intended use of
> the different mechanisms?

Looks good to me :)

J

-- 
/--------------------------------------------------------------------------\
  James Aylett                                                  xapian.org
  james at tartarus.org                               uncertaintydivision.org


