[Xapian-discuss] time bias/sorting
Olly Betts
olly at survex.com
Fri Jul 14 15:18:05 BST 2006
On Wed, Jul 12, 2006 at 11:43:39AM +0100, Joss Shaw wrote:
> A term is like a posting but without positional information. You search on
> terms and postings.
>
> What therefore is a value - are these searched on in the traditional
> sense ('keyword foo bar'), or are they used just to narrow a search
> down - like a boolean operator might.
Hmm, the "Overview" document isn't at all clear on this (and even refers
you to a non-existent Enquire method). I've just rewritten it to say the
following, which is somewhat better:
Each document can have the following types of information associated with it:
* document data - this is an arbitrary block of data accessed using
Xapian::Document::get_data(). The contents of the document data can be
whatever you want and in whatever format. Often it contains a URL or other
external UID, a document title, and an excerpt from the document text. If
you wish to interoperate with Omega, it should contain name=value pairs,
one per line (recent versions of Omega also support one field value per
line, and can assign names to line numbers in the query template).
* document values - these are arbitrary blocks of data which are stored so
they can be accessed rapidly during the match process (to allow sorting,
collapsing of duplicates, etc.). Each block is stored in a numbered slot.
There's currently no length limit, but you should keep them short for
efficiency.
* terms and positional information - terms index the document (like index
entries in the back of a book); positional information records the word
offset into the document of each occurrence of a particular term. This is
used to implement phrase searching and the NEAR operator.
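
To make the three kinds of information concrete, here is a rough sketch in plain Python. It is a toy model, not Xapian's API: the ToyDocument class, its methods, and parse_fields are all invented for illustration, and the slot number and zero-padded price are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDocument:
    """Hypothetical stand-in for a document; NOT Xapian's real class."""
    data: str = ""                                 # opaque block handed back with results
    values: dict = field(default_factory=dict)     # slot number -> short string
    postings: dict = field(default_factory=dict)   # term -> list of word positions

    def index_text(self, text):
        # Record each word as a term together with its word offset (1-based).
        for pos, word in enumerate(text.lower().split(), start=1):
            self.postings.setdefault(word, []).append(pos)

    def has_phrase(self, first, second):
        # Positional information is what makes a phrase check possible:
        # `second` must occur at the position immediately after `first`.
        following = set(self.postings.get(second, []))
        return any(p + 1 in following for p in self.postings.get(first, []))


def parse_fields(data):
    # Split name=value lines (the layout described above for Omega) into a dict.
    return dict(line.split("=", 1) for line in data.splitlines() if "=" in line)


doc = ToyDocument(data="url=http://example.com/a\ntitle=Red widget")
doc.values[0] = "00450"   # slot 0 holds a price here (an assumed convention)
doc.index_text("cheap red widget for sale")
```

With this model, doc.has_phrase("red", "widget") succeeds only because the positions were stored; a term list without positions could say that both words occur, but not that they are adjacent.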
There's some overlap in what you can do with terms and with values. A
simple boolean filter (e.g. document language) is definitely better
done using a term and OP_FILTER.
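
As a rough illustration of filtering with a term, the toy sketch below ranks documents by one term and restricts them by another that contributes no weight. This is plain Python, not Xapian's API; filter_query, the weights, and the "Len"/"Lfr" language terms are assumptions made up for the example.

```python
# Toy model only: each document is a set of terms plus a relevance weight
# that we pretend came from matching the query term.
docs = [
    {"id": 1, "terms": {"widget", "Len"}, "weight": 2.0},
    {"id": 2, "terms": {"widget", "Lfr"}, "weight": 3.5},
    {"id": 3, "terms": {"gadget", "Len"}, "weight": 1.0},
]


def filter_query(docs, query_term, filter_term):
    # Keep documents matching both terms; only the query term's weight
    # decides the order, the filter term just says yes or no.
    hits = [d for d in docs
            if query_term in d["terms"] and filter_term in d["terms"]]
    return sorted(hits, key=lambda d: d["weight"], reverse=True)


english_widgets = [d["id"] for d in filter_query(docs, "widget", "Len")]
```

Document 2 has the highest weight but is dropped because it lacks the language term, which is the yes/no behaviour you want from a boolean filter.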
Using a value allows you to do things you can't do with terms, such as
"sort by price", or "show only the best match for each website". You
can also perform filtering with a value which is more sophisticated
than can easily be achieved with terms, for example: find matches
with a price between $100 and $900. Omega uses boolean terms to perform
date range filtering, but this might actually be better done using a
value (the code in Omega was written before values were added to
Xapian).
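
The three value-based operations mentioned above can be sketched in plain Python as follows. Again this is a toy model and not Xapian's API: the function names, slot numbers, and sample data are invented, and prices are zero-padded on the assumption that values are compared as strings.

```python
PRICE, SITE = 0, 1   # hypothetical slot numbers

# Pretend these are the matches for a query, already in relevance order.
matches = [
    {"id": 1, "values": {PRICE: "00450", SITE: "a.example"}},
    {"id": 2, "values": {PRICE: "00080", SITE: "b.example"}},
    {"id": 3, "values": {PRICE: "00120", SITE: "a.example"}},
    {"id": 4, "values": {PRICE: "01500", SITE: "c.example"}},
]


def sort_by_value(matches, slot):
    # "Sort by price": order by the string stored in the slot.
    return sorted(matches, key=lambda m: m["values"][slot])


def collapse_on_value(matches, slot):
    # "Best match for each website": keep the first (best) match per key.
    seen, kept = set(), []
    for m in matches:
        key = m["values"][slot]
        if key not in seen:
            seen.add(key)
            kept.append(m)
    return kept


def value_range(matches, slot, low, high):
    # "Price between $100 and $900": inclusive string comparison, which
    # only works numerically because the prices are zero-padded.
    return [m for m in matches if low <= m["values"][slot] <= high]


by_price = [m["id"] for m in sort_by_value(matches, PRICE)]
per_site = [m["id"] for m in collapse_on_value(matches, SITE)]
mid_range = [m["id"] for m in value_range(matches, PRICE, "00100", "00900")]
```

None of these is easy to express with terms alone, since each one needs the stored value itself and not just the fact that a document is indexed by something.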
Cheers,
Olly