[Xapian-discuss] New to Xapian (coming from Lucene)
James Aylett
james-xapian at tartarus.org
Thu Apr 12 12:15:59 BST 2007
On Mon, Apr 09, 2007 at 01:18:39PM -0400, Jeff Anderson wrote:
> As I continue to read through various Xapian docs and code examples, I have
> noticed that Xapian appears to want positional parameters to store values
> into a document. For example, we are indexing book information, such as
> TITLE, ISBN, AUTHORS, etc. With Lucene, one can add these attributes to a
> document by specifying arbitrary key names:
>
> doc->add_value('TITLE', 'How To Add Searching To Your Site')
>
> However, Xapian appears to only accept numbers, and not keys that could be
> used to retrieve a value.
Hi, Jeff. I think you're running into (understandable) confusion here
because of differences in naming conventions between Xapian and
Lucene.
Xapian has two *main* things you worry about putting into your
document:
* TERMS (which are used for searching)
* DATA (which is most often used for displaying the search results)
Values are quite a specialised thing, so don't worry about them for
now.
Terms come in two forms: postings (which have positional information)
and "plain" terms (which don't). So you can do:
----------------------------------------------------------------------
doc->add_posting("term", position);
doc->add_term("anotherterm");
----------------------------------------------------------------------
for instance. Neither "term" nor "anotherterm" will be stemmed or split
apart at this point - this is real low level stuff.
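Since nothing is split apart for you at this level, you would do the
splitting yourself before adding postings. Here is a minimal sketch in
plain Python (no Xapian calls; the helper name postings_for is made up
for illustration) of turning a string into (term, position) pairs that
could then be handed to add_posting one at a time. It only lowercases
and splits on whitespace, which is an assumption rather than anything
Xapian requires:

```python
def postings_for(text):
    """Split text on whitespace and pair each lowercased word with its
    1-based position. Hypothetical helper, not part of Xapian."""
    words = text.lower().split()
    return [(word, position) for position, word in enumerate(words, start=1)]


pairs = postings_for("How To Add Searching")
for term, position in pairs:
    # In real code this is where doc.add_posting(term, position) would go.
    print(term, position)
```

The same loop is easy to write in Perl; the point is only that the
positions are numbers you assign yourself.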
The document data is simply a chunk of text or binary data which is
stored alongside the document. It is most commonly used to implement
what we call FIELDS as name-value pairs, by doing something like:
----------------------------------------------------------------------
title=How To Add Searching To Your Site
authors=Jeff Anderson
----------------------------------------------------------------------
Xapian doesn't provide any support itself for doing this, but in Perl
it's pretty easy to do (you could, in fact, serialise a Perl hash to
make life really easy).
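To make that concrete, here is a rough sketch in plain Python (the same
idea carries over to Perl) of packing fields into a name=value block and
unpacking it again. The function names are hypothetical, and the format
assumes that field names contain no "=" and that neither names nor
values contain newlines. The resulting string is what you would store as
the document data:

```python
def serialise_fields(fields):
    """Pack a dict of fields into one 'name=value' line per field.
    Hypothetical helper; the format is a convention, not a Xapian API."""
    lines = []
    for name, value in fields.items():
        if "=" in name or "\n" in name or "\n" in value:
            raise ValueError("field cannot be stored in this format: %r" % name)
        lines.append(name + "=" + value)
    return "\n".join(lines)


def parse_fields(data):
    """Unpack document data written by serialise_fields into a dict."""
    fields = {}
    for line in data.split("\n"):
        if not line:
            continue
        # Split on the first '=' only, so values may contain '='.
        name, _, value = line.partition("=")
        fields[name] = value
    return fields


fields = {
    "title": "How To Add Searching To Your Site",
    "authors": "Jeff Anderson",
}
data = serialise_fields(fields)
print(data)
```

When displaying search results you would read the data back from each
matching document and run it through something like parse_fields.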
Ask if any of that doesn't make sense.
Now. Many of your fields are going to be things you also want to
search on, such as title and author. What you probably want to do is
to create terms out of those field values, as well as the fields
themselves. This is where we get into term prefixes. Have a read
through omega/docs/termprefixes.txt and come back with any questions
:)
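As a taste of what that looks like, here is a small plain Python sketch
that turns field values into prefixed terms. The mapping of "title" to S
and "authors" to A follows the conventional Omega prefixes (S for
subject/title, A for author); the field names, the helper name and the
simple lowercase-and-split tokenising are assumptions for illustration
only:

```python
import re

# Assumed mapping from field name to term prefix, following Omega's
# conventions: S for subject/title, A for author.
PREFIXES = {"title": "S", "authors": "A"}


def prefixed_terms(fields):
    """Return one prefixed, lowercased term per word of each mapped field.
    Hypothetical helper, not part of Xapian."""
    terms = []
    for name, value in fields.items():
        prefix = PREFIXES.get(name)
        if prefix is None:
            continue  # field is stored for display only, not searched
        for word in re.findall(r"[a-z0-9]+", value.lower()):
            terms.append(prefix + word)
    return terms


terms = prefixed_terms({
    "title": "How To Add Searching To Your Site",
    "authors": "Jeff Anderson",
})
print(terms)
```

Each of those strings would then be added to the document as a term, so
that a search restricted to authors only matches the A-prefixed terms.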
J
--
/--------------------------------------------------------------------------\
James Aylett xapian.org
james at tartarus.org uncertaintydivision.org