[Xapian-discuss] Is this a correct method of indexing?

Olly Betts olly at survex.com
Mon Jan 19 07:00:09 GMT 2009


On Mon, Jan 19, 2009 at 01:26:40AM -0500, Tony Lambiris wrote:
> I'm kind of new to Xapian and search in general, but I am in the
> process of working with Xapian to index documents and I am becoming a
> little confused by all the functions, as from a top level many of them
> appear to accomplish much the same thing.

Did you read this?

http://trac.xapian.org/wiki/FAQ/TermGenerator

> What I am trying to do now, is basically index a document but I want
> to add more weight to the document title. After multiple tries with
> all the various functions (ie: add_term, add_posting, etc), this is
> what I ended up with:
> doc.add_term(doc_title, 100);

Um, that indexes the title as a single term, which I don't think is what
you want.  You'd have to write your own custom query parsing code for
such terms to be used when querying, and there's a limit on the length
of terms, so this would fail for long titles.

> The idea being that if the query matches the exact title, I want to
> really rank it high. After that I use index_text_without_positions to
> index the entire document as I won't be using any phrase or NEAR
> queries, and I also read this method takes up less space.

Yes, it saves having to store the positions of terms in documents, and
that positional data can be quite large.

> I don't know if it's over-kill to index the entire document or not, or
> if there are any preferred methods.

It's only overkill if you don't need to be able to search the whole
document.

> the database grows quite large and indexing slows down dramatically.

How large is "quite large"?  The FAQ discusses what sort of database
size you should expect.

You can usually speed up indexing by raising the XAPIAN_FLUSH_THRESHOLD
environment variable, which is currently set rather conservatively by
default:

http://xapian.org/docs/apidoc/html/classXapian_1_1WritableDatabase.html#d0077acafa9485c97b73b8726c375732
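
As a rough sketch, assuming the indexer is a Python script: the
variable has to be in the process environment before the
WritableDatabase is opened. The helper name set_flush_threshold and the
value 100000 are illustrative assumptions; a larger threshold means
more changes are buffered in memory between flushes, so pick a value
that fits the RAM available.

```python
import os


def set_flush_threshold(num_docs):
    """Ask Xapian to batch this many document changes before flushing.

    Must be called before the WritableDatabase is opened, because the
    environment variable is read at that point.
    """
    os.environ["XAPIAN_FLUSH_THRESHOLD"] = str(int(num_docs))


set_flush_threshold(100000)  # illustrative value, not a recommendation
# Open the database only after this point, for example:
#   db = xapian.WritableDatabase(path, xapian.DB_CREATE_OR_OPEN)
```

Exporting the variable in the shell before starting the indexer has the
same effect and needs no code change.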

Cheers,
    Olly


