[Xapian-discuss] Omega and indexing documents
James Aylett
james-xapian at tartarus.org
Thu Aug 11 09:46:45 BST 2005
On Wed, Aug 10, 2005 at 03:12:12PM -0300, Christiano Anderson wrote:
> This is the script I am using to index:
>
> ------- index.py -------
> import xapian
> db = xapian.WritableDatabase("teste01", xapian.DB_CREATE_OR_OPEN)
> doc = xapian.Document()
>
> record = """caption=Test page
> sample=This is a test
> size=4554
> url=http://www.test.com
> """
>
> doc.set_data(record)
> doc.add_term("Ttext/html")
> doc.add_term("Hhttp://www.test.com")
>
> doc.add_posting(record, 1)
> db.add_document(doc)
> ------ EOF -------
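You don't want to add the entire record as a single posting - add_posting()
adds its argument as one term, so the whole multi-line record becomes a
single term that a search for a word like "test" will never match. You
should split it into individual terms first. See indextext.cc in the
omega distribution for an example algorithm that fits well with
Xapian::QueryParser (xapian.QueryParser in Python :-).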
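As a rough illustration of "split it into individual terms first", here is a minimal sketch of a word splitter. It is much simpler than omega's indextext.cc and is not a port of it: the helper name postings() and its rules (lowercase everything, treat each run of ASCII letters and digits as one term, number positions consecutively) are assumptions made for this example. Non-ASCII letters are simply dropped as separators.

```python
import re

# Hypothetical helper: a much-simplified tokenizer, loosely in the spirit
# of omega's indextext.cc but NOT a port of it. Each run of ASCII letters
# and digits becomes one lowercased term; everything else is a separator.
WORD_RE = re.compile(r"[a-z0-9]+")

def postings(text, start_pos=1):
    """Return (term, position) pairs for text, numbering from start_pos."""
    words = WORD_RE.findall(text.lower())
    # enumerate() supplies consecutive positions starting at start_pos
    return [(word, pos) for pos, word in enumerate(words, start_pos)]
```

In the script above, the single `doc.add_posting(record, 1)` call would then become a loop such as `for term, pos in postings("Test page This is a test"): doc.add_posting(term, pos)`, which indexes the caption and sample text word by word instead of the raw record. Xapian 1.0 and later also provide `xapian.TermGenerator`, whose `index_text()` method does this kind of splitting for you.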
J
--
James Aylett xapian.org
james at tartarus.org uncertaintydivision.org