[Xapian-discuss] Omega and indexing documents

James Aylett james-xapian at tartarus.org
Thu Aug 11 09:46:45 BST 2005


On Wed, Aug 10, 2005 at 03:12:12PM -0300, Christiano Anderson wrote:

> This is the script I am using to index:
> 
> ------- index.py -------
> import xapian
> db = xapian.WritableDatabase("teste01", xapian.DB_CREATE_OR_OPEN)
> doc = xapian.Document()
> 
> record = """caption=Test page
> sample=This is a test
> size=4554
> url=http://www.test.com
> """
> 
> doc.set_data(record)
> doc.add_term("Ttext/html")
> doc.add_term("Hhttp://www.test.com")
> 
> doc.add_posting(record, 1)
> db.add_document(doc)
> ------ EOF -------

You don't want to add the entire record as a single posting - that
indexes the whole string as one term, which no ordinary word query
will ever match. Split the text into individual terms first and add a
posting for each. See indextext.cc in the omega distribution for an
example algorithm that fits well with Xapian::QueryParser
(xapian.QueryParser in Python :-).
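
As a rough sketch of what splitting into terms could look like in
Python (this is not the indextext.cc algorithm, just a simplified
stand-in: it lowercases ASCII words and does no stemming; the names
index_text and WORD_RE are made up for illustration):

```python
import re

# Assumption: a "word" is a run of ASCII letters/digits. Omega's real
# indexer is more careful than this.
WORD_RE = re.compile(r"[A-Za-z0-9]+")

def index_text(doc, text, first_pos=1):
    """Add one posting per word in text to doc.

    doc is anything with an add_posting(term, position) method, such
    as a xapian.Document. Returns the next unused position so several
    fields can be indexed one after another.
    """
    pos = first_pos
    for word in WORD_RE.findall(text):
        doc.add_posting(word.lower(), pos)
        pos += 1
    return pos
```

In index.py this would replace the doc.add_posting(record, 1) line
with something like pos = index_text(doc, "Test page") followed by
index_text(doc, "This is a test", pos), so that only the text you
want searchable is indexed rather than the field names and URL.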

J

-- 
/--------------------------------------------------------------------------\
  James Aylett                                                  xapian.org
  james at tartarus.org                               uncertaintydivision.org


