[Xapian-discuss] Getting documents "like" arbitrary text?

Ryan Shaw ryanshaw at ischool.berkeley.edu
Sun Jun 29 15:56:16 BST 2008


Hello,

I am trying to build a service that takes arbitrary text as input, treats
it as a document, and returns a list of similar documents from my
index. I'm using the Xapian Python bindings.

My thought was to do something like this:

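import xapian

database = xapian.Database('/opt/index')
# Build an unsaved Document from the input text (arbitrary_text).
querydoc = xapian.Document()
querydoc.set_data(u'test')
indexer = xapian.TermGenerator()
stemmer = xapian.Stem("english")
indexer.set_stemmer(stemmer)
indexer.set_document(querydoc)
indexer.index_text(arbitrary_text)
# RSet.add_document() takes a docid, not a Document object,
# so this call does not accept querydoc as written.
rset = xapian.RSet()
rset.add_document(querydoc)
enquire = xapian.Enquire(database)
# Ask for the 40 best expansion terms given the relevance set.
eset = enquire.get_eset(40, rset)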

...then use the list of terms in eset to query for a set of matching documents.

Because RSet.add_document takes a docid, it seems I must add my
document to a database before I can include it in a relevance set. I
don't really want to add the arbitrary input text to my index, though.
Should I be going about this a different way?

Thanks,
Ryan


