[Xapian-discuss] Getting documents "like" arbitrary text?
Ryan Shaw
ryanshaw at ischool.berkeley.edu
Sun Jun 29 15:56:16 BST 2008
Hello,
I am trying to build a service that takes as input arbitrary text, treats
it as a document and returns a list of similar documents from my
index. I'm using the Xapian Python bindings.
My thought was to do something like this:
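import xapian

database = xapian.Database('/opt/index')
# Build an in-memory document from the input text (arbitrary_text is the input string).
querydoc = xapian.Document()
querydoc.set_data(u'test')
indexer = xapian.TermGenerator()
stemmer = xapian.Stem("english")
indexer.set_stemmer(stemmer)
indexer.set_document(querydoc)
indexer.index_text(arbitrary_text)
# Treat that document as the only "relevant" one.
rset = xapian.RSet()
rset.add_document(querydoc)  # problem: add_document expects a docid, not a Document
# Ask for the top 40 expansion terms for the relevance set.
enquire = xapian.Enquire(database)
eset = enquire.get_eset(40, rset)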
...then use the list of terms in eset to query for a set of matching documents.
Because RSet.add_document takes a docid, it seems I must add my
document to a database before I can include it in a relevance set. I
don't really want to add the arbitrary input text to my index, though.
Should I be going about this a different way?
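One possible alternative, given only as a rough sketch and not as a confirmed answer: skip the RSet entirely, read the terms straight off the in-memory document, and build an OR query from them. The helper names top_terms and build_similarity_query are hypothetical, and ranking terms by within-document frequency is a crude stand-in for the weighting that get_eset would do. The sketch also assumes the index was built with the same TermGenerator and stemmer settings.

```python
def top_terms(term_wdf_pairs, limit=40):
    """Keep the `limit` terms with the highest within-document frequency.

    Hypothetical helper: wdf is only a rough proxy for term importance,
    so very common words may need filtering with a stopper first.
    """
    ranked = sorted(term_wdf_pairs, key=lambda pair: (-pair[1], pair[0]))
    return [term for term, _wdf in ranked[:limit]]


def build_similarity_query(text, limit=40):
    """Build an OR query from the terms of `text` without touching the index."""
    import xapian  # imported here so top_terms() can be used without the bindings

    # Index the text into a Document that is never added to a database.
    doc = xapian.Document()
    indexer = xapian.TermGenerator()
    indexer.set_stemmer(xapian.Stem("english"))
    indexer.set_document(doc)
    indexer.index_text(text)

    # Collect (term, wdf) pairs from the in-memory document.
    pairs = [(item.term, item.wdf) for item in doc.termlist()]

    # OR the selected terms together; documents sharing more of them rank higher.
    return xapian.Query(xapian.Query.OP_OR, top_terms(pairs, limit))
```

The resulting query could then be passed to enquire.set_query() and the matches fetched with enquire.get_mset(0, 10), so the input text is never written to the index.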
Thanks,
Ryan