[Xapian-discuss] Python Binding - match[xapian.MSET_DOCUMENT].get_data() doesn't return anything!

jarrod roberson jarrod.roberson at gmail.com
Thu Jan 19 18:44:20 GMT 2006


I am working on creating an OS X Spotlight-like application.

My first task is to index fully qualified paths. As a learning exercise for
Xapian and the Python bindings, I want to be able to search for filenames
first.

I tried using Xapwrap from divmod.org, but that didn't pan out: a search
would return a document uid, but I could never get
.get_document().get_data() to return anything.

So I decided to just use the "raw" Python bindings that Xapian provides, and
tried the simpleindex and simplesearch Python example programs.

I think that in both cases (Xapwrap and the default Xapian bindings)
indexing is happening, but I can't really tell because I can't get any
search results to confirm anything.

When I use the Xapian Python bindings directly, I can't get the search to
work. Granted, the simplesearch example program is broken, so I am groping
in the dark on how to get the search to return a list of documents and have
get_data() actually return something.

I guess what I need is some simple example code that will allow me to do the
following.

Given some data like

/this/is/a/fully/qualified/path/to/a/filename

how do I create a document and add it to an index so that I can search for
it by 'filename'?

This is what I am doing to create documents and add them to the index:

#!/usr/bin/python
# indexer.py

import sys
import xapian

# set up the file to index
fileToIndex = sys.argv[1]
# optional second argument: maximum number of records to index (0 = no limit)
if len(sys.argv) >= 3:
    maxRecordsToIndex = int(sys.argv[2])
else:
    maxRecordsToIndex = 0
recordCount = 0

# set up the xapian database
try:
    db = xapian.WritableDatabase('/tmp/index', xapian.DB_CREATE_OR_OPEN)

    # index the file, one document per line
    for line in file(fileToIndex):
        doc = xapian.Document()
        doc.set_data(line)
        db.add_document(doc)

        # my input file is 70GB of data, this is to make testing faster
        recordCount = recordCount + 1
        if maxRecordsToIndex > 0 and recordCount >= maxRecordsToIndex:
            break
        elif recordCount % 1000 == 0:
            print 'processed %s records so far!' % recordCount
    print 'processed %s records' % recordCount

except Exception, e:
    print 'Exception: %s' % str(e)
    sys.exit(1)
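
For comparison, here is a minimal sketch of what indexing by filename might look like. It rests on one assumption worth checking: the indexer above only calls set_data(), and Xapian treats document data as opaque, so a document needs terms attached before any query can match it. The helper names path_terms and index_paths are hypothetical, the lowercase ASCII word splitting is just one possible choice, and the sketch is written for the Python 3 bindings (commit() was called flush() in older releases).

```python
import re


def path_terms(path):
    """Split a path into lowercase ASCII word terms (hypothetical helper)."""
    return [t for t in re.split(r"[^0-9a-z]+", path.lower()) if t]


def index_paths(db_path, paths):
    """Add one document per path, with each path component as a term."""
    # Imported lazily so path_terms() can be tried without Xapian installed.
    import xapian

    db = xapian.WritableDatabase(db_path, xapian.DB_CREATE_OR_OPEN)
    for path in paths:
        doc = xapian.Document()
        doc.set_data(path)  # stored for display only, not searchable
        for pos, term in enumerate(path_terms(path), 1):
            doc.add_posting(term, pos)  # this is what makes the doc findable
        db.add_document(doc)
    db.commit()
```

With terms stored like this, the term 'filename' would be attached to the document for the example path, which is what a query for 'filename' needs in order to match.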


And this is what I am doing to try to get the data back from a search. The
problem is I can't get it to find anything.

Given the example data above, when I run

    python searcher.py /tmp/index filename

I get "0 results found"!

#!/usr/local/bin/python
# searcher.py
import sys
import xapian

if len(sys.argv) < 3:
    print "usage: %s <path to database> <search terms>" % sys.argv[0]
    sys.exit(1)

try:
    database = xapian.Database(sys.argv[1])

    enquire = xapian.Enquire(database)
    query = xapian.Query(sys.argv[2])
    print "Performing query `%s'" % query.get_description()

    enquire.set_query(query)
    matches = enquire.get_mset(0, 10)

    print "%i results found" % matches.get_matches_estimated()
    for match in matches:
        print "ID %i %i%% [%s]" % (match[xapian.MSET_DID],
                                   match[xapian.MSET_PERCENT],
                                   match[xapian.MSET_DOCUMENT].get_data())

except Exception, e:
    print "Exception: %s" % str(e)
    sys.exit(1)
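
A rough sketch of the search side, under the assumption that the index holds lowercase terms (which is not what the indexer in this message does, since it stores no terms at all). The names query_terms and search are hypothetical, and the code targets the Python 3 bindings, where each match exposes docid, percent and document attributes and get_data() returns bytes.

```python
def query_terms(words):
    """Lowercase the command-line words and drop blank ones (hypothetical helper)."""
    return [w.strip().lower() for w in words if w.strip()]


def search(db_path, words, limit=10):
    """Return (docid, percent, data) tuples for documents matching any word."""
    # Imported lazily so query_terms() can be tried without Xapian installed.
    import xapian

    db = xapian.Database(db_path)
    enquire = xapian.Enquire(db)
    # OR the terms together; they must equal the indexed terms exactly.
    enquire.set_query(xapian.Query(xapian.Query.OP_OR, query_terms(words)))
    results = []
    for match in enquire.get_mset(0, limit):
        results.append((match.docid, match.percent, match.document.get_data()))
    return results
```

Query terms are compared with indexed terms verbatim, so whatever normalisation is applied at index time (case folding here) has to be applied to the query as well.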