[Xapian-discuss] How to accomplish this task with the Python Bindings?

robersonja robersonja at corp.earthlink.net
Fri Jan 13 22:06:22 GMT 2006


I am working on creating an OS X Spotlight-like application.

The first task is to index fully qualified paths. I want to be able to
search for filenames first, as an exercise to learn Xapian and
the Python bindings.

I tried using Xapwrap by divmod.org, but that didn't pan out: I could not
get the actual data back after a search. A search would return a
document uid, but I could never get .get_document().get_data() to
return anything.

So I decided to just use the "raw" Python bindings provided, and
tried the simpleindex and simplesearch Python example programs.

I think in both cases (Xapwrap and the default Xapian bindings)
indexing is happening, but I can't really tell
because I can't get any search results to confirm anything.

When I tried the Xapian Python bindings directly, I couldn't get
the search to work. Granted, the simplesearch example program is
broken, so I am groping in the dark on how to get the search
to return a list of documents and have get_data() actually return
something.

I guess what I need is some simple example code that will allow me to
do the following.

Given some data like

/this/is/a/fully/qualified/path/to/a/filename

how do I create a document and add it to an index so that I can
search for it by 'filename'?

This is what I am doing to create documents and add them to the index:

#!/usr/bin/python
# indexer.py

import sys
import xapian

# setup the file to index
fileToIndex = sys.argv[1]
if len(sys.argv) >= 3:
     maxRecordsToIndex = int(sys.argv[2])
else:
     # -1 means no limit
     maxRecordsToIndex = -1
recordCount = 0

# setup the xapian database
try:
     db = xapian.WritableDatabase('/tmp/index', xapian.DB_CREATE_OR_OPEN)

     # index the file
     for line in file(fileToIndex):
         doc = xapian.Document()
         doc.set_data(line)
         db.add_document(doc)

         # my input file is 70GB of data, this is to make testing faster
         recordCount = recordCount + 1
         if maxRecordsToIndex > -1 and recordCount >= maxRecordsToIndex:
             break
         elif recordCount % 1000 == 0:
             print 'processed %s records so far!' % recordCount
     print 'processed %s records' % recordCount

except Exception, e:
     print 'Exception: %s' % str(e)
     sys.exit(1)
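
One thing worth noting about the indexer above: it stores each line as document data but never adds any terms to the document, and Xapian matches queries against terms, not against the stored data. That alone could explain a search that finds nothing. Below is a minimal sketch of one way to add terms, assuming one lowercase term per path component is good enough for filename lookups. The helper names path_terms and index_paths are hypothetical, not part of Xapian.

```python
import re


def path_terms(path):
    """Split a path into lowercase terms, one per path component.

    Hypothetical helper: the choice of one term per component is an
    assumption, not something the Xapian bindings do for you.
    """
    return [part.lower() for part in re.split(r'[/\\]+', path.strip()) if part]


def index_paths(db_path, paths):
    """Add one document per path, with the path as data and its components as terms."""
    # Imported here so path_terms() can be tried without Xapian installed.
    import xapian

    db = xapian.WritableDatabase(db_path, xapian.DB_CREATE_OR_OPEN)
    for path in paths:
        path = path.strip()
        if not path:
            continue
        doc = xapian.Document()
        doc.set_data(path)                 # what get_data() hands back later
        for term in set(path_terms(path)):
            doc.add_term(term)             # what a query can actually match
        db.add_document(doc)
    # Pending changes are written out when the database handle is released.
```

With the example path, path_terms('/this/is/a/fully/qualified/path/to/a/filename') yields the components 'this', 'is', 'a', and so on up to 'filename', so a query for the term 'filename' would have something to match.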


And this is what I am doing to try to get the data back from a
search. The problem is that I can't get it to find anything.

Given the example data above, when I run: python searcher.py /tmp/index filename
I get 0 records found!

#!/usr/local/bin/python
# searcher.py
import sys
import xapian

if len(sys.argv) < 3:
     print "usage: %s <path to database> <search terms>" % sys.argv[0]
     sys.exit(1)

try:
     database = xapian.Database(sys.argv[1])

     enquire = xapian.Enquire(database)
     query = xapian.Query(sys.argv[2])
     print "Performing query `%s'" % query.get_description()

     enquire.set_query(query)
     matches = enquire.get_mset(0, 10)

     print "%i results found" % matches.get_matches_estimated()
     for match in matches:
         print "ID %i %i%% [%s]" % (match[xapian.MSET_DID],
             match[xapian.MSET_PERCENT], match[xapian.MSET_DOCUMENT].get_data())

except Exception, e:
     print "Exception: %s" % str(e)
     sys.exit(1)
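
For comparison, here is a hedged sketch of the search side. It assumes the index holds plain lowercase terms, one per path component (an assumption about how the documents were indexed), so the query words are lowercased the same way before being combined with OP_OR. The function names query_terms and search are hypothetical, and the match.docid, match.percent and match.document attributes are those of newer releases of the Python bindings rather than the MSET_* indexes used above.

```python
def query_terms(text):
    """Lowercase the whitespace-separated words of a query string."""
    return [word.lower() for word in text.split()]


def search(db_path, text, limit=10):
    """Return (docid, percent, data) tuples for documents matching any query word."""
    # Imported here so query_terms() can be tried without Xapian installed.
    import xapian

    database = xapian.Database(db_path)
    enquire = xapian.Enquire(database)
    # OP_OR over raw terms: a document matches if it contains any of them.
    query = xapian.Query(xapian.Query.OP_OR, query_terms(text))
    enquire.set_query(query)

    results = []
    for match in enquire.get_mset(0, limit):
        # get_data() returns whatever was stored with set_data() at index time.
        results.append((match.docid, match.percent, match.document.get_data()))
    return results
```

A query only matches if the term in the query is byte-for-byte the same as a term added at index time, which is why both sides need to agree on case and on how a path is split.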


