[Xapian-discuss] How to accomplish this task with the Python Bindings?
robersonja
robersonja at corp.earthlink.net
Fri Jan 13 22:06:22 GMT 2006
I am working on creating an OS X Spotlight-like application. The
first task is to index fully qualified paths. I want to be able to
search for filenames first, as an exercise to learn Xapian and
the Python bindings.
I tried using Xapwrap by divmod.org, but that didn't pan out: I could not
get the actual data back after a search. A search would return a
document uid, but I never could get .get_document().get_data() to
return anything.
So I decided to just use the "raw" Python bindings provided, and
tried the simpleindex and simplesearch Python example programs.
I think in both cases (Xapwrap and the default Xapian bindings)
I am getting indexing to happen, but I can't really tell
because I can't get any search results to confirm anything.
When I tried with the Xapian Python bindings directly, I couldn't get
the search to work. Granted, the simplesearch example program is
broken, so I am kind of groping in the dark on how to get the search
to return a list of documents and have get_data() actually return
something.
I guess what I need is some simple example code that will allow me to
do the following. Given some data like
    /this/is/a/fully/qualified/path/to/a/filename
how do I create a document and add it to an index so that I can
search for it by 'filename'?
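A minimal sketch of the term side of this, in plain Python with no Xapian calls. It assumes (an assumption for illustration, not something taken from the Xapian examples) that each path component is lowercased and used as one search term; the helper name path_terms is made up. Terms like these would still have to be attached to the document, for instance with Document.add_term() or Document.add_posting(), before a query for 'filename' could match it.

```python
def path_terms(path):
    """Split a path into lowercase terms, one per non-empty component.

    Hypothetical helper for illustration; not part of the xapian bindings.
    """
    # strip() drops the trailing newline that lines read from a file carry
    return [part.lower() for part in path.strip().split('/') if part]


terms = path_terms('/this/is/a/fully/qualified/path/to/a/filename\n')
# The last path component is the filename, so it becomes the final term.
```

Lowercasing is a design choice here: it makes a search for 'filename' independent of the case used on disk, at the cost of no longer distinguishing names that differ only by case.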
This is what I am doing to create documents and add them to the index:
#!/usr/bin/python
# indexer.py
import sys
import xapian

# set up the file to index
fileToIndex = sys.argv[1]
if len(sys.argv) >= 3:
    maxRecordsToIndex = int(sys.argv[2])
else:
    maxRecordsToIndex = 0  # 0 means no limit
recordCount = 0

# set up the xapian database
try:
    db = xapian.WritableDatabase('/tmp/index', xapian.DB_CREATE_OR_OPEN)
    # index the file, one document per line
    for line in file(fileToIndex):
        doc = xapian.Document()
        doc.set_data(line)
        db.add_document(doc)
        # my input file is 70GB of data, this is to make testing faster
        recordCount = recordCount + 1
        if maxRecordsToIndex > 0 and recordCount >= maxRecordsToIndex:
            break
        elif recordCount % 1000 == 0:
            print 'processed %s records so far!' % recordCount
    print 'processed %s records' % recordCount
except Exception, e:
    print 'Exception: %s' % str(e)
    sys.exit(1)
And this is what I am doing to try to get the data back from a
search. The problem is that I can't get it to find anything.
Given the example data above, when I run
    python searcher.py /tmp/index filename
I get 0 records found!
#!/usr/local/bin/python
# searcher.py
import sys
import xapian

if len(sys.argv) < 3:
    print "usage: %s <path to database> <search terms>" % sys.argv[0]
    sys.exit(1)

try:
    database = xapian.Database(sys.argv[1])
    enquire = xapian.Enquire(database)
    query = xapian.Query(sys.argv[2])
    print "Performing query `%s'" % query.get_description()
    enquire.set_query(query)
    matches = enquire.get_mset(0, 10)
    print "%i results found" % matches.get_matches_estimated()
    for match in matches:
        print "ID %i %i%% [%s]" % (match[xapian.MSET_DID],
                                   match[xapian.MSET_PERCENT],
                                   match[xapian.MSET_DOCUMENT].get_data())
except Exception, e:
    print "Exception: %s" % str(e)
    sys.exit(1)
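
One thing that may explain the empty result: Xapian matches a query against the terms attached to documents, while the string passed to set_data() is stored for retrieval but is not itself searched. The toy model below is plain Python, not Xapian, and every name in it (ToyIndex, add_document, search) is invented for illustration. It only shows the difference between storing data and indexing terms.

```python
class ToyIndex(object):
    """Tiny in-memory stand-in for a term index (hypothetical, not Xapian)."""

    def __init__(self):
        self.data = {}      # docid -> stored data, returned but never searched
        self.postings = {}  # term -> set of docids that carry that term

    def add_document(self, data, terms=()):
        docid = len(self.data) + 1
        self.data[docid] = data
        for term in terms:
            self.postings.setdefault(term, set()).add(docid)
        return docid

    def search(self, term):
        # Only the postings are consulted; stored data plays no part.
        return sorted(self.postings.get(term, ()))


path = '/this/is/a/fully/qualified/path/to/a/filename'

data_only = ToyIndex()
data_only.add_document(path)                 # data stored, no terms
no_hits = data_only.search('filename')       # nothing to match against

with_terms = ToyIndex()
with_terms.add_document(path, terms=[p for p in path.split('/') if p])
hits = with_terms.search('filename')         # matches via the 'filename' term
```

In this model the first index behaves like a document that only ever had its data set, and the second like one that also had one term added per path component.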