[Xapian-discuss] How to accomplish this task with the Python Bindings?

Olly Betts olly at survex.com
Sat Jan 14 01:24:20 GMT 2006


On Fri, Jan 13, 2006 at 05:06:22PM -0500, robersonja wrote:
> I think in both cases ( xapwrap and just the default xapian )
> bindings I am getting indexing to happen, but I can't really tell
> because I can't get any search results to confirm anything.

For this sort of debugging, the "delve" utility is very handy.
It's in the examples subdirectory of xapian-core, and should
be installed by make install.  You can use it to check that
the index contains what you expect and narrow down a problem
to being on the indexing or searching side.

For example, you can look at the terms indexing document 7:

delve /path/to/database -r 7

Or the posting list for term "wibble" (note that delve wants the
term exactly as in the database, which may be stemmed or have
a prefix, etc):

delve /path/to/database -t wibble

For other options, read "delve --help".
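
If you'd rather do the same check from inside Python, something along
these lines may work. It's only a sketch: it assumes your version of the
bindings provides the termlist() and postlist() iterator wrappers on the
database object (older releases may not have them), and the two helper
names are made up for illustration.

```python
def terms_in_document(db, docid):
    """Return the terms indexing one document (roughly what "delve -r" shows).

    Assumes db.termlist(docid) yields items with a .term attribute.
    """
    return [item.term for item in db.termlist(docid)]


def documents_for_term(db, term):
    """Return the document ids in a term's posting list (like "delve -t").

    Assumes db.postlist(term) yields items with a .docid attribute.
    The term must be given exactly as it is stored in the database.
    """
    return [item.docid for item in db.postlist(term)]


# Usage sketch (needs the xapian module and a real database):
#   import xapian
#   db = xapian.Database("/path/to/database")
#   print(terms_in_document(db, 7))
#   print(documents_for_term(db, "wibble"))
```

If the first call returns an empty list, the problem is on the indexing
side rather than the searching side.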

> When I tried with the xapian python bindings directly, I can't get  
> the search to work. Granted the simplesearch example program is  
> broken, so I am kind of groping in the dark on how to get the search  
> to return a list of documents and have get_data() actually return  
> something.

Sorry about the broken simplesearch.py, but it's only broken in that
it uses a query constructor which SWIG was failing to wrap as intended.
The rest of the code is correct; only the part which builds the query
is wrong (well, arguably the example is right, and the bindings are
wrong).

This is fixed in SVN trunk, so you might want to try a snapshot:

http://www.oligarchy.co.uk/xapian/trunk/

They're in good shape right now as I'm busy tying up loose ends for
the next release.

>     # index the file
>     for line in file(fileToIndex):
>         doc = xapian.Document()
>         doc.set_data(line)

You need to add some index entries here, i.e. the terms you want
searches to be able to match this document on.  So split line on "/"
and add a posting for each piece:

        pos = 0
        for term in line.split("/"):
            # Python has no "++" operator, so bump the position by hand
            doc.add_posting(term, pos)
            pos += 1

>         db.add_document(doc)

You may want to stem "term" before adding it (if you do, you also need
to correspondingly stem terms before searching for them).
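
To make the "same treatment on both sides" point concrete, here's a
rough sketch. The function names are hypothetical, "doc" is assumed to
behave like xapian.Document (i.e. to provide add_posting(term, position)),
and "stem" is any callable you supply, for instance a small wrapper
around a xapian.Stem object. Stripping whitespace and skipping empty
pieces are my additions, there so that the trailing newline and a
leading "/" don't end up in the index.

```python
def normalise(term, stem=None):
    """Strip whitespace and optionally stem a term.

    Use this for both indexing and searching so the terms line up.
    """
    term = term.strip()
    return stem(term) if stem else term


def index_line(doc, line, stem=None):
    """Add one positional posting per "/"-separated piece of line.

    Returns the number of postings added.
    """
    pos = 0
    for piece in line.strip().split("/"):
        term = normalise(piece, stem)
        if not term:
            continue  # skip empty pieces, e.g. from a leading "/"
        doc.add_posting(term, pos)
        pos += 1
    return pos


def search_terms(words, stem=None):
    """Normalise user-supplied words the same way as at index time."""
    return [normalise(w, stem) for w in words if w.strip()]
```

You'd call index_line(doc, line) just before db.add_document(doc), and
build the query from search_terms(...) rather than from the raw words.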

> and this is what I am doing to try and get the data back from a
> search, the problem is I can't get it to find anything.

The search script looks plausible.  I think if you actually add some
postings it'll all start to work.

Cheers,
    Olly


