[Xapian-discuss] constructing phrase queries

Olly Betts olly at survex.com
Thu Sep 30 13:10:32 BST 2004


On Thu, Sep 30, 2004 at 11:54:12AM +0100, James Aylett wrote:
> On Thu, Sep 30, 2004 at 11:31:18AM +0100, Richard Boulton wrote:
> > I'm not sure that get_description() currently correctly escapes the
> > necessary characters (eg, ':').  Ensuring that such escaping happens
> > both ways is the only pitfall I can think of to this plan.

get_description() is intended as a debugging aid, both for users' code
(to check they've built the query they think they did) and for the library
(e.g. to check that the QueryParser did the right thing).  As such,
get_description() should be human-readable, and forcing an escaping
scheme onto it to disambiguate corner cases would hinder that.

It might be useful to offer an API to convert queries to strings and
back, but I don't think get_description() is the right starting point,
especially as we already have proper query serialising code (it's
needed for the remote backend).  And I'm not convinced that building
a serialised query is the way people would create queries; rather,
it's sometimes useful to be able to serialise queries for storage
on disk for later recall.
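As a rough illustration of why a dedicated storage format is preferable to
reusing get_description(), here is a minimal pure-Python sketch.  It assumes
a query is modelled as either a plain term string or an (op, window,
subqueries) tuple; that model, the JSON encoding and the
serialise()/unserialise() names are all hypothetical and are not Xapian's
own serialisation code.  The point is only that a structured format
round-trips terms containing ':' or ')' with no hand-written escaping.

```python
import json

# Hypothetical query model (NOT Xapian's real representation):
#   a term is a plain str; a compound query is (op, window, [subqueries]).

def serialise(query):
    """Encode a query tree as a JSON string suitable for storing on disk."""
    def to_obj(q):
        if isinstance(q, str):
            return q
        op, window, subqs = q
        return {"op": op, "window": window, "sub": [to_obj(s) for s in subqs]}
    return json.dumps(to_obj(query))

def unserialise(data):
    """Rebuild the query tree produced by serialise()."""
    def from_obj(o):
        if isinstance(o, str):
            return o
        return (o["op"], o["window"], [from_obj(s) for s in o["sub"]])
    return from_obj(json.loads(data))

# Terms containing the characters a description string would need to escape.
phrase = ("PHRASE", 2, ["time:", "(pos=1)"])
stored = serialise(phrase)
assert unserialise(stored) == phrase
```

JSON handles quoting of awkward characters itself, so nothing in the stored
form depends on how a debugging description happens to look.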

> Of course, since you'd be programmatically creating this, it would be
> better just to ensure that you could write your parser code to build
> the Query properly in whatever language you use. :-)

I agree.  I don't think that building a special format string and then
converting it to a query is the natural interface.  You'd need to
take each term, escape it, join them together into a string, and
wrap them in the correct gubbins to make it a phrase.  What we really
need to do is sort out the Python bindings so you can just do:

terms = [ 'to', 'be', 'or', 'not', 'to', 'be' ]
query = xapian.Query_from_list(xapian.Query.OP_PHRASE, terms)

Or for convenience, perhaps even just:

terms = [ 'to', 'be', 'or', 'not', 'to', 'be' ]
query = xapian.Query_phrase(terms)

That's surely a lot better than:

phrase = [ 'to', 'be', 'or', 'not', 'to', 'be' ]
n = len(phrase)
# Hand-build a description-style string (note: terms are not escaped).
query_string = "("
for pos,term in enumerate(phrase):
    if pos != 0:
        query_string += " PHRASE %d " % n
    query_string += "%s:(pos=%d)" % (term, pos + 1)
query_string += ")"
# Hypothetical parser turning that string back into a Query.
query = xapian.Query_from_description(query_string)
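The difference between the two styles can be sketched in self-contained
Python.  Query and query_phrase() below are hypothetical stand-ins for the
proposed Query_phrase() interface, not the real bindings, and
escape_term()/unescape_term() are a hypothetical backslash-escaping scheme
of the kind the string route would need: the loop above does no escaping at
all, so a term such as 'a)' or 'b:c' would corrupt the string it builds.

```python
class Query:
    """Hypothetical stand-in for a query object built directly from a list."""
    OP_PHRASE = "PHRASE"

    def __init__(self, op, subqueries, window=0):
        self.op = op
        self.subqueries = list(subqueries)
        # Default the phrase window to the number of terms, as the loop does.
        self.window = window or len(self.subqueries)

def query_phrase(terms):
    """Build a phrase query straight from a list: no escaping needed."""
    return Query(Query.OP_PHRASE, terms)

# The string route instead needs a reversible escaping scheme for every term.
SPECIAL = "\\:() "

def escape_term(term):
    """Backslash-escape characters the description syntax treats specially."""
    return "".join("\\" + ch if ch in SPECIAL else ch for ch in term)

def unescape_term(text):
    """Undo escape_term(); a lone trailing backslash is kept as-is."""
    out = []
    chars = iter(text)
    for ch in chars:
        if ch == "\\":
            ch = next(chars, ch)
        out.append(ch)
    return "".join(out)

def phrase_description(terms):
    """Same string as the loop above, but with each term escaped."""
    n = len(terms)
    parts = ["%s:(pos=%d)" % (escape_term(t), pos + 1)
             for pos, t in enumerate(terms)]
    return "(" + (" PHRASE %d " % n).join(parts) + ")"

q = query_phrase(['to', 'be', 'or', 'not', 'to', 'be'])
s = phrase_description(['to', 'be'])
```

Building the query object directly takes the list as-is, whereas the string
route has to get escaping right in both directions, which is the pitfall
Richard raised.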

Cheers,
    Olly


