Reparsing queries (was Re: [Xapian-devel] Re: [Xapian-commits] Changes in xapian/xapian-applications/queryserver/ xapian/xapian-applications/queryserver/source/)

Richard Boulton richard at tartarus.org
Thu May 13 22:12:29 BST 2004


Olly Betts wrote:
> I've now done this.  It's essentially the same as the queryserver patch,
> except that "@" is also stripped.

I'll pull the queryserver patch out again when I have time.

> It seems arbitrary to leave "@" but
> to strip other phrase generators (especially "'" as then contractions
> such as "isn't" get broken up).  As it is now it works well on the
> sample of real world queries from tweakers.net, whereas not stripping
> *any* phrase generators seems to do slightly less well.
> 
> I think this is something to revisit after this is addressed:
> 
> http://www.xapian.org/cgi-bin/bugzilla/show_bug.cgi?id=22

Agreed.

The thinking behind my patch was that if the queryparser fails to parse 
the query entered, it's probably because the query is actually simply a 
piece of text pasted into the search box (or generated by some other 
application).  In this case, we might well have unmatched '"' 
characters, which cause the queryserver to fail.
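
As a rough illustration of that fallback (a sketch only, not the actual 
queryserver code: strict_parse, QueryParseError and parse_with_fallback 
are hypothetical names, and the "strict" parser here does nothing except 
reject unbalanced double quotes):

```python
class QueryParseError(Exception):
    """Stand-in for whatever error a real query parser would report."""


def strict_parse(query):
    # Hypothetical strict parser: it only checks that double quotes balance
    # and returns the query text unchanged when they do.
    if query.count('"') % 2 != 0:
        raise QueryParseError("unmatched double quote")
    return query


def parse_with_fallback(query):
    """Try the strict parse first; on failure, treat the input as pasted text."""
    try:
        return strict_parse(query)
    except QueryParseError:
        # Replace the double quotes with spaces, tidy the whitespace, and retry.
        cleaned = " ".join(query.replace('"', " ").split())
        return strict_parse(cleaned)
```

A well-formed query goes through untouched; only input that fails the 
first parse gets rewritten.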

What we actually want to do in this situation is probably to pass the 
query to a slightly different, more tolerant, parser.  In particular, we 
probably do still want to do phrase searches on things like "Olly's" and 
"e-mail", but we don't want to pay attention to double quotes, and we 
also don't want to exclude terms which are prefixed by a '-' (or require 
terms which are prefixed by a '+').  And we always want to try to 
generate some kind of useful query, rather than just an error message.
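
To make the idea concrete, here is a minimal sketch of such a tolerant 
pass.  It is purely illustrative and is not how the queryparser works: 
the name tolerant_groups and the ASCII-only word pattern are my own 
assumptions.  Double quotes and leading '+' or '-' are simply ignored, 
while an apostrophe or hyphen inside a word splits it into parts that 
would then be searched as a phrase.

```python
import re

# A word is a run of ASCII letters/digits, optionally joined by single
# internal apostrophes or hyphens (e.g. "Olly's", "e-mail").  Anything
# else, including '"' and a leading '+' or '-', is treated as a separator.
WORD = re.compile(r"[A-Za-z0-9]+(?:['-][A-Za-z0-9]+)*")


def tolerant_groups(text):
    """Return one list of parts per word; multi-part lists are phrase candidates."""
    return [re.split(r"['-]", word) for word in WORD.findall(text)]
```

For example, tolerant_groups('"Olly\'s +e-mail -spam') gives 
[['Olly', 's'], ['e', 'mail'], ['spam']], so there is always something 
to build a query from, even when the quotes are unbalanced.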

Stripping all phrase generators is a good quick fix, but I'm not sure it 
is the right solution.

(Caveat: I don't claim to fully understand all that the queryparser 
does, not having examined the code closely for a while.)



