Reparsing queries (was Re: [Xapian-devel] Re: [Xapian-commits] Changes in xapian/xapian-applications/queryserver/ xapian/xapian-applications/queryserver/source/)
Richard Boulton
richard at tartarus.org
Thu May 13 22:12:29 BST 2004
Olly Betts wrote:
> I've now done this. It's essentially the same as the queryserver patch,
> except that "@" is also stripped.
I'll pull the queryserver patch out again when I have time.
> It seems arbitrary to leave "@" but
> to strip other phrase generators (especially "'" as then contractions
> such as "isn't" get broken up). As it is now it works well on the
> sample of real world queries from tweakers.net, whereas not stripping
> *any* phrase generators seems to do slightly less well.
>
> I think this is something to revisit after this is addressed:
>
> http://www.xapian.org/cgi-bin/bugzilla/show_bug.cgi?id=22
Agreed.
The thinking behind my patch was that if the queryparser fails to parse
the query entered, it's probably because the query is actually simply a
piece of text pasted into the search box (or generated by some other
application). In this case, we might well have unmatched '"'
characters, which cause the queryserver to fail.
What we actually want to do in this situation is probably to pass the
query to a slightly different, more tolerant, parser. In particular, we
probably do still want to do phrase searches on things like "Olly's" and
"e-mail", but we don't want to pay attention to double quotes, and we
also don't want to exclude terms which are prefixed by a '-' (or require
terms which are prefixed by a '+'). And we always want to try to
generate some kind of useful query, rather than just an error message.
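
A rough sketch of the tolerant fallback described above, in plain Python. This is only an illustration under stated assumptions: it is not the queryparser's code and uses no Xapian API, and the names tolerant_parse and parse_with_fallback, the tuple representation of terms and phrases, and the use of ValueError to signal a parse failure are all made up for the example.

```python
import re

# Hypothetical illustration only; none of this is Xapian's QueryParser API.
# A token is a run of alphanumerics, optionally joined to further runs by a
# single ' or - (the "phrase generators" kept in this sketch).
_TOKEN = re.compile(r"[^\W_]+(?:['\-][^\W_]+)*")


def tolerant_parse(text):
    """Turn arbitrary pasted text into a list of query items.

    Each item is a tuple of lowercased words: a 1-tuple stands for a plain
    term, a longer tuple for a phrase. Double quotes and leading '+' or '-'
    never match _TOKEN, so they are silently ignored rather than treated
    as operators.
    """
    items = []
    for match in _TOKEN.finditer(text):
        # Split "olly's" into ("olly", "s") and "e-mail" into ("e", "mail").
        words = re.split(r"['\-]", match.group(0).lower())
        items.append(tuple(words))
    return items


def parse_with_fallback(text, strict_parse):
    """Try the strict parser first; if it rejects the text, fall back.

    strict_parse is any callable that raises ValueError on a query it
    cannot parse (an assumption made for this sketch).
    """
    try:
        return strict_parse(text)
    except ValueError:
        return tolerant_parse(text)
```

With this sketch, the input `Olly's "e-mail -spam` gives phrases for "Olly's" and "e-mail" plus the plain term "spam": the unmatched double quote and the '-' prefix are dropped instead of causing an error or an exclusion.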
Stripping all phrase generators is a good quick fix, but I'm not sure it
is the right solution.
(Caveat: I don't claim to fully understand all that the queryparser
does, not having examined the code closely for a while.)