[Xapian-discuss] Prefixes

Olly Betts olly at survex.com
Tue Feb 6 06:42:27 GMT 2007


On Tue, Jan 30, 2007 at 12:36:39PM +0800, Fabrice Colin wrote:
> On 1/30/07, Olly Betts <olly at survex.com> wrote:
> >On 26/01/07, Fabrice Colin <fabrice.colin at gmail.com> wrote:
> >> I am using QueryParser::add_boolean_prefix("url", "U") to restrict
> >> searches to documents that have a specific URL.
> >> When the input has a URL containing a space, how should it be quoted?
> >
> >There isn't currently a way to quote such a prefixed boolean term, but
> >shouldn't spaces be quoted as %20 in a url anyway?
>
> Yes, for a URL, quoting makes sense, but for a file name filter, not so
> much. For instance, entering something like 'file:"My CV.txt"' is not
> completely unreasonable.

Indeed (to be clear, I wasn't saying that it was ridiculous to want such
quoting, just that the example wasn't a good one - my reply was somewhat
terse as I was using a public computer in a hotel lobby!)

> Actually, this would be useful for searching indexes built by omindex.
> As far as I can tell it doesn't escape U-prefixed terms [...]

So it doesn't - that's a bug in omindex then!
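To make the escaping concrete, here is a minimal sketch of percent-encoding a URL before it becomes a U-prefixed term. It is an illustration only, not what omindex does: percent_encode() and make_url_term() are hypothetical helpers, and the set of characters left unescaped is an assumption.

```cpp
#include <string>

// Hypothetical helper (not part of Xapian or omindex): percent-encode every
// byte except the RFC 3986 "unreserved" characters plus '/' and ':', which
// are kept so the overall shape of the URL stays readable.
// Assumption: the input is not already percent-encoded; a literal '%' in the
// input is itself encoded as %25.
std::string percent_encode(const std::string &in) {
    static const char hex[] = "0123456789ABCDEF";
    std::string out;
    for (unsigned char ch : in) {
        bool safe = (ch >= 'A' && ch <= 'Z') || (ch >= 'a' && ch <= 'z') ||
                    (ch >= '0' && ch <= '9') || ch == '-' || ch == '.' ||
                    ch == '_' || ch == '~' || ch == '/' || ch == ':';
        if (safe) {
            out += static_cast<char>(ch);
        } else {
            out += '%';
            out += hex[ch >> 4];
            out += hex[ch & 15];
        }
    }
    return out;
}

// Hypothetical helper: build the boolean term for a URL using the "U" prefix
// mentioned earlier in this thread.
std::string make_url_term(const std::string &url) {
    return "U" + percent_encode(url);
}
```

With this, a URL ending in "My CV.txt" yields a term ending in "My%20CV.txt", so the term contains neither whitespace nor ')'. The same function would need to be applied at both index time and query time for the terms to match.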

> >> Would it be possible to have something like the following?
> >>
> >>     void Xapian::QueryParser::add_boolean_prefix(
> >>         const std::string &field,
> >>         const std::string &prefix,
> >>         const TermTransformer *transform);
> >
> >Perhaps, though for this case it seems unlikely that a user would
> >really type in a 240+ character URL...
>
> A sane user would copy the URL from somewhere else, perhaps a
> note-taking program or a browser window, and paste it into the input
> field :-)

True!

> Generally speaking, this would remove the need to pre-process user input or
> post-process the Query for fields that need some kind of transformation.

Indeed - I can certainly think of a few uses.  Some thought is needed
about the API of the "TermTransformer" though.  For example, it perhaps
needs to be able to say what characters it wants to allow in a term as
well as potentially transforming it afterwards (currently boolean terms
can rather arbitrarily contain any character apart from whitespace and
')').
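As a rough sketch of what such an interface might look like (purely hypothetical: none of these names exist in Xapian, and the "F" file name prefix used below is made up), the transformer could answer two questions: which characters may appear in the term, and how the collected text is transformed afterwards. The scan_boolean_term() function is a stand-in for the relevant part of the parser, not real QueryParser code.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical interface, sketched for discussion only.
class TermTransformer {
  public:
    virtual ~TermTransformer() { }

    // Does this transformer want a leading '"' treated as an opening quote?
    virtual bool handles_quotes() const { return false; }

    // May ch appear in the raw term text?  The default mirrors the current
    // rule: anything apart from whitespace and ')'.
    virtual bool allow_char(char ch, bool quoted) const {
        (void)quoted;
        return !std::isspace(static_cast<unsigned char>(ch)) && ch != ')';
    }

    // Transform the collected text before the prefix is added.
    virtual std::string transform(const std::string &raw) const {
        return raw;
    }
};

// Example: inside double quotes, accept everything up to the closing quote.
class QuotedFileName : public TermTransformer {
  public:
    bool handles_quotes() const override { return true; }
    bool allow_char(char ch, bool quoted) const override {
        if (quoted) return ch != '"';
        return TermTransformer::allow_char(ch, quoted);
    }
};

// Stand-in for the parser: read one boolean term value starting at pos,
// advance pos past it, and return prefix + transformed value.
std::string scan_boolean_term(const std::string &input, std::size_t &pos,
                              const std::string &prefix,
                              const TermTransformer &t) {
    bool quoted = t.handles_quotes() && pos < input.size() &&
                  input[pos] == '"';
    if (quoted) ++pos;  // skip the opening quote
    std::string raw;
    while (pos < input.size() && t.allow_char(input[pos], quoted))
        raw += input[pos++];
    if (quoted && pos < input.size() && input[pos] == '"')
        ++pos;  // skip the closing quote
    return prefix + t.transform(raw);
}
```

Scanning file:"My CV.txt" from just after the colon with QuotedFileName gives the single term FMy CV.txt, whereas the default transformer stops at the space and keeps the quote character in the term.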

Alternatively, this could be done by subclassing QueryParser and
implementing a virtual function.
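A minimal sketch of the subclassing approach, using a stand-in class rather than the real Xapian::QueryParser (which has no such hook at present). SketchQueryParser, transform_boolean_value() and the "site"/"H" field are all assumptions made up for illustration.

```cpp
#include <cctype>
#include <map>
#include <string>

// Stand-in for the parser; only the boolean prefix handling is modelled.
class SketchQueryParser {
    std::map<std::string, std::string> prefixes;

  public:
    virtual ~SketchQueryParser() { }

    void add_boolean_prefix(const std::string &field,
                            const std::string &prefix) {
        prefixes[field] = prefix;
    }

    // Build the term for field:value, or return an empty string if the
    // field has no registered prefix.
    std::string boolean_term(const std::string &field,
                             const std::string &value) const {
        std::map<std::string, std::string>::const_iterator it =
            prefixes.find(field);
        if (it == prefixes.end()) return std::string();
        return it->second + transform_boolean_value(field, value);
    }

  protected:
    // Hypothetical hook: subclasses override this to transform the value.
    // The default leaves it untouched.
    virtual std::string transform_boolean_value(
            const std::string & /*field*/, const std::string &value) const {
        return value;
    }
};

// Example subclass: lowercase values of the "site" field, leave others alone.
class SiteLoweringParser : public SketchQueryParser {
  protected:
    std::string transform_boolean_value(
            const std::string &field,
            const std::string &value) const override {
        if (field != "site") return value;
        std::string out;
        for (unsigned char ch : value)
            out += static_cast<char>(std::tolower(ch));
        return out;
    }
};
```

Compared with passing a transformer object per prefix, this keeps add_boolean_prefix() unchanged, at the cost of one override having to dispatch on the field name itself.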

Cheers,
    Olly


