[Xapian-discuss] Stopping wildcard expansion at some point

Adam Sjøgren asjo at koldfront.dk
Fri Mar 20 12:47:59 GMT 2009


On Thu, 19 Mar 2009 23:31:42 +0000, Olly wrote:

>> I may have a hundred thousand terms starting with MM, but only 20
>> starting with A, and it would be sad for me if the user couldn't search
>> for A*.

> That's probably extreme, but it's likely to be true for English text
> that e* might be undesirable while z* is fine.  I'm not sure if either
> is actually useful for English though.

Yes, it sounds less plausible for "plain text", but when you mix various
kinds of codes and identifiers in there, it can become a problem.

> On Fri, Mar 06, 2009 at 12:06:52PM +0100, Adam Sjøgren wrote:
>> Attached is a patch updated from the feedback (Xapian::termcount,
>> QueryParserError, error message) for further consideration.

> I'm still wondering what to do about this if we don't want to prevent
> ourselves from being able to push the wildcard expansion into the database
> backends.  We could perhaps push this check with it, but then the
> rejection potentially happens rather late on.  Or the check stays and
> we end up counting the matches up front if this option is on.

That is a little over my head architecturally; I appreciate that it
isn't straightforward.
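
For reference, a minimal sketch of the "count the matches up front" option,
assuming the terms are available as a sorted list. It does not use the Xapian
API: expansion_exceeds_limit() and its parameters are hypothetical names for
illustration, and the sorted vector merely stands in for iterating a
database's terms in order. The point is that the scan can stop as soon as the
limit is exceeded, so a prefix with a hundred thousand matches costs no more
than limit + 1 steps to reject.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not part of Xapian): returns true if more than
// `limit` entries of the sorted list `terms` start with `prefix`.
bool expansion_exceeds_limit(const std::vector<std::string>& terms,
                             const std::string& prefix,
                             std::size_t limit)
{
    // The early exit below relies on the terms being in sorted order.
    assert(std::is_sorted(terms.begin(), terms.end()));

    // Jump to the first term that is not less than the prefix.
    auto it = std::lower_bound(terms.begin(), terms.end(), prefix);

    std::size_t count = 0;
    for (; it != terms.end(); ++it) {
        // Terms sharing the prefix are contiguous; stop at the first
        // term that does not start with it.
        if (it->compare(0, prefix.size(), prefix) != 0) break;
        // Give up as soon as the limit is exceeded.
        if (++count > limit) return true;
    }
    return false;
}
```

With a limit of 2 and the terms a1, a2, mm1, mm2, mm3, z, the prefix "mm"
would be rejected while "a" would be allowed.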

> Can you attach the patch to a ticket in trac for now, so that it doesn't
> get forgotten about?

Sure - I have created a ticket now: http://trac.xapian.org/ticket/350

>> I wasn't quite sure how, in the error message, to display the term
>> exactly as the user entered it; the closest I found was "unstemmed",
>> which doesn't include the '*'.

> Yeah, that's probably the best choice (and just append a "*" to it).

Ah, I forgot to do that; I will update the patch in trac.
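
A rough sketch of that fix, assuming the message is built from the unstemmed
form plus a literal "*". The function name wildcard_limit_message() and the
message wording are made up for illustration and are not taken from the
patch; in the patch itself the text would be passed to QueryParserError.

```cpp
#include <string>

// Hypothetical helper: builds the text for the error raised when a
// wildcard expands to too many terms. `unstemmed` is the term as typed
// by the user without the trailing '*', so the '*' is appended here.
std::string wildcard_limit_message(const std::string& unstemmed,
                                   unsigned limit)
{
    return "Wildcard " + unstemmed + "* expands to more than " +
           std::to_string(limit) + " terms";
}
```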


   Thanks!

    Adam

-- 
 "We get our thursdays from a banana."                        Adam Sjøgren
                                                         asjo at koldfront.dk



