[Xapian-discuss] Stopword addition and stemming

goran kent gorankent at gmail.com
Mon Nov 15 13:16:43 GMT 2010


On 11/15/10, Olly Betts <olly at survex.com> wrote:
>> > Another option would be to treat '.' as a word character when between
>> > two letters, and so tokenise bob.co.us as a single term, but that's not
>> > supported by TermGenerator and QueryParser currently, so you'd have to
>> > patch Xapian or tokenise documents and queries yourself.
>>
>> ug, beyond me, I'm afraid.
>
> Actually it's very simple to do - you just need to tweak check_infix() in
> queryparser/queryparser.lemony and queryparser/termgenerator_internal.cc
> by adding '.' to the first test.

Hmm, interesting.  I'm wondering how good an idea this would be for a
general-usage search engine, specifically as a way to avoid the
phrase-search-time penalty for "co.us".  Shooting from the hip, I
think it's a great trade-off.  I just *know* folks are going to search
for [bob_co.us] and then wonder why the page is not responding
promptly.

Can you think of a downside to doing this?
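
To make the proposal concrete, here is a rough stand-alone sketch of the
tokenisation change being discussed.  It is not Xapian's code and does not
touch check_infix(); the tokenise() function and its dot_is_infix flag are
made up purely for illustration, and real Xapian tokenisation handles many
more cases.  It only shows the rule "keep '.' inside a term when it sits
between two letters".

```python
# Hypothetical stand-alone tokeniser for illustration only.
# This is NOT Xapian's TermGenerator or QueryParser code.

def tokenise(text, dot_is_infix=False):
    """Split text into lowercase terms made of alphanumeric characters.

    If dot_is_infix is True, a '.' that sits directly between two
    letters is kept inside the term instead of ending it.
    """
    terms = []
    current = []
    for i, ch in enumerate(text):
        if ch.isalnum():
            current.append(ch.lower())
        elif (dot_is_infix and ch == '.'
              and current and current[-1].isalpha()
              and i + 1 < len(text) and text[i + 1].isalpha()):
            # '.' between two letters: treat it as part of the term.
            current.append(ch)
        else:
            # Any other character ends the current term.
            if current:
                terms.append(''.join(current))
                current = []
    if current:
        terms.append(''.join(current))
    return terms


print(tokenise("Visit bob.co.us today."))
# ['visit', 'bob', 'co', 'us', 'today']
print(tokenise("Visit bob.co.us today.", dot_is_infix=True))
# ['visit', 'bob.co.us', 'today']
```

In this sketch the single term bob.co.us replaces the three separate terms,
so a lookup needs no phrase match for "co.us", but the individual terms
bob, co and us are no longer produced for that text.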


