[Xapian-discuss] Stopword addition and stemming
goran kent
gorankent at gmail.com
Mon Nov 15 13:16:43 GMT 2010
On 11/15/10, Olly Betts <olly at survex.com> wrote:
>> > Another option would be to treat '.' as a word character when between
>> > two letters, and so tokenise bob.co.us as a single term, but that's not
>> > supported by TermGenerator and QueryParser currently, so you'd have to
>> > patch Xapian or tokenise documents and queries yourself.
>>
>> ug, beyond me, I'm afraid.
>
> Actually it's very simple to do - you just need to tweak check_infix() in
> queryparser/queryparser.lemony and queryparser/termgenerator_internal.cc
> by adding '.' to the first test.
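To make the suggested rule concrete, here is a minimal standalone sketch in C++ (the language of termgenerator_internal.cc). It is NOT Xapian's check_infix() or TermGenerator code. The tokenise() function and its keep_dotted flag are hypothetical, and the sketch assumes plain ASCII input and simple lowercasing. It splits text on non-alphanumeric characters, except that a '.' with a letter on both sides is kept inside the term.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical tokeniser sketch (not Xapian's TermGenerator).
// Splits on non-alphanumeric characters. When keep_dotted is true, a '.'
// that sits between two letters is treated as a word character (an infix),
// so "bob.co.us" stays one term instead of becoming "bob", "co", "us".
std::vector<std::string> tokenise(const std::string& text, bool keep_dotted) {
    std::vector<std::string> terms;
    std::string current;
    for (std::size_t i = 0; i < text.size(); ++i) {
        unsigned char ch = static_cast<unsigned char>(text[i]);
        if (std::isalnum(ch)) {
            // Ordinary word character: lowercase and append to the term.
            current += static_cast<char>(std::tolower(ch));
            continue;
        }
        // Only '.' with a letter immediately before AND after counts as infix.
        bool letter_before =
            i > 0 && std::isalpha(static_cast<unsigned char>(text[i - 1]));
        bool letter_after = i + 1 < text.size() &&
            std::isalpha(static_cast<unsigned char>(text[i + 1]));
        if (keep_dotted && ch == '.' && letter_before && letter_after) {
            current += '.';
            continue;
        }
        // Any other character ends the current term.
        if (!current.empty()) {
            terms.push_back(current);
            current.clear();
        }
    }
    if (!current.empty()) {
        terms.push_back(current);
    }
    return terms;
}
```

For example, tokenise("bob.co.us", false) gives the three terms bob, co and us, while tokenise("bob.co.us", true) gives the single term bob.co.us. A '.' between digits ("3.14") or at the end of a sentence still splits, because the rule only fires between two letters.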
Hmm, interesting. I'm wondering how good an idea this would be for a
general-usage search engine (specifically to prevent the
phrase-search-time penalty for "co.us")? Shooting from the hip, I
think it's a great trade-off. I just *know* folks are going to search
for [bob_co.us] and then wonder why the page is not responding
promptly.
Can you think of a downside to doing this?