[Xapian-discuss] Stopword addition and stemming

Kevin Duraj kevin.softdev at gmail.com
Wed Nov 17 02:11:10 GMT 2010


I ran a search on my PacificAir.com search engine, which indexes 300 million
documents, for "Auckland Flights" restricted to the web site
flightcentre.co.uk:

http://pacificair.com/search?q=Auckland+Flights+site:flightcentre.co.uk

real	0m0.085s
user	0m0.040s
sys	0m0.040s

How much faster do you want the search to run?

Try it yourself!

Kevin Duraj
http://pacificair.com/



On Mon, Nov 15, 2010 at 5:16 AM, goran kent <gorankent at gmail.com> wrote:
> On 11/15/10, Olly Betts <olly at survex.com> wrote:
>>> > Another option would be to treat '.' as a word character when between
>>> > two letters, and so tokenise bob.co.us as a single term, but that's not
>>> > supported by TermGenerator and QueryParser currently, so you'd have to
>>> > patch Xapian or tokenise documents and queries yourself.
>>>
>>> ug, beyond me, I'm afraid.
>>
>> Actually it's very simple to do - you just need to tweak check_infix() in
>> queryparser/queryparser.lemony and queryparser/termgenerator_internal.cc
>> by adding '.' to the first test.
>
> Hmm, interesting.  I'm wondering how good an idea this would be for a
> general-usage search engine (specifically to prevent the
> phrase-search-time penalty for "co.us")?  Shooting from the hip, I
> think it's a great trade-off.  I just *know* folks are going to search
> for [bob_co.us] and then wonder why the page is not responding
> promptly.
>
> Can you think of a downside to doing this?
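For readers following the tokenisation question above, here is a minimal, hypothetical sketch in C++ (Xapian's own language) of the rule being discussed: a '.' counts as part of a word only when it sits between two letters. With the rule off, "bob.co.us" becomes three terms (bob, co, us), so matching it needs a phrase check. With the rule on, it becomes the single term "bob.co.us". The names `tokenise()` and `is_word_infix()` are invented for illustration. This is not Xapian's check_infix() or TermGenerator code, it handles ASCII only, and it ignores the other infix cases Xapian's real tokeniser deals with.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not Xapian's check_infix()): returns true when ch
// should be kept inside the current word. Only '.' between two ASCII
// letters qualifies, and only if the rule is switched on.
static bool is_word_infix(char prev, char ch, char next, bool dot_is_infix) {
    if (!dot_is_infix || ch != '.') return false;
    return std::isalpha(static_cast<unsigned char>(prev)) &&
           std::isalpha(static_cast<unsigned char>(next));
}

// Hypothetical tokeniser: splits on any non-alphanumeric character,
// lowercases terms, and optionally keeps letter.letter dots in a term.
std::vector<std::string> tokenise(const std::string& text, bool dot_is_infix) {
    std::vector<std::string> terms;
    std::string current;
    for (std::size_t i = 0; i < text.size(); ++i) {
        const char ch = text[i];
        if (std::isalnum(static_cast<unsigned char>(ch))) {
            current += static_cast<char>(
                std::tolower(static_cast<unsigned char>(ch)));
            continue;
        }
        const char prev = i > 0 ? text[i - 1] : '\0';
        const char next = i + 1 < text.size() ? text[i + 1] : '\0';
        if (!current.empty() && is_word_infix(prev, ch, next, dot_is_infix)) {
            current += ch;  // keep the dot inside the term, e.g. "bob.co"
            continue;
        }
        if (!current.empty()) {  // any other separator ends the term
            terms.push_back(current);
            current.clear();
        }
    }
    if (!current.empty()) terms.push_back(current);
    return terms;
}
```

For example, tokenise("Visit bob.co.us today", false) yields visit, bob, co, us, today, while tokenise("Visit bob.co.us today", true) yields visit, bob.co.us, today. A trailing full stop ("the end.") is still treated as a separator because no letter follows it.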


