[Xapian-discuss] Stopword addition and stemming
Kevin Duraj
kevin.softdev at gmail.com
Wed Nov 17 02:11:10 GMT 2010
I performed a search on my PacificAir.com search engine with 300 million
documents ...
searching for "Auckland Flights" from the web site flightcentre.co.uk
http://pacificair.com/search?q=Auckland+Flights+site:flightcentre.co.uk
real 0m0.085s
user 0m0.040s
sys 0m0.040s
How much faster do you want the search to run?
Try it yourself!
Kevin Duraj
http://pacificair.com/
On Mon, Nov 15, 2010 at 5:16 AM, goran kent <gorankent at gmail.com> wrote:
> On 11/15/10, Olly Betts <olly at survex.com> wrote:
>>> > Another option would be to treat '.' as a word character when between
>>> > two letters, and so tokenise bob.co.us as a single term, but that's not
>>> > supported by TermGenerator and QueryParser currently, so you'd have to
>>> > patch Xapian or tokenise documents and queries yourself.
>>>
>>> ug, beyond me, I'm afraid.
>>
>> Actually it's very simple to do - you just need to tweak check_infix() in
>> queryparser/queryparser.lemony and queryparser/termgenerator_internal.cc
>> by adding '.' to the first test.
>
> Hmm, interesting. I'm wondering how good an idea this would be for a
> general-usage search engine (specifically to prevent the
> phrase-search-time penalty for "co.us")? Shooting from the hip I
> think it's a great trade-off. I just *know* folks are going to search
> for [bob_co.us] and then wonder why the page is not responding
> promptly.
>
> Can you think of a downside to doing this?
>
> _______________________________________________
> Xapian-discuss mailing list
> Xapian-discuss at lists.xapian.org
> http://lists.xapian.org/mailman/listinfo/xapian-discuss
>
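For anyone who would rather tokenise documents and queries themselves than
patch Xapian, the rule Olly describes (keep '.' inside a term only when it
sits between two letters) can be sketched roughly as below. This is a
standalone illustration in Python, not Xapian code and not the check_infix()
change itself; the names TOKEN_RE and tokenise are made up for the example,
and the ASCII-only character classes and lowercasing are simplifying
assumptions.

```python
import re

# Hypothetical standalone tokeniser, not part of Xapian.
# A term is a run of ASCII letters/digits. A '.' is absorbed into the term
# only when the character before it and the character after it are letters.
TOKEN_RE = re.compile(
    r"[A-Za-z0-9]+(?:(?<=[A-Za-z])\.(?=[A-Za-z])[A-Za-z0-9]+)*"
)


def tokenise(text):
    """Return lowercased terms, keeping names like bob.co.us as one term."""
    return [match.group(0).lower() for match in TOKEN_RE.finditer(text)]


if __name__ == "__main__":
    print(tokenise("Visit bob.co.us today. Pi is about 3.14"))
```

With this rule a sentence-ending full stop and the '.' in a number such as
3.14 still split terms, while bob.co.us is indexed as a single term, so a
query for it would not need a phrase search. The same function would have to
be applied to both documents and queries for the terms to match.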