[Xapian-discuss] Ordering search results and defining a custom Weight class in python

Robert Kaye rob at eorbit.net
Mon Jun 2 20:19:58 BST 2008


On May 30, 2008, at 6:21 PM, Olly Betts wrote:
> Yes, BM25Weight has several parameters which can be adjusted to change
> the emphasis of the weighting.  If your documents are typically quite
> short, then you probably will get better results if you make the
> document length less important.

Awesome -- thanks for the excellent tip. With just a little tweaking of
the BM25Weight parameters, the search results have improved dramatically.
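
For anyone following along, here is a rough sketch of why the document
length parameter matters. It uses the simplified textbook Okapi BM25 term
weight, not Xapian's exact implementation, and the function name and the
sample numbers are made up for illustration. With b = 0, document length
is ignored entirely; larger values of b favor shorter documents.

```python
import math

def bm25_term_weight(tf, doc_len, avg_doc_len, n_docs, doc_freq,
                     k1=1.0, b=0.5):
    """Simplified textbook BM25 weight of one term in one document.

    b controls how strongly the weight is normalized by document length:
    b = 0 ignores length, b = 1 applies full length normalization.
    """
    idf = math.log((n_docs - doc_freq + 0.5) / (doc_freq + 0.5) + 1.0)
    norm_len = float(doc_len) / avg_doc_len
    return idf * tf * (k1 + 1.0) / (tf + k1 * ((1.0 - b) + b * norm_len))

# Same term statistics, one short and one long document (made-up numbers).
for b in (0.0, 0.5, 1.0):
    short = bm25_term_weight(1, 5, 10, 1000, 10, b=b)
    long_ = bm25_term_weight(1, 20, 10, 1000, 10, b=b)
    print("b=%.1f  short=%.3f  long=%.3f" % (b, short, long_))

# In the Xapian Python bindings the equivalent knob is, as far as I
# understand it, the fourth constructor argument (k1, k2, k3, b,
# min_normlen):
#   enquire.set_weighting_scheme(xapian.BM25Weight(1.0, 0.0, 1.0, 0.0, 0.5))
```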

I've asked for some help testing our new search service, and the testing
has shown that we are not tokenizing Chinese text properly. Our database
can conceivably contain text in any language supported by Unicode, so we
need a way to tokenize Chinese text correctly. I've seen a few posts from
last year discussing a Chinese tokenization scheme, but I haven't found
anything about it in the official docs.
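
One generic approach (not necessarily what those posts described) is to
index runs of Chinese characters as overlapping bigrams, since Chinese
text has no spaces between words. Below is a minimal sketch of that idea
in plain Python. The function names are hypothetical, and the character
range check only covers the basic CJK Unified Ideographs block, so treat
it as an illustration rather than a complete solution.

```python
from itertools import groupby

def _kind(ch):
    """Classify a character as 'cjk', 'word' or 'sep' (separator)."""
    # Assumption: only the basic CJK Unified Ideographs block is handled.
    if u"\u4e00" <= ch <= u"\u9fff":
        return "cjk"
    if ch.isalnum():
        return "word"
    return "sep"

def tokenize(text):
    """Split text into lowercased words and overlapping CJK bigrams."""
    tokens = []
    for kind, group in groupby(text, key=_kind):
        run = u"".join(group)
        if kind == "word":
            tokens.append(run.lower())
        elif kind == "cjk":
            if len(run) == 1:
                # A lone ideograph is indexed as a single-character term.
                tokens.append(run)
            else:
                # Overlapping pairs: ABCD -> AB, BC, CD
                tokens.extend(run[i:i + 2] for i in range(len(run) - 1))
    return tokens

print(tokenize(u"Xapian 搜索引擎 test"))
```

The resulting terms could then be added to a document by hand (for
example with Document.add_posting), and the same tokenizer would have to
be applied to queries so that the terms match.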

Is there a preferred way (in Python) to handle the tokenization of
Chinese characters?

Thanks for your help!

--

--ruaok      Somewhere in Texas a village is *still* missing its idiot.

Robert Kaye     --     rob at eorbit.net     --    http://mayhem-chaos.net




