[Xapian-discuss] Ordering search results and defining a custom Weight class in python
Robert Kaye
rob at eorbit.net
Mon Jun 2 20:19:58 BST 2008
On May 30, 2008, at 6:21 PM, Olly Betts wrote:
> Yes, BM25Weight has several parameters which can be adjusted to change
> the emphasis of the weighting. If your documents are typically quite
> short, then you probably will get better results if you make the
> document length less important.
Awesome -- thanks for the excellent tip. With just a little tweaking
the search results have improved drastically.
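For anyone following along, here is a minimal sketch of why the document length parameter matters. It uses the textbook Okapi BM25 term formula, not Xapian's exact implementation, and the function name, the default k1, and the sample numbers are all made up for illustration.

```python
import math


def bm25_term_score(tf, doc_len, avg_doc_len, n_docs, doc_freq, k1=1.2, b=0.75):
    """Textbook Okapi BM25 score for one query term in one document.

    b controls how strongly document length affects the score:
    b=0 ignores length entirely, b=1 applies full length normalisation.
    """
    idf = math.log(1.0 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))
    length_norm = (1.0 - b) + b * (doc_len / avg_doc_len)
    return idf * tf * (k1 + 1.0) / (tf + k1 * length_norm)


# Two hypothetical documents that differ only in length.
short = dict(tf=1, doc_len=3, avg_doc_len=10, n_docs=1000, doc_freq=50)
long_ = dict(tf=1, doc_len=30, avg_doc_len=10, n_docs=1000, doc_freq=50)

for b in (0.75, 0.0):
    print(b, bm25_term_score(b=b, **short), bm25_term_score(b=b, **long_))
```

With b=0 the two documents score the same, while with b=0.75 the shorter one scores higher. In the Python bindings, my understanding is that these parameters go into the xapian.BM25Weight constructor and the resulting object is passed to Enquire.set_weighting_scheme(); check the API docs for the exact argument order in your version.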
I've asked for some help testing our new search service, and that
testing has shown that we're having problems tokenizing Chinese text.
Our database can conceivably contain text in any language supported by
Unicode, so we need to find a way to tokenize Chinese text properly.
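To illustrate the kind of problem I mean, here is a rough sketch. It assumes the trouble is the usual one, that Chinese has no spaces between words, so anything that splits on whitespace sees a whole phrase as a single term. The overlapping-bigram function is only one commonly described workaround, not necessarily the scheme from the earlier posts, and both function names are hypothetical.

```python
def whitespace_tokens(text):
    """Naive tokenizer: split on whitespace only."""
    return text.split()


def cjk_bigrams(text):
    """Overlapping two-character terms from CJK Unified Ideographs.

    Only the basic block U+4E00..U+9FFF is handled here; everything
    else in the input is ignored by this sketch.
    """
    chars = [c for c in text if "\u4e00" <= c <= "\u9fff"]
    if len(chars) < 2:
        return chars
    return [chars[i] + chars[i + 1] for i in range(len(chars) - 1)]


phrase = "我喜欢音乐"
print(whitespace_tokens(phrase))  # the whole phrase comes back as one token
print(cjk_bigrams(phrase))        # four overlapping two-character terms
```

The same bigram function would have to be applied to both the indexed text and the query for the terms to match.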
I've seen a few posts from last year talking about a Chinese
tokenization scheme, but I haven't found anything about that in the
official docs.
Is there a preferred way (in Python) to handle the tokenization of
Chinese characters?
Thanks for your help!
--
--ruaok Somewhere in Texas a village is *still* missing its idiot.
Robert Kaye -- rob at eorbit.net -- http://mayhem-chaos.net