[Xapian-devel] Word missing after stemmed with Norwegian in Search::Xapian::TermGenerator

Oat ABCTech oat at abctech-thailand.com
Mon Nov 26 05:26:40 GMT 2012


Hi all Xapian-devel,

Gist: https://gist.github.com/10d2222d8bffe8d7631d

I'm using Search::Xapian::TermGenerator to turn Norwegian sentences into a
VSM (vector space model) representation. But when I generate the VSM from
'Truet med å stevne misfornøyd PC-kunde - PC-leverandøren Asus likte
svært dårlig kundens misfornøyde leserbrev.', the result doesn't contain
the term 'asus'.

So I tried replacing 'Asus' with other brand names: Acer, Apple, Dell,
Fujitsu, HP, Lenovo, LG, NEC, Samsung, Sony and Toshiba. Most of them show
up in the result, except Acer, Apple and Dell. However, in the cases where
the brand name does show up, the term 'dår' is missing instead.

This problem may be caused by an encoding issue, which I'm investigating
now. It would be great if you could help, and if you have any questions
about this problem, please reply to me.
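
On the encoding suspicion: as far as I understand, Xapian expects the text
it indexes to be UTF-8. Below is a minimal sketch of how the bytes of the
test sentence could be checked before they reach the indexer. It is written
in plain Python with no Xapian calls, the helper name is_valid_utf8 is made
up for illustration, and Latin-1 is only an assumed example of a wrong
encoding, not something I have confirmed in my setup.

```python
# The test sentence from this report, as a Unicode string.
text = ("Truet med å stevne misfornøyd PC-kunde - PC-leverandøren Asus likte "
        "svært dårlig kundens misfornøyde leserbrev.")


def is_valid_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


utf8_bytes = text.encode("utf-8")      # the encoding Xapian is assumed to expect
latin1_bytes = text.encode("latin-1")  # assumed example of an accidental encoding

# 'å' is two bytes in UTF-8 but a single byte in Latin-1.
print("å".encode("utf-8"), "å".encode("latin-1"))

print(is_valid_utf8(utf8_bytes))    # True
print(is_valid_utf8(latin1_bytes))  # False
```

If the bytes handed to the term generator fail a check like this, the
encoding would be the first thing to fix before looking at the stemmer.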

Best regards,
Theerapat