[Xapian-discuss] Xapian Terms vs. Document Partition.

James Aylett james-xapian at tartarus.org
Wed Jun 4 10:44:23 BST 2008


On Tue, Jun 03, 2008 at 04:23:31PM -0700, Kevin Duraj wrote:

> Another thing is that my crawlers added a lot of Asian web sites to
> the index, and because those sites use different characters they make
> the postlist of index terms really big.

Out of interest, do you have (or could you generate) a stat for how
many of these mark their languages correctly (either xml:lang in
XHTML, or lang in HTML4, or some other method - there's probably a
META one, but the first two are preferred)?
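
One rough way to generate such a stat, sketched in Python using only the
standard library. It only checks whether a page declares a language at all,
not whether the declared language is the right one. The names LangSniffer,
declared_language and language_stats are made up for illustration, and the
META check assumes the http-equiv="Content-Language" form.

```python
from html.parser import HTMLParser


class LangSniffer(HTMLParser):
    """Records how (if at all) a page declares its language."""

    def __init__(self):
        super().__init__()
        self.found = {}  # declaration method -> declared value

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # HTMLParser lowercases tag and attribute names
        if tag == "html":
            # xml:lang (XHTML) and lang (HTML4) on the root element
            for name in ("xml:lang", "lang"):
                if attrs.get(name):
                    self.found[name] = attrs[name]
        elif tag == "meta":
            # Assumed META form: <meta http-equiv="Content-Language" content="..">
            if (attrs.get("http-equiv") or "").lower() == "content-language":
                if attrs.get("content"):
                    self.found["meta"] = attrs["content"]


def declared_language(html_text):
    """Return a dict of the language declarations found in one page."""
    sniffer = LangSniffer()
    sniffer.feed(html_text)
    sniffer.close()
    return sniffer.found


def language_stats(pages):
    """Count pages per declaration method; 'none' means nothing declared."""
    counts = {"xml:lang": 0, "lang": 0, "meta": 0, "none": 0}
    for text in pages:
        found = declared_language(text)
        if not found:
            counts["none"] += 1
        for method in found:
            counts[method] += 1
    return counts
```

A page that uses more than one method is counted once under each, so the
per-method counts can add up to more than the number of pages. Feeding it
the raw HTML your crawler already fetched should be enough to get a first
number.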

J

-- 
/--------------------------------------------------------------------------\
  James Aylett                                                  xapian.org
  james at tartarus.org                               uncertaintydivision.org


