[Xapian-discuss] Xapian Terms vs. Document Partition.
James Aylett
james-xapian at tartarus.org
Wed Jun 4 10:44:23 BST 2008
On Tue, Jun 03, 2008 at 04:23:31PM -0700, Kevin Duraj wrote:
> Another thing is that my crawlers brought a lot of Asian web sites
> into the index, and because they use different characters they make
> the postlists of index terms really big.
Out of interest, do you have (or could you generate) a stat for how
many of these sites declare their language correctly (either xml:lang
in XHTML, or lang in HTML4, or some other method - there's probably a
META one, but the first two are preferred)?
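
As a rough sketch of how such a stat could be gathered (not anything
Xapian provides), the Python below looks for xml:lang or lang on the
<html> element and falls back to a META http-equiv="Content-Language"
tag. The names LangSniffer, declared_language and declaration_rate are
made up for illustration, and the check only tells you whether a
language is declared, not whether the declaration is correct.

```python
from html.parser import HTMLParser


class LangSniffer(HTMLParser):
    """Records the first language declaration found in a page (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.declared = None  # e.g. "ja" or "zh-CN"; None if nothing was found

    def handle_starttag(self, tag, attrs):
        if self.declared:
            return  # keep the first declaration we saw
        attrs = dict(attrs)  # HTMLParser lowercases tag and attribute names
        if tag == "html":
            # Preferred forms: xml:lang (XHTML) or lang (HTML4).
            self.declared = attrs.get("xml:lang") or attrs.get("lang")
        elif tag == "meta":
            # Fallback: <meta http-equiv="Content-Language" content="...">
            if (attrs.get("http-equiv") or "").lower() == "content-language":
                self.declared = attrs.get("content")


def declared_language(html_text):
    """Return the declared language of one page, or None."""
    sniffer = LangSniffer()
    sniffer.feed(html_text)
    return sniffer.declared


def declaration_rate(pages):
    """Fraction of pages that carry any language declaration."""
    pages = list(pages)
    if not pages:
        return 0.0
    return sum(1 for page in pages if declared_language(page)) / len(pages)
```

Feeding the crawled pages for the affected sites through
declaration_rate() would give the proportion that declare a language
at all; judging correctness would still need a comparison against
detected or known languages.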
J
--
/--------------------------------------------------------------------------\
James Aylett xapian.org
james at tartarus.org uncertaintydivision.org